Closed: lostmsu closed this issue 3 years ago.
I also experienced instability during training until I just used a very small learning rate (1e-5) from start to finish, and then trained for many epochs, since training is much slower with such a small learning rate. Did you already try something like that?
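For example, something like this (a minimal sketch; `model`, `coords`, and `amplitudes` are placeholders for your own Keras SIREN model and training data):

```python
import tensorflow as tf

# Minimal sketch: a small constant learning rate compensated by many epochs.
# `model`, `coords`, and `amplitudes` stand in for your own model and data.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="mse")
model.fit(coords, amplitudes, epochs=5000, batch_size=8 * 1024)
```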
On Mon, Jun 7, 2021 at 8:47 AM Victor wrote:
I reimplemented Siren in TensorFlow 2.5. The network easily learns images, but I cannot reproduce the results with audio. On the sample file from the paper, the loss gets stuck at a relatively high value (~0.0242) and the network's output becomes very quiet (max(abs(x)) ~= 0.012). Just curious whether anyone has faced the same issue when reimplementing Siren on their own.
What I've tried so far:
- Double-checked omega: it is set to 3000.0 for the input layer and 30.0, 30.0, 30.0 for the inner layers
- Changing the batch size to the full length of the sample (I used to do randomized batches of 8*1024)
- Using float64 to avoid potential issues with numerical overflows/underflows
- Checked network weights: all are finite numbers
- Using SGD as a more stable optimizer
- Increasing network width/adding more layers
Essentially, all of the above still led to the same result, with the loss stuck at ~0.0242.
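For reference, here is a minimal sketch of the sine layer as I implemented it, with those omega values and the initialization scheme from the paper (the class and variable names are mine):

```python
import numpy as np
import tensorflow as tf

class SineLayer(tf.keras.layers.Layer):
    """Dense layer with sine activation: y = sin(omega_0 * (Wx + b)).

    Uses the initialization from the SIREN paper: U(-1/n, 1/n) for the
    first layer and U(-sqrt(6/n)/omega_0, sqrt(6/n)/omega_0) otherwise.
    """

    def __init__(self, units, omega_0=30.0, is_first=False, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.omega_0 = omega_0
        self.is_first = is_first

    def build(self, input_shape):
        n = int(input_shape[-1])
        limit = 1.0 / n if self.is_first else np.sqrt(6.0 / n) / self.omega_0
        self.w = self.add_weight(
            name="w", shape=(n, self.units),
            initializer=tf.keras.initializers.RandomUniform(-limit, limit))
        self.b = self.add_weight(name="b", shape=(self.units,),
                                 initializer="zeros")

    def call(self, x):
        return tf.sin(self.omega_0 * (tf.matmul(x, self.w) + self.b))
```

The full network would then be, e.g., `SineLayer(256, omega_0=3000.0, is_first=True)` followed by three `SineLayer(256)` and a final linear `Dense(1)`.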
@schreon I used the learning rate from the paper: 5e-5.
But NVM, I figured out why it was not training on audio, and it was completely my fault: I used an incorrect shuffling mode. In TensorFlow, data is not always shuffled when you call model.fit (the shuffle argument is ignored for tf.data.Dataset and generator inputs), so I assume feeding the audio stream sequentially threw the optimizer off course each time due to forgetting.
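Concretely, the fix looks something like this (a sketch assuming the data is fed as a tf.data.Dataset; `coords` and `amps` are placeholders for the audio coordinates and samples):

```python
import tensorflow as tf

# Sketch: shuffle the (coordinate, amplitude) pairs explicitly before
# batching. With plain NumPy arrays model.fit shuffles by default, but
# with a tf.data.Dataset the `shuffle` argument of fit() is ignored.
ds = (tf.data.Dataset.from_tensor_slices((coords, amps))
      .shuffle(buffer_size=len(coords), reshuffle_each_iteration=True)
      .batch(8 * 1024))
model.fit(ds, epochs=1000)
```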
It also appears that you need to scale omega for the input layer on longer audio clips.
Yes. Did you find a good heuristic yet for scaling omega with differing input sizes? I believe we can scale it linearly per domain. For example, if you squeeze an audio clip twice as long as the one in the paper into [-1, 1], you end up with double the apparent frequency, so doubling omega to omega_input = 6000 would make sense. If this works consistently, we would only have to find one "base omega" per domain once.
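As a sketch, the heuristic would be something like the helper below (the base duration is a placeholder; I don't remember the exact length of the paper's clip):

```python
# Hypothetical helper for the linear scaling heuristic above: an audio
# clip squeezed into [-1, 1] at double the length has double the apparent
# frequency, so scale the input-layer omega linearly with duration.
BASE_OMEGA = 3000.0     # "base omega" for audio, from the paper's sample
BASE_DURATION = 7.0     # assumed duration (seconds) of the reference clip

def input_omega(duration_seconds: float) -> float:
    return BASE_OMEGA * duration_seconds / BASE_DURATION
```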
Yes, I noticed that. I wonder now whether it makes sense to make omega itself a trainable parameter on a log scale.
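A sketch of what that could look like, with log(omega) as the trainable variable so omega stays positive and the optimizer effectively takes multiplicative steps:

```python
import numpy as np
import tensorflow as tf

class TrainableOmegaSine(tf.keras.layers.Layer):
    """Sine layer whose omega is learned on a log scale."""

    def __init__(self, units, omega_0=30.0, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.omega_0 = omega_0

    def build(self, input_shape):
        n = int(input_shape[-1])
        limit = np.sqrt(6.0 / n) / self.omega_0
        self.w = self.add_weight(
            name="w", shape=(n, self.units),
            initializer=tf.keras.initializers.RandomUniform(-limit, limit))
        self.b = self.add_weight(name="b", shape=(self.units,),
                                 initializer="zeros")
        # Trainable log(omega), initialized so exp(log_omega) == omega_0.
        self.log_omega = self.add_weight(
            name="log_omega", shape=(),
            initializer=tf.keras.initializers.Constant(
                float(np.log(self.omega_0))))

    def call(self, x):
        omega = tf.exp(self.log_omega)
        return tf.sin(omega * (tf.matmul(x, self.w) + self.b))
```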