vsitzmann / siren

Official implementation of "Implicit Neural Representations with Periodic Activation Functions"
MIT License

Possible mismatch between supplementary section 1.5 and the implementation? #4

Open crysoberil opened 4 years ago

crysoberil commented 4 years ago

The first layer is initialized here:

https://github.com/vsitzmann/siren/blob/ecd150f99b40217d76e0f15753b856aa2d966ab1/modules.py#L629-L634

This does not apply a square root over the fan-in of the layer. Am I missing something in the paper?
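For concreteness, here is my reading of the two schemes (a paraphrase, not the exact lines from modules.py; the hidden layers in the same file do use the square-root bound):

```python
import numpy as np
import torch

def init_first_layer_as_in_repo(linear: torch.nn.Linear):
    # What the referenced lines appear to do for the first layer:
    # uniform in +-1/fan_in, with no square root over the fan-in.
    fan_in = linear.in_features
    with torch.no_grad():
        linear.weight.uniform_(-1.0 / fan_in, 1.0 / fan_in)

def init_hidden_layer_as_in_repo(linear: torch.nn.Linear, omega_0: float = 30.0):
    # The hidden layers, by contrast, use the square-root bound
    # uniform in +-sqrt(6/fan_in)/omega_0.
    fan_in = linear.in_features
    bound = np.sqrt(6.0 / fan_in) / omega_0
    with torch.no_grad():
        linear.weight.uniform_(-bound, bound)
```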

vsitzmann commented 4 years ago

Edit: I'll leave this issue open so other people can see it.

No, we actually got this wrong in the paper - the implementation is correct. This magnitude of weights in the first layer is appropriate for images; the one in the paper is too large! We'll fix it in the next version. In general, the initialization of the first layer is dependent on the frequencies of the signal - higher frequencies require larger weights in the first layer. See, for instance, the audio section in the Colab, where we set omega_0 to 3000!
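For example, in the Colab the frequency scaling is just a constructor argument (a sketch; the keyword names follow the notebook's Siren class, and the hidden sizes here are only illustrative):

```python
# Image fitting: the default first-layer frequency scaling works well.
img_siren = Siren(in_features=2, out_features=1, hidden_features=256,
                  hidden_layers=3, outermost_linear=True,
                  first_omega_0=30.0, hidden_omega_0=30.0)

# Audio fitting: much higher intrinsic frequencies, so the first layer
# gets a far larger omega_0.
audio_siren = Siren(in_features=1, out_features=1, hidden_features=256,
                    hidden_layers=3, outermost_linear=True,
                    first_omega_0=3000.0, hidden_omega_0=30.0)
```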

grondilu commented 4 years ago

In general, the initialization of the first layer is dependent on the frequencies of the signal

The highest frequency in the signal is related to its sampling resolution. Ideally, you'd invoke Shannon's sampling theorem to determine the appropriate frequency range.

You shouldn't pick omega_0 from a heuristic.
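To make that concrete, here's the kind of back-of-the-envelope computation I have in mind (my own sketch, assuming coordinates normalised to [-1, 1]; it is not how the repo picks omega_0):

```python
import numpy as np

def nyquist_omega(n_samples: int, domain_length: float = 2.0) -> float:
    """Angular Nyquist frequency of a signal sampled at this resolution."""
    sampling_rate = n_samples / domain_length  # samples per unit coordinate
    nyquist = sampling_rate / 2.0              # cycles per unit coordinate
    return 2.0 * np.pi * nyquist               # radians per unit coordinate

print(nyquist_omega(256))  # ~402 for a 256-pixel side, vs. the default omega_0 = 30
```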

dcato98 commented 4 years ago

From Supplemental section 1.5:

This keeps the distribution of activations constant, but boosts gradients to the weight matrix, W, by the factor, ω0, while leaving gradients w.r.t. the input of the sine neuron unchanged.

Does scaling up the learning rate accomplish the same thing? If not, what's the difference between these two hyperparameters?

Experimentally, the losses follow a similar trajectory when scaling up either omega_0 or the learning rate by the same factor (in the latter case, I removed the omega_0 scaling from everywhere except the first layer's activations). The visual outputs also show similar progress. I used the cameraman ImageFitting training procedure from the linked Colab notebook.
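Roughly, the two configurations I compared were of this shape (a sketch; Siren is the class from the linked Colab, and for run B I removed the hidden-layer omega_0 scaling by hand in the notebook rather than through an existing option):

```python
import torch

scale = 10.0  # common factor applied either to omega_0 or to the learning rate

# Run A: scale omega_0 in the hidden layers, keep the Colab's default learning rate.
siren_a = Siren(in_features=2, out_features=1, hidden_features=256,
                hidden_layers=3, outermost_linear=True,
                first_omega_0=30.0, hidden_omega_0=30.0 * scale)
optim_a = torch.optim.Adam(siren_a.parameters(), lr=1e-4)

# Run B: keep omega_0 only in the first layer's activations,
# and scale the learning rate by the same factor instead.
siren_b = Siren(in_features=2, out_features=1, hidden_features=256,
                hidden_layers=3, outermost_linear=True,
                first_omega_0=30.0, hidden_omega_0=1.0)
optim_b = torch.optim.Adam(siren_b.parameters(), lr=1e-4 * scale)
```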

vsitzmann commented 4 years ago

The highest frequency in the signal is related to its sampling resolution.

This statement is incorrect. The highest possible frequency that is not aliased in the sampled signal is the Nyquist frequency. It is not the maximum frequency that is present in the underlying, ground-truth signal. You could also have a low-frequency sine wave that is sampled at very high resolution, in which case you want the initialization to reflect the intrinsic low frequency, instead of the Nyquist frequency.
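A tiny illustration of the distinction (plain numpy, not tied to the repo):

```python
import numpy as np

fs = 1000.0                      # sampling rate; the Nyquist frequency is fs/2 = 500 Hz
t = np.arange(0.0, 4.0, 1.0 / fs)
x = np.sin(2 * np.pi * 1.0 * t)  # the underlying signal only contains 1 Hz

freqs = np.fft.rfftfreq(len(t), d=1.0 / fs)
peak = freqs[np.argmax(np.abs(np.fft.rfft(x)))]
print(peak)  # ~1.0 Hz: the intrinsic frequency, nowhere near the Nyquist limit
```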

If we applied your principle, then superresolution as an application would be impossible. It may very well be that it is attractive to learn a prior over frequencies that are higher than the Nyquist frequency in order to perform, for instance, superresolution.

This is also related to the idea of getting rid of discrete grids - we want to match the intrinsic spectrum of the signal, not the spectrum of the sampled signal.

MARD1NO commented 4 years ago

If I use this sine activation in a CNN for some simple task, such as image classification, should I just follow the initialization in your code rather than the one in the paper? I was confused about this.

ZhengdiYu commented 3 years ago

If I use this sine activation in a CNN for some simple task, such as image classification, should I just follow the initialization in your code rather than the one in the paper? I was confused about this.

Same question here. Have you tried that?