vsitzmann / siren

Official implementation of "Implicit Neural Representations with Periodic Activation Functions"

Does SIREN only work well with over-parameterised networks? #31

Open MengZephyr opened 3 years ago

MengZephyr commented 3 years ago

Hello,

I spent a week testing SIREN and trying to introduce it into my system. To my knowledge, a ReLU network is at least able to produce a result showing averaged or balanced patterns from the training data.

I ran a small test: jointly training a neural feature image and a CNN auto-encoder with sin as the activation. With just 2 cat images as the reconstruction target, to my surprise, an over-parameterized network together with the 2 corresponding neural images very quickly (about 1000 iterations) produced results with beautiful high-frequency hair patterns. Then I began to reduce the size and dimension of the neural image. The loss still decreased quickly at first, but at some iteration it suddenly jumped up and the result became hopeless. I attach the results here: [image]

Naively, I wonder whether the sin activation is so sensitive to the gradient step that any wrong step can push the result into a bad local minimum. Have you tested the generalisation of SIREN? Or does SIREN only work well with over-parameterised networks?

VovaTch commented 2 years ago

I've only come across SIREN very recently; I've been experimenting with a similar type of network independently, so maybe I can help. Yes, this is a problem I've encountered too, where my fitted images go haywire and dissolve into noise. What worked for me is a specific training setup: OneCycleLR (PyTorch) in conjunction with AdamW. Using weight decay and amsgrad seems essential (usually a 1e-3 max learning rate and 1e-6 weight decay); otherwise the output indeed dissolves into noise. Clipping the gradient norm (clip_grad_norm_ in PyTorch) to a max_norm of about 0.1 to 1 also helps.
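
For reference, a minimal sketch of that recipe in PyTorch. The SineLayer module, the toy data, and the network sizes below are illustrative stand-ins, not the repo's implementation, and the SIREN-specific weight initialization is omitted for brevity:

import torch
import torch.nn as nn

# Toy sin-activated MLP standing in for a SIREN network.
class SineLayer(nn.Module):
    def __init__(self, in_features, out_features, omega_0=30.0):
        super().__init__()
        self.omega_0 = omega_0
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, x):
        return torch.sin(self.omega_0 * self.linear(x))

model = nn.Sequential(SineLayer(2, 256), SineLayer(256, 256), nn.Linear(256, 3))
coords = torch.rand(1024, 2) * 2 - 1  # placeholder (x, y) coordinates in [-1, 1]
target = torch.rand(1024, 3)          # placeholder RGB values to fit

num_steps = 1000
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3,
                              weight_decay=1e-6, amsgrad=True)
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=1e-3,
                                                total_steps=num_steps)

for step in range(num_steps):
    optimizer.zero_grad()
    loss = ((model(coords) - target) ** 2).mean()
    loss.backward()
    # Clip the gradient norm; max_norm values around 0.1 to 1 worked above.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)
    optimizer.step()
    scheduler.step()  # OneCycleLR is stepped once per batch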

Keep in mind that, mathematically, smaller images tend to require higher-frequency sinusoids to fit, and the derivative of a high-frequency sinusoid grows very large unless it is multiplied by a small constant to compensate.
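
To spell that out (standard calculus, writing ω for the frequency): d/dx sin(ωx) = ω·cos(ωx), so gradient magnitudes scale linearly with ω. This is exactly what the 1/omega_0 factor in SIREN's initialization compensates for (see the snippet in the next comment).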

ivanstepanovftw commented 6 months ago

Try setting a smaller c in the initializer. It was originally proposed as 6; lowering it to something like 4 or 3.2 helps stabilize the gradient flow, i.e.:

import math  # needed for math.sqrt

self.c = 4  # the paper proposes c = 6; smaller values tighten the bound

if self.is_first:
    bound = 1 / fan_in
else:
    bound = math.sqrt(self.c / fan_in) / self.omega_0
x.uniform_(-bound, bound)  # initialize the weight tensor x in place
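
For context, here is the same initializer written as a self-contained helper. This is just a sketch: the name siren_init_ is hypothetical, omega_0 = 30 is the paper's default frequency, and c defaults to the lowered value suggested above.

import math
import torch

def siren_init_(weight: torch.Tensor, is_first: bool,
                omega_0: float = 30.0, c: float = 4.0) -> None:
    # nn.Linear stores its weight as (out_features, in_features),
    # so fan_in is the second dimension.
    fan_in = weight.shape[1]
    if is_first:
        bound = 1 / fan_in
    else:
        bound = math.sqrt(c / fan_in) / omega_0
    with torch.no_grad():
        weight.uniform_(-bound, bound)  # uniform init, in place

Lowering c from 6 to 4 shrinks the hidden-layer bound by a factor of sqrt(4/6) ≈ 0.82, which slightly reduces the initial activation and gradient magnitudes.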