Hi, first of all, thank you very much for your work!
I'm now reproducing it, and I have two questions: one about the gradient clipping value and one about the soft round function.
When I set the learning rate to 10e-4, the gradients diverge, so I use gradient clipping with the value set to 0.01. Is that a reasonable value?
What value did you use for the tuning parameter alpha?
Thank you in advance!