orobix / fwdgrad

Implementation of the paper "Gradients without backpropagation" (https://arxiv.org/abs/2202.08587) using functorch
MIT License
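For context, below is a minimal sketch of the forward-gradient training step the repo implements, written against functorch as the description states. It is not the repo's actual `mnist_fwdgrad.py` code: the model architecture, loss, data, and learning rate are illustrative placeholders.

```python
# Minimal sketch (not the repo's exact code) of the forward-gradient
# estimator from "Gradients without backpropagation", using functorch.
import torch
import functorch

torch.manual_seed(0)

# Toy MLP standing in for the MNIST models in the repo.
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(28 * 28, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
)
fmodel, params = functorch.make_functional(model)

def loss_fn(params, x, y):
    return torch.nn.functional.cross_entropy(fmodel(params, x), y)

def fwdgrad_step(params, x, y, lr):
    # Sample a random tangent v with the same structure as the parameters.
    v = tuple(torch.randn_like(p) for p in params)
    # A single forward pass with forward-mode AD yields the loss and the
    # directional derivative of the loss along v (no backward pass).
    loss, dir_deriv = functorch.jvp(lambda *p: loss_fn(p, x, y), tuple(params), v)
    # Forward gradient: v scaled by the directional derivative is an
    # unbiased estimator of the true gradient.
    new_params = tuple(p - lr * dir_deriv * vi for p, vi in zip(params, v))
    return loss, new_params

# Dummy MNIST-shaped batch just to make the sketch runnable.
x = torch.randn(64, 1, 28, 28)
y = torch.randint(0, 10, (64,))
for step in range(5):
    loss, params = fwdgrad_step(params, x, y, lr=2e-4)
    print(f"step {step}: loss {loss.item():.4f}")
```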

Can't train to convergence #10

Closed. wuyaozong99 closed this issue 1 year ago.

wuyaozong99 commented 1 year ago

Hi, thanks for your implementation!

When I run mnist_fwdgrad.py with the Conv model, the training loss does not decrease as reported in the paper: it drops to about 1.7 and then suddenly jumps above 14. Training diverges and never reaches convergence. Have you ever encountered this problem?

belerico commented 1 year ago

Hi @wuyaozong99, sorry for the late response. Have you run the training with the default hyperparams? Have you also tried the MLP network?

wuyaozong99 commented 1 year ago

@belerico Thanks for your reply. I found that I had used a fixed learning rate of 2e-4 instead of the default setting, in which the learning rate decays over the iterations. After switching back to the default settings, the divergence is alleviated.
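For anyone hitting the same issue, here is a sketch of the kind of per-iteration learning-rate decay the default configuration applies. The schedule form and constants below are assumptions for illustration only; the actual defaults are defined in the repo's hyperparameters.

```python
# Illustrative per-iteration learning-rate decay; the decay form and the
# constants here are assumed values, not the repo's actual hyperparameters.
def decayed_lr(step: int, base_lr: float = 2e-4, decay: float = 1e-4) -> float:
    # Inverse-time decay: the step size shrinks as training proceeds, which
    # damps the noisy forward-gradient updates that otherwise blow up.
    return base_lr / (1.0 + decay * step)

# A fixed learning rate would use 2e-4 at every step; with this schedule the
# rate falls gradually, e.g. decayed_lr(0) == 2e-4 and decayed_lr(10000) == 1e-4.
```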