konstmish / prodigy

The Prodigy optimizer and its variants for training neural networks.
MIT License
298 stars 17 forks

Layer-wise scaling #9

Open adefazio opened 9 months ago

adefazio commented 9 months ago

This is a potential implementation of layer-wise scaling. It needs testing before merging.

konstmish commented 9 months ago

Thanks for creating the pull request. What kind of testing do you have in mind?

rockerBOO commented 4 months ago

If there are specific test results that would be useful, I am trying out this PR and can provide them. It seems to work as I would expect, but if more information is needed, I can run those tests.

Thank you.