Closed: khornlund closed this 5 years ago
Hey, this looks great! If you'd like to upstream this into my repo, that sounds fine by me. But if you would prefer to keep your fork separate that is also totally fine, and I can link to it from the README here once you're done.
Great, I'll submit a PR when it's ready.
I'd like to expand the README a bit with some examples. Could you offer any advice about using the adaptive version? For example, did you find a particular optimizer/learning rate/scheduler worked well?
Sounds good.
Usually the adaptive loss's parameters train much more slowly than the model weights, so I always just use whatever learning rate and schedule I was already using for the model weights. As for optimizers, I always use Adam, but my rationale for that is independent of any of this loss-function business: I just like Adam.
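Jon's advice above (one optimizer, one learning rate, shared across the model weights and the adaptive loss's parameters) can be sketched like this. Note that `AdaptiveLossModule` is a toy stand-in with a single learnable scale, not the actual adaptive loss from the paper or this repo:

```python
import torch

class AdaptiveLossModule(torch.nn.Module):
    """Toy stand-in for an adaptive loss: a Charbonnier-style loss
    with one learnable log-scale parameter."""
    def __init__(self):
        super().__init__()
        self.log_scale = torch.nn.Parameter(torch.zeros(()))

    def forward(self, residual):
        scale = self.log_scale.exp()
        # Smooth L1-like penalty on the scaled residual, averaged over the batch.
        return (torch.sqrt((residual / scale) ** 2 + 1) - 1).mean()

torch.manual_seed(0)
model = torch.nn.Linear(3, 1)
adaptive = AdaptiveLossModule()

# One Adam optimizer at one learning rate, covering BOTH parameter groups,
# per the advice above.
opt = torch.optim.Adam(
    list(model.parameters()) + list(adaptive.parameters()), lr=1e-3
)

x = torch.randn(16, 3)
y = torch.randn(16, 1)
for _ in range(5):
    opt.zero_grad()
    loss = adaptive(model(x) - y)
    loss.backward()
    opt.step()
```

Because the loss's parameters sit in the same optimizer as the model's, any learning-rate scheduler attached to `opt` applies to both.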
Hi Jon,
Nice paper and thanks for providing a PyTorch implementation!
Are you interested in making this installable as a package? I've started doing so in my fork here. It would be good to know if I should work towards a PR, or just develop it independently.
Cheers, Karl
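The packaging work Karl describes might look like the following minimal sketch; the package name, version, and dependency list here are all placeholders, not the actual metadata from his fork:

```python
# setup.py -- hypothetical sketch; the real fork's metadata will differ.
from setuptools import setup, find_packages

setup(
    name="robust-loss-pytorch",  # placeholder package name
    version="0.0.1",             # placeholder version
    packages=find_packages(),
    install_requires=[
        "numpy",                 # assumed runtime dependencies
        "torch",
    ],
)
```

With a file like this at the repository root, `pip install .` makes the code importable as a package, which is what enables installing directly from a fork rather than vendoring the source.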