microsoft / mup

maximal update parametrization (µP)
https://arxiv.org/abs/2203.03466
MIT License

PyTorch Lightning example #2

Open tchaton opened 2 years ago

tchaton commented 2 years ago

Dear team behind mup,

This is some great work! I believe providing a PyTorch Lightning example could help users adopt this library.

I even wonder if this technique could be embedded with even less boilerplate. I was thinking about an extension to the PyTorch Lightning Tuner that would automatically apply µP and tune the µTransferable hyperparameters.

I wonder whether someone from the mup team would be interested in investigating these ideas to further democratize this work.

Best, T.C

edwardjhu commented 2 years ago

Hi tchaton,

Thanks for the pointer to the Lightning Tuner. We are not familiar with its usage, but from the page you linked, it looks like one can pass a model to, for example, lr_find along with a grid, and the Tuner performs the necessary for loop(s) and returns the best HPs. In other words, one should be able to pass the proxy model, parametrized in muP, to the Tuner and take advantage of both right away.

Perhaps you are thinking about adding an option such as lr_find(model, mup=True, ...) to the Tuner API. The main obstacle is that we still need to let muP know which dimensions go to infinity in the limit, by instantiating models of different widths. We also need the user to manually switch to muP's optimizer variants. Both steps are hard to hide inside a single Tuner call.
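To make the first obstacle concrete, here is a plain-Python sketch (hypothetical helper, not mup's API) of why two instantiations are needed: only by comparing the parameter shapes of a base-width model against a differently-scaled one can a library tell which dimensions grow with width ("infinite") and which stay fixed.

```python
def infer_infinite_dims(base_shapes, scaled_shapes):
    """Mark which dims of each parameter change with width (sketch).

    Hypothetical illustration of the bookkeeping a muP library must do:
    a dim that differs between the two instantiations scales with width
    ("infinite"); a dim that matches is finite. Shapes are plain tuples.
    """
    infinite = {}
    for name, base in base_shapes.items():
        scaled = scaled_shapes[name]
        infinite[name] = tuple(b != s for b, s in zip(base, scaled))
    return infinite

# Parameter shapes from a width-64 base model and a width-128 variant.
base = {"embed.weight": (1000, 64), "fc.weight": (64, 64), "fc.bias": (64,)}
delta = {"embed.weight": (1000, 128), "fc.weight": (128, 128), "fc.bias": (128,)}
print(infer_infinite_dims(base, delta))
# → {'embed.weight': (False, True), 'fc.weight': (True, True), 'fc.bias': (True,)}
```

A Tuner flag would have to construct that second, rescaled model behind the user's back, which is why this is awkward to hide in a single function call.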

Please let us know if you have ideas on how we can make this integration more seamless!