microsoft / mup

maximal update parametrization (µP)
https://arxiv.org/abs/2203.03466
MIT License
Does mup work with a model with Conv2d as output? #3

Closed BurguerJohn closed 1 year ago

BurguerJohn commented 2 years ago

Hello, this project looks great and the GitHub documentation is really good. Just wondering if mup would work with a model that has nn.Conv2d as its last layer instead of nn.Linear.

edwardjhu commented 2 years ago

Hi BurguerJohn,

We haven't implemented a mu-version of Conv2d to use as the output layer, but we can certainly do it! It seems slightly unusual to us to use Conv2d as the output layer. Could you tell us more about your model?

BurguerJohn commented 2 years ago

It's more of a curiosity test; I would like to see how it would perform in a U-Net model.

tivek commented 2 years ago

In our case, we use ConvTranspose2d (1d, 3d) as output, but basically that should behave like Conv2d and Linear.

May I ask about the progress of the muconv2d branch?

edwardjhu commented 2 years ago

Hi both,

Thanks for your patience regarding this issue. The muconv2d branch should work in principle, but I haven't added test cases since it requires the labels to be the output of a conv layer. If there is interest, we'd love to invite you to give it a try in your code and see if you can reproduce the coordinate check plots in the README. We are happy to help debug if you run into any issues!
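For readers unfamiliar with the check: a coordinate check records the typical size of each layer's activations at several widths over a few training steps, and with a correct µP setup those sizes should stay roughly constant as width grows. The sketch below covers only the final comparison step, in plain Python. The helper names `mean_abs` and `loglog_slope` and all the numbers are hypothetical illustrations; they are not part of mup and not real measurements.

```python
import math


def mean_abs(values):
    """Average absolute size of a flat list of activation coordinates."""
    return sum(abs(v) for v in values) / len(values)


def loglog_slope(widths, sizes):
    """Least-squares slope of log(size) against log(width).

    A slope near 0 means the activation size does not change with width;
    a clearly positive slope means it grows as the model gets wider.
    """
    xs = [math.log(w) for w in widths]
    ys = [math.log(s) for s in sizes]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


widths = [64, 128, 256, 512]

# Made-up activation sizes for illustration only (assumption, not data).
flat_sizes = [0.5, 0.5, 0.5, 0.5]                            # width-independent
growing_sizes = [0.5 * math.sqrt(w / 64) for w in widths]    # grows like sqrt(width)

print("flat slope:", loglog_slope(widths, flat_sizes))
print("growing slope:", loglog_slope(widths, growing_sizes))
```

In a real check, `sizes` would come from applying something like `mean_abs` to the recorded output of each layer (including the conv output layer) at each width, with one curve per layer and training step.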

tivek commented 2 years ago

Hi @edwardjhu, thanks for the kind reply!

My team certainly plans to make coordinate check plots of our models with ConvTranspose output layers. At this point we are working around LR schedulers that are not compatible with mup's optimizers. When we are ready, I am going to post the results here.
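The comment does not say which schedulers caused trouble. One common pitfall, assumed here purely for illustration, is a scheduler that writes the same absolute learning rate into every parameter group, which erases any per-group scaling the optimizer has set up. The sketch below uses plain dicts as a stand-in for PyTorch's `optimizer.param_groups` (a list of dicts, each with an `"lr"` entry). The group names, the learning rates, and the factor of 4 are hypothetical.

```python
def make_groups():
    """Hypothetical stand-in for optimizer.param_groups."""
    return [
        # Assumed: this group's lr was already divided by a width multiplier of 4.
        {"name": "hidden", "lr": 1e-3 / 4},
        {"name": "output", "lr": 1e-3},
    ]


def absolute_schedule(param_groups, lr):
    """Overwrite every group with the same value; per-group scaling is lost."""
    for group in param_groups:
        group["lr"] = lr


def relative_schedule(param_groups, factor):
    """Multiply each group's current lr; ratios between groups are preserved."""
    for group in param_groups:
        group["lr"] *= factor


def lr_ratio(param_groups):
    """Ratio of the first group's lr to the second group's lr."""
    return param_groups[0]["lr"] / param_groups[1]["lr"]


groups_abs = make_groups()
absolute_schedule(groups_abs, 5e-4)

groups_rel = make_groups()
relative_schedule(groups_rel, 0.5)

print("ratio after absolute update:", lr_ratio(groups_abs))
print("ratio after relative update:", lr_ratio(groups_rel))
```

If this is indeed the failure mode, a hand-rolled scheduler would need to scale each group's existing `lr` rather than assign a single value to all groups.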

thegregyang commented 1 year ago

Closing this issue for now, but feel free to re-open when there are new updates.

tivek commented 7 months ago

A belated and short update: the muconv2d branch is working fine for us. If desired, I can whip up a coord_check plot for a toy model with MuOutConvTranspose1d.

edwardjhu commented 7 months ago

Sure!