microsoft / mup

maximal update parametrization (µP)
https://arxiv.org/abs/2203.03466
MIT License

in mlp example: 2 problems #41

Open yjjinjie opened 1 year ago

yjjinjie commented 1 year ago

1) https://github.com/microsoft/mup/blob/main/examples/MLP/main.py#L61 says that if you don't specify a base shape file, you are using standard parametrization. But in the code, doesn't the optimizer still use MuSGD? https://github.com/microsoft/mup/blob/main/examples/MLP/main.py#L257

2) Why does the init function not use mup.init? https://github.com/microsoft/mup/blob/main/examples/MLP/main.py#L139

### Replace your custom init, if any
for param in model.parameters():
    ### If initializing manually with fixed std or bounds,
    ### then replace with same function from mup.init
    # torch.nn.init.uniform_(param, -0.1, 0.1)
    mup.init.uniform_(param, -0.1, 0.1)
edwardjhu commented 1 year ago

Thanks for the questions!

  1. If you don't specify a base shape, it defaults to the shape of the target model itself. Every width multiplier is then 1, so the result is equivalent to SP (standard parametrization) even if you are using a MuOptimizer.
  2. We didn't have the mup library when we first wrote the code for the MLP experiment. You are right that we can use mup.init there. Lines 139 to 141 do manually what mup.init does.
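
To illustrate point 1, here is a minimal sketch in plain Python. It is not the mup library's code: the helpers `width_mult` and `scaled_lr` are hypothetical, the widths 4096 and 256 are arbitrary, and the divide-by-multiplier rule is only a stand-in for whatever width-dependent scaling a MuOptimizer applies. The point is that when the base shape equals the target shape, the multiplier is 1 and any such scaling becomes a no-op.

```python
# Illustrative sketch only: plain Python, NOT the mup library's implementation.
# `width_mult` and `scaled_lr` are hypothetical helpers introduced for this example.


def width_mult(target_dim: int, base_dim: int) -> float:
    """Ratio of a target-model dimension to the matching base-model dimension."""
    return target_dim / base_dim


def scaled_lr(lr: float, mult: float) -> float:
    """Assumed stand-in for a width-dependent learning-rate rule: divide by the multiplier."""
    return lr / mult


target_width = 4096  # arbitrary example width
lr = 0.1

# No base shape file: the base shape falls back to the target model's own shape.
mult_default = width_mult(target_width, base_dim=target_width)

# Base shape taken from a narrower model (256 is an arbitrary example).
mult_with_base = width_mult(target_width, base_dim=256)

# With the default base shape the learning rate is unchanged, as in SP.
print("default base:", mult_default, scaled_lr(lr, mult_default))
# With a narrower base shape the multiplier is > 1 and the scaling takes effect.
print("narrow base: ", mult_with_base, scaled_lr(lr, mult_with_base))
```

So using MuSGD without a base shape file is harmless but gives no µP behavior. To get µP, pass a base shape file saved from a model of a different width.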