In the MLP example: 2 problems

microsoft / mup

maximal update parametrization (µP)

MIT License


1) https://github.com/microsoft/mup/blob/main/examples/MLP/main.py#L61 If you don't specify a base shape file, you are using standard parametrization. In that case, does the code still use MuSGD as the optimizer? https://github.com/microsoft/mup/blob/main/examples/MLP/main.py#L257

2) Why doesn't the init function use mup.init? https://github.com/microsoft/mup/blob/main/examples/MLP/main.py#L139

### Replace your custom init, if any
for param in model.parameters():
    ### If initializing manually with fixed std or bounds,
    ### then replace with same function from mup.init
    # torch.nn.init.uniform_(param, -0.1, 0.1)
    mup.init.uniform_(param, -0.1, 0.1)
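
To make the question concrete, here is a minimal pure-Python sketch of what I understand "replace with the same function from mup.init" to mean for a fixed-bound init. It rests on an assumption that I have not checked against the library source: that a width-aware uniform init rescales the fixed bounds by `width_mult ** -0.5` for weights with two width-dependent dimensions, where `width_mult = fan_in / base_fan_in`. The helper `mup_style_uniform_bounds` and its parameters are hypothetical names for illustration only, not part of the mup API.

```python
def mup_style_uniform_bounds(low, high, fan_in, base_fan_in, n_inf_dims=2):
    """Hypothetical helper: bounds a width-aware uniform init might use.

    Assumption (not verified against mup): only weights with two
    width-dependent dimensions are rescaled, by width_mult ** -0.5.
    """
    if n_inf_dims <= 1:
        # Biases and input/output-like weights keep the fixed bounds.
        return low, high
    width_mult = fan_in / base_fan_in
    scale = width_mult ** -0.5
    return low * scale, high * scale


# Target model equals its own base model: the bounds are unchanged.
same = mup_style_uniform_bounds(-0.1, 0.1, fan_in=256, base_fan_in=256)

# Target model is 4x wider than the base model: the bounds shrink.
wider = mup_style_uniform_bounds(-0.1, 0.1, fan_in=1024, base_fan_in=256)

print(same)
print(wider)
```

Under this assumption, a model that serves as its own base keeps the original bounds, while a model 4x wider than its base gets bounds half as large. An init whose scale already depends on fan-in would not need this extra correction, which may be why the example does not call mup.init, but I would like confirmation.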

in mlp example: 2 problems #41