Hello! First of all, thank you for doing such great work and making it so accessible. I'm looking at using mup for a project but I'm a bit confused about how to set the base shapes for the smaller model used for hyperparameter tuning.
Let's say I want to train an MLP with hidden dimension 1024, and I want to muTransfer the best learning rate from an MLP with hidden dimension 128. My top-level code might look like this:
best_loss = float('inf')
best_lr = 0.
# Hyperparameter sweep with hidden dimension 128
for lr in learning_rates:
small_mlp = MLP(hidden_dim=128)
# use `base=None` in `set_base_shapes`
small_mlp = mup.set_base_shapes(small_mlp, base=None)
final_loss = full_training_loop(small_mlp, lr=lr)
if final_loss < best_loss:
best_loss = final_loss
best_lr = lr
# Transfer optimal LR to large model
base_mlp = MLP(hidden_dim=128)
big_mlp = MLP(hidden_dim=1024)
big_mlp = mup.set_base_shapes(big_mlp, base=base_mlp)
ultimate_loss = full_training_loop(big_mlp, lr=best_lr)
or like this:
best_loss = float('inf')
best_lr = 0.
for lr in learning_rates:
small_mlp = MLP(hidden_dim=128)
# use a base model in `set_base_shapes`
smaller_mlp = MLP(hidden_dim=32)
small_mlp = mup.set_base_shapes(small_mlp, base=smaller_mlp)
final_loss = full_training_loop(small_mlp, lr=lr)
if final_loss < best_loss:
best_loss = final_loss
best_lr = lr
# Transfer optimal LR to large model
base_mlp = MLP(hidden_dim=128)
big_mlp = MLP(hidden_dim=1024)
big_mlp = mup.set_base_shapes(big_mlp, base=base_mlp)
ultimate_loss = full_training_loop(big_mlp, lr=best_lr)
Could you please clarify which of these would be correct? Thank you very much for your time!
Hello! First of all, thank you for doing such great work and making it so accessible. I'm looking at using
mup
for a project but I'm a bit confused about how to set the base shapes for the smaller model used for hyperparameter tuning.Let's say I want to train an MLP with hidden dimension 1024, and I want to muTransfer the best learning rate from an MLP with hidden dimension 128. My top-level code might look like this:
or like this:
Could you please clarify which of these would be correct? Thank you very much for your time!