microsoft mup issues - Githubissues

microsoft / mup

maximal update parametrization (µP)

https://arxiv.org/abs/2203.03466

MIT License

1.24k stars, 88 forks

issues


missing os import in mup/examples/MLP/main.py ?

#19 james-simon closed 1 year ago (1 comment)

mu parametrization for channel attention

#18 xwjabc closed 2 years ago (5 comments)

mu parametrization for multi-head attention / grouped convolution

#17 xwjabc closed 2 years ago (3 comments)

Optimizers for coord check

#16 xwjabc closed 2 years ago (2 comments)

Torchdistx

#15 edwardjhu closed 1 year ago (2 comments)

Coord-check for conv1d

#14 bob80333 closed 2 years ago (17 comments)

ResNet readout_zero_init=True?

#13 D-X-Y closed 2 years ago (2 comments)

Hyperparameter search on base models

#12 davisyoshida closed 2 years ago (2 comments)

integration with Flax?

#11 nestordemeure opened 2 years ago (4 comments)

Examples with ConvNets

#10 Aboussejra closed 2 years ago (2 comments)

Does MuReadout apply to all outputs on which loss is computed?

#9 jaivardhankapoor closed 2 years ago (2 comments)

How to use 'attn_mult' config

#8 JiayiFeng closed 2 years ago (2 comments)

MuAdam not adjusting lr for output weights

#7 zhuzilin closed 2 years ago (4 comments)

Is this compatible with DeepSpeed / ZeRO?

#6 StellaAthena closed 1 year ago (6 comments)

Multiple nn.Linear layers

#4 windspirit95 closed 2 years ago (4 comments)

Does mup work with model with Conv2D as output?

#3 BurguerJohn closed 1 year ago (8 comments)

PyTorch Lightning example

#2 tchaton opened 2 years ago (1 comment)

Consider decoupled weight decay optimizers?

#1 abhi-mosaic closed 2 years ago (4 comments)
