microsoft/mup
maximal update parametrization (µP)
https://arxiv.org/abs/2203.03466
MIT License
1.24k stars, 88 forks
issues
#19 missing os import in mup/examples/MLP/main.py ? (james-simon, closed, 1 year ago, 1 comment)
#18 mu parametrization for channel attention (xwjabc, closed, 2 years ago, 5 comments)
#17 mu parametrization for multi-head attention / grouped convolution (xwjabc, closed, 2 years ago, 3 comments)
#16 Optimizers for coord check (xwjabc, closed, 2 years ago, 2 comments)
#15 Torchdistx (edwardjhu, closed, 1 year ago, 2 comments)
#14 Coord-check for conv1d (bob80333, closed, 2 years ago, 17 comments)
#13 ResNet readout_zero_init=True? (D-X-Y, closed, 2 years ago, 2 comments)
#12 Hyperparameter search on base models (davisyoshida, closed, 2 years ago, 2 comments)
#11 integration with Flax? (nestordemeure, opened, 2 years ago, 4 comments)
#10 Examples with ConvNets (Aboussejra, closed, 2 years ago, 2 comments)
#9 Does MuReadout apply to all outputs on which loss is computed? (jaivardhankapoor, closed, 2 years ago, 2 comments)
#8 How to use 'attn_mult' config (JiayiFeng, closed, 2 years ago, 2 comments)
#7 MuAdam not adjusting lr for output weights (zhuzilin, closed, 2 years ago, 4 comments)
#6 Is this compatible with DeepSpeed / ZeRO? (StellaAthena, closed, 1 year ago, 6 comments)
#4 Multiple nn.Linear layers (windspirit95, closed, 2 years ago, 4 comments)
#3 Does mup work with model with Conv2D as output? (BurguerJohn, closed, 1 year ago, 8 comments)
#2 PyTorch Lightning example (tchaton, opened, 2 years ago, 1 comment)
#1 Consider decoupled weight decay optimizers? (abhi-mosaic, closed, 2 years ago, 4 comments)