sail-sg/Adan
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
Apache License 2.0
744 stars
63 forks
issues
How to set the learning rate for Adan
#48
theFoxofSky
closed
2 months ago
3
Install Error
#47
xv994
closed
5 months ago
4
RuntimeError: The detected CUDA version (12.2) mismatches the version that was used to compile PyTorch (11.8).
#46
trungpx
closed
5 months ago
2
Fix CUDAExtension bug in setup.py
#45
AlexwellChen
closed
6 months ago
0
About the pre-trained model
#44
casiatao
closed
2 months ago
1
Settings for instruction-tuning
#43
KaiLv69
opened
8 months ago
2
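In my CNN model, with lr=0.01 the mAP improves quickly during epochs 20-30 but later becomes NaN. With lr=0.001 it does not become NaN right away, but the results are poor. What problem does this behavior indicate? Thanks!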
#42
liiicon
closed
2 months ago
4
How to implement Adan optimizer in Yolov7?
#41
karan16mehta
closed
9 months ago
1
Concrete weight decay configuration for GPT-2 pretraining
#40
DesperateExplorer
closed
2 months ago
1
Adan stays ahead of SGD for the first 74 epochs, but convergence then slows down. How should I adjust the lr and other hyperparameters?
#39
liiicon
closed
9 months ago
2
Handle empty parameter list
#38
janEbert
closed
1 year ago
0
Restarting strategy
#37
janEbert
closed
1 year ago
4
Deepspeed Integration
#36
pUmpKin-Co
closed
1 year ago
4
Gradient clipping option in DeepSpeed
#35
DesperateExplorer
closed
1 year ago
1
module 'fused_adan' has no attribute 'adan_multi_tensor'
#34
76586
closed
1 year ago
1
processing data for BERT experiment
#33
kenoharada
closed
2 months ago
4
GPU type and GPU nums and total training time on Transformer-XL, GPT-2
#32
kenoharada
closed
1 year ago
2
Update unfused install
#31
AlexwellChen
closed
1 year ago
0
Allow unconditional CUDA build
#30
janEbert
closed
1 year ago
0
Some questions about learning rate.
#29
stella-von
closed
1 year ago
7
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
#28
MichaelMonashev
closed
1 year ago
12
[Feature] Fused kernel for Adan optimizer
#27
AlexwellChen
closed
1 year ago
0
HumanEval shall not be used for training.
#23
aviczhl2
closed
1 year ago
1
Add `setup.py`
#22
janEbert
closed
1 year ago
3
Suggestions for applying to visual dense prediction tasks.
#21
pUmpKin-Co
closed
1 year ago
6
Training a YOLOv5 model produces NaN
#20
xialuxi
closed
1 year ago
6
ValueError: not enough values to unpack (expected 3, got 2)
#19
myseverus
closed
1 year ago
2
Can it be applied to training GANs?
#18
SHNsunhenan
opened
1 year ago
2
Some questions in step function
#17
RookieXwc
closed
1 year ago
3
About the convergence trend comparison with Adamw in ViT-H
#16
haihai-00
opened
1 year ago
3
Typo in the paper
#15
Tomarchelone
closed
1 year ago
1
Why is there no SGD-style implementation?
#14
brisker
closed
1 year ago
8
Embedding tensors/weight update unsupported
#13
DenisVorotyntsev
closed
1 year ago
5
`no_prox` Flag
#12
Zach-ER
closed
1 year ago
5
Add closure function in step for compatibility
#11
CookieLau
closed
1 year ago
1
add multi_tensor
#10
bonlime
closed
1 year ago
3
The BERT finetuning get_data file error?
#9
NoahDDavis
closed
1 year ago
4
\epsilon not implemented as in the paper
#8
Zach-ER
closed
1 year ago
1
block: [0,0,0], thread: [96,0,0] Assertion `input_val >= zero && input_val <= one` failed.
#7
lucasjinreal
closed
2 years ago
3
Is there a TensorFlow/Keras implementation?
#6
cmsflash
closed
1 year ago
7
remove redundant update calculation, and unused import
#5
lessw2020
closed
2 years ago
2
Beta values are not same
#4
JaheimLee
closed
2 years ago
1
Step 2 of Usage
#3
richinex
closed
2 years ago
1
fix minor typo
#2
xk-huang
closed
2 years ago
0
`torch._foreach...` implementation
#1
bonlime
closed
1 year ago
2