sail-sg / Adan

Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
Apache License 2.0
746 stars 63 forks

Handle empty parameter list #38

Closed janEbert closed 1 year ago

janEbert commented 1 year ago

If params_with_grad remains empty, the fused CUDA kernel crashes without an error message because it tries to index into an empty list. This PR first fixes the CUDA kernel so that it throws a more meaningful error. In addition, on the Python side, it skips dispatching to the update sub-functions entirely whenever the params_with_grad list is empty. This is also necessary because the torch._foreach_* functions don't handle empty lists either.
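To illustrate the two-part fix described above, here is a minimal pure-Python sketch, not the actual Adan code. The names `Param`, `fused_update_stub`, and `step` are hypothetical stand-ins: the stub plays the role of the fused kernel (it now raises a clear error on empty input), and `step` shows the Python-side guard that skips dispatch when no parameter has a gradient. The plain gradient-descent update is only a placeholder for the real optimizer math.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Param:
    # Hypothetical stand-in for a tensor parameter and its gradient.
    value: float
    grad: Optional[float] = None


def fused_update_stub(params: List[Param], lr: float) -> None:
    # Stand-in for the fused kernel: fail loudly on empty input
    # instead of indexing into an empty list.
    if not params:
        raise ValueError("expected at least one parameter with a gradient")
    for p in params:
        p.value -= lr * p.grad  # placeholder update, not Adan's rule


def step(
    param_groups: List[List[Param]],
    lr: float = 0.1,
    update_fn: Callable[[List[Param], float], None] = fused_update_stub,
) -> int:
    """Update every group; return how many groups were dispatched."""
    dispatched = 0
    for group in param_groups:
        params_with_grad = [p for p in group if p.grad is not None]
        if not params_with_grad:
            # Nothing to update: skip dispatching altogether.
            continue
        update_fn(params_with_grad, lr)
        dispatched += 1
    return dispatched
```

With this guard, a group whose parameters all lack gradients (for example, frozen layers) is silently skipped, while a direct call to the update function with an empty list still produces an explicit error.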