zyushun / Adam-mini

Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793
257 stars, 9 forks

Replaced Hardcoded CUDA Calls #1

Closed: hunterqueb closed this 1 month ago

hunterqueb commented 1 month ago

I ran into issues running Adam-mini on a non-CUDA device. I noticed hardcoded calls that sent tensors to CUDA and replaced them with to(device) calls.
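
For readers unfamiliar with the pattern, here is a minimal sketch of the kind of change described, not the actual diff from this PR. The helper names `pick_device` and `mean_sq_grad` are hypothetical and do not come from the Adam-mini code. The only assumption is that a hardcoded `.cuda()` call is replaced by `.to(device)`, with the device taken from an existing tensor.

```python
import torch


def pick_device() -> torch.device:
    # Hypothetical helper: fall back to CPU when CUDA is not available.
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


def mean_sq_grad(grad: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: a one-element statistic stored on the same
    # device as the gradient it summarizes.
    # Hardcoded version (fails on non-CUDA machines):
    #     out = torch.zeros(1, dtype=torch.float32).cuda()
    # Device-agnostic version:
    out = torch.zeros(1, dtype=torch.float32).to(grad.device)
    out += (grad.float() ** 2).mean()
    return out


if __name__ == "__main__":
    device = pick_device()
    grad = torch.ones(4, device=device) * 2
    stat = mean_sq_grad(grad)
    print(stat.device, stat.item())
```

Reading the device from the tensor itself (`grad.device`) rather than from a global setting keeps the helper correct even when different parameters live on different devices.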

zyushun commented 1 month ago

Thanks for the refinement!