konstmish / prodigy

The Prodigy optimizer and its variants for training neural networks.
MIT License

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! #1


manyotherfunctions commented 1 year ago

I got this error when resuming training from a checkpoint.

```
scaler.step(optim_d)
  File "/home/methos/miniconda3/envs/rvc/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 315, in step
    return optimizer.step(*args, **kwargs)
  File "/home/methos/miniconda3/envs/rvc/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 69, in wrapper
    return wrapped(*args, **kwargs)
  File "/home/methos/miniconda3/envs/rvc/lib/python3.10/site-packages/torch/optim/optimizer.py", line 280, in wrapper
    out = func(*args, **kwargs)
  File "/home/methos/miniconda3/envs/rvc/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/methos/miniconda3/envs/rvc/lib/python3.10/site-packages/pytorch_optimizer/optimizer/prodigy.py", line 136, in step
    d_numerator.add_(torch.dot(grad.flatten(), (p0 - p).flatten()), alpha=(d / d0) * d_lr)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
```

konstmish commented 1 year ago

Thanks for reporting this problem. It seems I need to modify the optimizer so it loads its state onto the correct device; I will investigate this.
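
Until the optimizer handles this itself, a common PyTorch workaround for this class of error is to move every tensor in the loaded optimizer state onto the device of its parameter after `load_state_dict`. The sketch below is not part of this repo, and the helper name is hypothetical; it assumes the mismatch comes from per-parameter state tensors (such as Prodigy's `p0`) being deserialized onto the CPU while the model lives on `cuda:0`.

```python
import torch

def move_optimizer_state_to_param_devices(optimizer: torch.optim.Optimizer) -> None:
    """Move each state tensor onto the device of the parameter it belongs to.

    Hypothetical workaround helper, not part of the prodigy package.
    """
    for group in optimizer.param_groups:
        for p in group["params"]:
            state = optimizer.state.get(p, {})
            for key, value in state.items():
                # Only tensors need relocating; scalars (step counts, d, d0) are fine.
                if torch.is_tensor(value):
                    state[key] = value.to(p.device)

# Usage after resuming from a checkpoint:
# optimizer.load_state_dict(checkpoint["optimizer"])
# move_optimizer_state_to_param_devices(optimizer)
```

Loading the checkpoint with `torch.load(path, map_location=device)` can avoid the mismatch in some setups, but the post-hoc move above works regardless of how the checkpoint was saved.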