tautomer closed this 1 year ago
Wait one second, I just realized that if the device is CPU, there is no need to reload the parameters. It's a similar story to the model transfer in set_devices. Let me add another commit.
Per our discussion, it makes more sense to make this PR part of #14.
If the optimizer is initialized first and the model is then transferred to GPU, Adagrad will crash because some of its state tensors are still on the CPU while the parameters are on the GPU. This might affect some other optimizers as well.
See the official documentation for explanations.
To fix this bug, simply reload the state dictionary again after model transfer.
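For illustration, a minimal sketch of the fix in PyTorch. It assumes the model starts on the CPU. The helper name `transfer_model` is hypothetical and is not the actual `set_devices` code from this repository. It relies on `Optimizer.load_state_dict` casting most state tensors to the device of their parameters; depending on the PyTorch version, the `step` counter may stay on the CPU.

```python
import torch


def transfer_model(model, optimizer, device):
    """Hypothetical helper: move the model, then refresh the optimizer state."""
    device = torch.device(device)
    if device.type == "cpu":
        # Assumes the model was created on the CPU: nothing to move or reload.
        return model
    # Module.to() moves the parameters in place, so the optimizer still
    # references the same Parameter objects.
    model.to(device)
    # Reloading the optimizer's own state dict casts state tensors such as
    # Adagrad's "sum" buffer to the device of their parameters.
    optimizer.load_state_dict(optimizer.state_dict())
    return model


model = torch.nn.Linear(2, 1)
# Adagrad allocates its state at construction time, i.e. on the CPU here.
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.1)
target = "cuda" if torch.cuda.is_available() else "cpu"
transfer_model(model, optimizer, target)
```

After the call, the `sum` buffers in `optimizer.state` should be on the same device as the parameters they belong to, so a subsequent `optimizer.step()` no longer mixes CPU and GPU tensors.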