Closed: txy159 closed this issue 10 months ago
Please read the documentation on how to train on GPUs; we have added a section specifically for this. There is a proper PyTorch way to do it.
Thanks, I had already gone through the documentation before reaching out about this problem.
I found that changing the "generator" parameter in the DataLoader call solved the issue:
data_loader = data.DataLoader(
    ...,
    generator=torch.Generator(device="cuda"),
)
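For context, a minimal self-contained sketch of how the fix fits together (the toy dataset and the set_default_device call are assumptions, not shown in the original thread): once the default device is CUDA, the DataLoader's shuffling sampler must also draw its random index permutation on CUDA, hence the explicit generator.

import torch
from torch.utils import data

# Assumed setup that triggers the device mismatch in the first place.
torch.set_default_device("cuda")

# Hypothetical toy dataset standing in for the M3GNet graph dataset.
dataset = data.TensorDataset(torch.randn(100, 3), torch.randn(100, 1))

# Without a CUDA generator, the sampler tries to build its permutation
# with a CPU generator and fails against CUDA-default tensors.
data_loader = data.DataLoader(
    dataset,
    batch_size=16,
    shuffle=True,
    generator=torch.Generator(device="cuda"),
)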
Email (Optional)
No response
Version
v0.8.5 and v0.7.1
Which OS(es) are you using?
What happened?
Dear developers,
I'm trying to train an M3GNet potential using the same code as in the tutorial (https://matgl.ai/tutorials%2FTraining%20a%20M3GNet%20Potential%20with%20PyTorch%20Lightning.html).
Training the potential on a CPU went smoothly without any issues. However, when I switched to a GPU node for training, I ran into several errors.
I made the following adjustments to the code to enable training on a GPU node.
trainer = pl.Trainer(
    max_epochs=1,
    accelerator="gpu",
    devices=[0],
    logger=logger,
    inference_mode=False,
)
trainer.fit(
    model=lit_module_finetune,
    train_dataloaders=train_loader,
    val_dataloaders=val_loader,
)
Then the following error occurs:
I also tried setting the default device to one specific GPU, but I encountered another error:
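The exact call is not shown in the issue, but a typical way to pin the default device with the standard PyTorch API would be:

import torch

# Assumed: route all newly created tensors to the first GPU (PyTorch >= 2.0).
torch.set_default_device("cuda:0")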
Do you have any suggestions on fixing these errors? Thanks in advance.
Code snippet
No response
Log output
No response
Code of Conduct