materialsvirtuallab / matgl

Graph deep learning library for materials
BSD 3-Clause "New" or "Revised" License

[Bug]: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! #228

Closed: alinelena closed this issue 4 months ago

alinelena commented 4 months ago

Email (Optional)

alin@elena.re

Version

git

Which OS(es) are you using?

What happened?

Running a simple example with matgl on GPU fails, while the same script is perfectly fine when I run it on CPU only.

I also tried the commented-out variations in the snippet with no success; they just produce different errors.

Code snippet

from ase import build
from matgl import __version__
from matgl import load_model
from matgl.ext.ase import M3GNetCalculator
import torch 

print(__version__)

#torch.set_default_device('cuda')
model = load_model("M3GNet-MP-2021.2.8-DIRECT-PES")
calculator = M3GNetCalculator(potential=model)
#calculator = M3GNetCalculator(potential=model.to("cuda"))

benzene = build.molecule('C6H6')
benzene.calc = calculator  

print(f"E_config= {benzene.get_potential_energy()} eV")

Log output

finishes with the error:

 python3 ./simple_m3gnet.py 
1.0.0
/work4/scd/scarf562/micromamba/micromamba/envs/mace/lib/python3.11/site-packages/matgl/apps/pes.py:69: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  self.element_refs = AtomRef(property_offset=torch.tensor(element_refs, dtype=matgl.float_th))
/work4/scd/scarf562/micromamba/micromamba/envs/mace/lib/python3.11/site-packages/matgl/apps/pes.py:75: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  self.register_buffer("data_mean", torch.tensor(data_mean, dtype=matgl.float_th))
/work4/scd/scarf562/micromamba/micromamba/envs/mace/lib/python3.11/site-packages/matgl/apps/pes.py:76: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  self.register_buffer("data_std", torch.tensor(data_std, dtype=matgl.float_th))
/work4/scd/scarf562/micromamba/micromamba/envs/mace/lib/python3.11/site-packages/matgl/layers/_basis.py:121: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  root = torch.tensor(roots[i])
Traceback (most recent call last):
  File "/work4/scd/scarf562/ml/lavello/tests/./simple_m3gnet.py", line 24, in <module>
    print(f"E_config= {benzene.get_potential_energy()} eV")
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/work4/scd/scarf562/micromamba/micromamba/envs/mace/lib/python3.11/site-packages/ase/atoms.py", line 755, in get_potential_energy
    energy = self._calc.get_potential_energy(self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/work4/scd/scarf562/micromamba/micromamba/envs/mace/lib/python3.11/site-packages/ase/calculators/abc.py", line 24, in get_potential_energy
    return self.get_property(name, atoms)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/work4/scd/scarf562/micromamba/micromamba/envs/mace/lib/python3.11/site-packages/ase/calculators/calculator.py", line 537, in get_property
    self.calculate(atoms, [name], system_changes)
  File "/work4/scd/scarf562/micromamba/micromamba/envs/mace/lib/python3.11/site-packages/matgl/ext/ase.py", line 177, in calculate
    energies, forces, stresses, hessians = self.potential(graph, lattice, state_attr_default)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/work4/scd/scarf562/micromamba/micromamba/envs/mace/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/work4/scd/scarf562/micromamba/micromamba/envs/mace/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/work4/scd/scarf562/micromamba/micromamba/envs/mace/lib/python3.11/site-packages/matgl/apps/pes.py", line 120, in forward
    property_offset = torch.squeeze(self.element_refs(g))
                                    ^^^^^^^^^^^^^^^^^^^^
  File "/work4/scd/scarf562/micromamba/micromamba/envs/mace/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/work4/scd/scarf562/micromamba/micromamba/envs/mace/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/work4/scd/scarf562/micromamba/micromamba/envs/mace/lib/python3.11/site-packages/matgl/layers/_atom_ref.py", line 78, in forward
    offset = property_offset_batched * one_hot
             ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
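The RuntimeError comes from PyTorch's rule that binary operations require both operands on the same device. In the traceback above, the AtomRef buffer follows the model to cuda:0 while the one-hot tensor is built on CPU. A minimal illustration of the same rule, using plain torch with hypothetical tensor names (no matgl involved):

```python
import torch

offset = torch.ones(4)       # CPU tensor, standing in for the AtomRef buffer
one_hot = torch.eye(4)[0]    # also CPU, so the elementwise product is fine
print((offset * one_hot).sum())  # tensor(1.)

# Moving only one operand to the GPU reproduces the failure mode: the same
# multiply then raises "Expected all tensors to be on the same device".
if torch.cuda.is_available():
    try:
        offset.cuda() * one_hot
    except RuntimeError as e:
        print(e)
```

This is why moving the model to cuda is not enough on its own: every tensor it is multiplied against must end up on the same device.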


kenko911 commented 4 months ago

Hi @alinelena, thank you very much for reporting this issue. I tested your script with the torch.set_default_device('cuda') line enabled:

from ase import build
from matgl import __version__
from matgl import load_model
from matgl.ext.ase import PESCalculator
import torch

print(__version__)

torch.set_default_device('cuda')
model = load_model("M3GNet-MP-2021.2.8-DIRECT-PES")
calculator = PESCalculator(potential=model)
#calculator = M3GNetCalculator(potential=model.to("cuda"))

benzene = build.molecule('C6H6')
benzene.calc = calculator

print(f"E_config= {benzene.get_potential_energy()} eV")

I am not able to reproduce your error and the following is my output.

1.0.0
/home/t1ko/miniconda3/envs/mavrl/lib/python3.9/site-packages/torch/utils/_device.py:77: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  return func(*args, **kwargs)
/home/t1ko/miniconda3/envs/mavrl/lib/python3.9/site-packages/torch/utils/_device.py:77: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return func(*args, **kwargs)
E_config= -76.01427459716797 eV

Btw, it should be noted that the M3GNet-DIRECT potential is trained on the Materials Project database, so I believe its performance on molecular systems will be less accurate.

alinelena commented 4 months ago

Odd. Thank you for testing; I will try another machine to see whether I can reproduce it. This is just for testing, so I am not after accurate results. I will keep you posted.

alinelena commented 4 months ago

Just one thing: this is CUDA 12 and PyTorch 2.2.1-nightly, so maybe it comes from there.

shyuep commented 4 months ago

@alinelena I would recommend not using pytorch nightly. We stipulate all the supported requirements in our repo. If you use those and there are issues, you can reopen this issue.

vsumaria commented 2 months ago

Setting torch.set_default_device('cuda') solved the issue for me, just FYI.
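For anyone landing here later, a minimal device-agnostic sketch of that pattern (plain torch, no matgl): choosing the device once and making it the default means the model buffers and the tensors built later all land on the same device, which is what the RuntimeError above is about.

```python
import torch

# Pick the device once, then make it the default so everything
# created afterwards (model weights, graph tensors) agrees on it.
device = "cuda" if torch.cuda.is_available() else "cpu"
torch.set_default_device(device)

x = torch.zeros(2)
print(x.device.type)  # matches `device`
```

torch.set_default_device requires PyTorch 2.0 or later; on older versions you would instead move the model and inputs explicitly with .to(device).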