materialsproject / pymatgen

Python Materials Genomics (pymatgen) is a robust materials analysis code that defines classes for structures and molecules with support for many electronic structure codes. It powers the Materials Project.
https://pymatgen.org

Default settings of `Structure.relax()` fail to synchronize tensor devices (CPU/GPU) in GPU-enabled environments #3715

Open jsukpark opened 3 months ago

jsukpark commented 3 months ago

Python version

Python 3.9.18

Pymatgen version

2023.12.18

Operating system version

Ubuntu 22.04.4 LTS

Current behavior

Calling the relax() method of a pymatgen.core.Structure object with default settings in a GPU-enabled environment raises a RuntimeError: the tensors involved in the computation are not all on the same device (some are on cuda:0, others on cpu).

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/pymatgen/core/structure.py", line 4323, in relax
    return self._relax(
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/pymatgen/core/structure.py", line 776, in _relax
    dyn = opt_class(ecf, **opt_kwargs)
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/ase/optimize/fire.py", line 54, in __init__
    Optimizer.__init__(self, atoms, restart, logfile, trajectory,
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/ase/optimize/optimize.py", line 234, in __init__
    self.set_force_consistent()
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/ase/optimize/optimize.py", line 325, in set_force_consistent
    self.atoms.get_potential_energy(force_consistent=True)
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/ase/constraints.py", line 2420, in get_potential_energy
    atoms_energy = self.atoms.get_potential_energy(
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/ase/atoms.py", line 728, in get_potential_energy
    energy = self._calc.get_potential_energy(
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/ase/calculators/calculator.py", line 709, in get_potential_energy
    energy = self.get_property('energy', atoms)
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/ase/calculators/calculator.py", line 737, in get_property
    self.calculate(atoms, [name], system_changes)
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/matgl/ext/ase.py", line 177, in calculate
    energies, forces, stresses, hessians = self.potential(graph, lattice, state_attr_default)
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/matgl/apps/pes.py", line 120, in forward
    property_offset = torch.squeeze(self.element_refs(g))
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/miniconda3/envs/myenv/lib/python3.9/site-packages/matgl/layers/_atom_ref.py", line 78, in forward
    offset = property_offset_batched * one_hot
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
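
One way to narrow down which tensors ended up on which device is to group a model's parameters and buffers by device. The following is a minimal sketch, not part of pymatgen or matgl: `tensors_by_device` is a hypothetical helper that relies only on the `named_parameters()` and `named_buffers()` methods of `torch.nn.Module`. Tensors stored as plain attributes (rather than registered parameters or buffers) would not be listed.

```python
from collections import defaultdict


def tensors_by_device(module):
    """Group the parameter and buffer names of a torch.nn.Module by device.

    Hypothetical diagnostic helper: `module` is expected to provide
    named_parameters() and named_buffers(), as torch.nn.Module does.
    """
    groups = defaultdict(list)
    for name, param in module.named_parameters():
        groups[str(param.device)].append(name)
    for name, buf in module.named_buffers():
        groups[str(buf.device)].append(name)
    return dict(groups)
```

If the result for the potential called in the traceback contains both a `cpu` and a `cuda:0` key, the module itself is split across devices. If it contains only one key, the mismatch more likely comes from the inputs passed to it.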

Expected Behavior

The structural relaxation should run without error, with intermediate tensors moved to or from the GPU as needed so that every operation runs on a single device.

Minimal example

import numpy as np
from pymatgen.core import Structure

struct = Structure(  # diamond
    np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]]) * 1.786855,
    ['C'] * 2,
    np.array([[.25, .25, .25], [0.0, 0.0, 0.0]]),
)
struct.relax()  # uses default calculator 'm3gnet'
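
As a possible stopgap (an assumption, not verified on this setup), hiding the GPU from PyTorch should keep every tensor on the CPU. The sketch below sets the `CUDA_VISIBLE_DEVICES` environment variable, which CUDA reads when it initializes, before anything imports torch. `force_cpu_only` and `relax_on_cpu` are hypothetical helper names, not pymatgen API.

```python
import os
import sys


def force_cpu_only():
    """Hide all CUDA devices from this process.

    Returns False without changing anything if torch is already imported,
    since CUDA may have been initialized by then and the setting could be ignored.
    """
    if "torch" in sys.modules:
        return False
    os.environ["CUDA_VISIBLE_DEVICES"] = ""
    return True


def relax_on_cpu(struct, **relax_kwargs):
    """Call struct.relax() after trying to hide the GPU (hypothetical wrapper)."""
    if not force_cpu_only():
        print("torch already imported; CUDA_VISIBLE_DEVICES left unchanged")
    return struct.relax(**relax_kwargs)
```

Calling `relax_on_cpu(struct)` in place of `struct.relax()` trades GPU speed for a run that stays on one device. It does not fix the underlying device mismatch.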


Relevant files to reproduce this bug

_No response_

jsukpark commented 3 months ago

Addendum: the installed matgl version is 1.0.0.