isayevlab / AIMNet2

MIT License

RuntimeError - sparse_nb is empty #26

Closed andreas-albers closed 4 months ago

andreas-albers commented 4 months ago

Hey, thanks for the ongoing development of the AIMNet potentials!

While running a minimum example with the mol_single.xyz structure, I encountered a RuntimeError:

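from aimnet2calc import AIMNet2ASE
import ase.io

# Read the single-molecule test structure shipped with the repository
atoms = ase.io.read(r'AIMNet2\test\mol_single.xyz')
# Attach the AIMNet2 ASE calculator to the structure
calc = AIMNet2ASE('aimnet2')
atoms.calc = calc
# Triggers the calculation; this is the call that raises the RuntimeError
atoms.get_potential_energy()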
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
File c:\Users\Andreas\Documents\AIMNet2\test\my_test.py:9
      7 calc = AIMNet2ASE('aimnet2')
      8 atoms.calc = calc
----> 9 atoms.get_potential_energy()

File c:\Users\Andreas\Anaconda3\envs\aimnet\lib\site-packages\ase\atoms.py:755, in Atoms.get_potential_energy(self, force_consistent, apply_constraint)
    752     energy = self._calc.get_potential_energy(
    753         self, force_consistent=force_consistent)
    754 else:
--> 755     energy = self._calc.get_potential_energy(self)
    756 if apply_constraint:
    757     for constraint in self.constraints:

File c:\Users\Andreas\Anaconda3\envs\aimnet\lib\site-packages\ase\calculators\abc.py:24, in GetPropertiesMixin.get_potential_energy(self, atoms, force_consistent)
     22 else:
     23     name = 'energy'
---> 24 return self.get_property(name, atoms)

File c:\Users\Andreas\Anaconda3\envs\aimnet\lib\site-packages\ase\calculators\calculator.py:538, in BaseCalculator.get_property(self, name, atoms, allow_calculation)
    535     if self.use_cache:
    536         self.atoms = atoms.copy()
--> 538     self.calculate(atoms, [name], system_changes)
    540 if name not in self.results:
    541     # For some reason the calculator was not able to do what we want,
    542     # and that is OK.
    543     raise PropertyNotImplementedError(
    544         '{} not present in this ' 'calculation'.format(name)
    545     )

File c:\Users\Andreas\Anaconda3\envs\aimnet\lib\site-packages\aimnet2calc\aimnet2ase.py:65, in AIMNet2ASE.calculate(self, atoms, properties, system_changes)
     62 else:
     63     cell = None
---> 65 results = self.base_calc({
     66     'coord': torch.tensor(self.atoms.positions, dtype=torch.float32, device=self.base_calc.device),
     67     'numbers': self._t_numbers,
     68     'cell': cell,
     69     'mol_idx': self._t_mol_idx,
     70     'charge': self._t_charge,
     71     'mult': self._t_mult,
     72 }, forces='forces' in properties, stress='stress' in properties)
     73 for k, v in results.items():
     74     results[k] = v.detach().cpu().numpy()

File c:\Users\Andreas\Anaconda3\envs\aimnet\lib\site-packages\aimnet2calc\calculator.py:59, in AIMNet2Calculator.__call__(self, *args, **kwargs)
     58 def __call__(self, *args, **kwargs):
---> 59     return self.eval(*args, **kwargs)

File c:\Users\Andreas\Anaconda3\envs\aimnet\lib\site-packages\aimnet2calc\calculator.py:79, in AIMNet2Calculator.eval(self, data, forces, stress, hessian)
     78 def eval(self, data: Dict[str, Any], forces=False, stress=False, hessian=False) -> Dict[str, Tensor]:
---> 79     data = self.prepare_input(data)
     80     if hessian and data['mol_idx'][-1] > 0:
     81         raise NotImplementedError('Hessian calculation is not supported for multiple molecules')

File c:\Users\Andreas\Anaconda3\envs\aimnet\lib\site-packages\aimnet2calc\calculator.py:98, in AIMNet2Calculator.prepare_input(self, data)
     96         print('Switching to DSF Coulomb for PBC')
     97         self.set_lrcoulomb_method('dsf')
---> 98 data = self.make_nbmat(data)
     99 data = self.pad_input(data)
    100 return data

File c:\Users\Andreas\Anaconda3\envs\aimnet\lib\site-packages\aimnet2calc\calculator.py:161, in AIMNet2Calculator.make_nbmat(self, data)
    159 else:
    160     if 'nbmat' not in data:
--> 161         data['nbmat'] = nblist_torch_cluster(data['coord'], self.cutoff, data['mol_idx'], max_nb=128)
    162         if self.lr:
    163             if 'nbmat_lr' not in data:

File c:\Users\Andreas\Anaconda3\envs\aimnet\lib\site-packages\aimnet2calc\nblist.py:37, in nblist_torch_cluster(coord, cutoff, mol_idx, max_nb)
     35 sparse_nb = radius_graph(coord, batch=mol_idx, r=cutoff, max_num_neighbors=max_nb).to(torch.int32)
     36 print(sparse_nb)
---> 37 max_num_neighbors = torch.unique(sparse_nb[0], return_counts=True)[1].max().item()
     38 if max_num_neighbors < max_nb:
     39     break

RuntimeError: max(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.

The tensor returned by the radius_graph() function in nblist.py, line 35, is empty, which causes the error.

Running the same code entirely on the CPU works fine. Any ideas what is causing this behavior? Or could this be a CUDA/GPU-related problem with my setup?

Best, Andreas

PS: I'm using the following packages:

pytorch                   2.3.1           py3.10_cuda12.1_cudnn8_0    pytorch
pytorch-cuda              12.1                 hde6ce7c_5    pytorch
torch-cluster             1.6.3+pt23cpu            pypi_0    pypi

zubatyuk commented 4 months ago

This error occurs when no neighbors are found within the cutoff. Fixed in 9d006e1. Since this error did not occur with the coordinate file you referred to, and it does not depend on the PyTorch device used, I will ask you, @andreas-albers, to confirm the fix.
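The failure mode described here can be illustrated without PyTorch. The following is a minimal pure-Python sketch, not the AIMNet2 implementation and not the change made in 9d006e1: radius_pairs and max_neighbor_count are hypothetical helper names, and a brute-force loop stands in for radius_graph(). It shows that an empty pair list makes an unguarded max() fail, and that a default value avoids this.

```python
import math
from collections import Counter


def radius_pairs(coords, cutoff):
    """Brute-force directed pairs (i, j), i != j, closer than cutoff."""
    pairs = []
    for i, a in enumerate(coords):
        for j, b in enumerate(coords):
            if i != j and math.dist(a, b) < cutoff:
                pairs.append((i, j))
    return pairs


def max_neighbor_count(pairs):
    """Largest number of neighbors of any atom; 0 if there are no pairs."""
    counts = Counter(i for i, _ in pairs)
    # max() over an empty collection raises, so supply a default instead.
    return max(counts.values(), default=0)
```

With two atoms farther apart than the cutoff, radius_pairs returns an empty list and max_neighbor_count returns 0 rather than raising.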

andreas-albers commented 4 months ago

I think you didn't mean to delete line 35 in 9d006e1; I reinserted it for testing. Catching the empty tensor results in a different RuntimeError downstream.

However, as you mentioned, the tensor shouldn't be empty in the first place (at least for the above-mentioned coordinate file). Everything works fine if I force PyTorch to run on the CPU by hardcoding self.device = 'cpu' in this file. The error only occurs if CUDA is used, which puzzles me.
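The same CPU comparison may be possible without editing the package source, by hiding the GPU from the process before PyTorch is imported. This sketch assumes that AIMNet2Calculator falls back to the CPU when no CUDA device is visible, which this thread does not confirm. hide_gpus is a hypothetical helper name. The value "-1" is used instead of an empty string because empty environment values may not propagate on Windows.

```python
import os


def hide_gpus():
    """Hide all CUDA devices from this process.

    Must be called before torch (or anything importing torch) is imported,
    because the CUDA runtime reads the variable when it initializes.
    """
    # An invalid device index leaves no CUDA device visible.
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
```

Call hide_gpus() at the very top of the test script, before the aimnet2calc import.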

zubatyuk commented 4 months ago

Hi @andreas-albers. Do I understand correctly that the issue is not yet resolved? Please check your Python environment: you have the CUDA version of pytorch and the CPU version of pytorch-cluster. Can you install pytorch-cluster from the pyg channel on Anaconda?
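A quick way to spot this kind of mismatch is to inspect the local version label of the installed wheel, such as the "+pt23cpu" suffix in the package list above. The sketch below uses only the standard library. wheel_variant and installed_variant are hypothetical helper names, and the assumption that CUDA builds end in a "cu<digits>" label is a naming convention, not something the thread states.

```python
import re
from importlib import metadata


def wheel_variant(version: str) -> str:
    """Classify a version string as 'cpu', 'cuda', or 'unknown'."""
    # The local label is whatever follows '+', e.g. 'pt23cpu'.
    _, _, local = version.partition("+")
    if local.endswith("cpu"):
        return "cpu"
    if re.search(r"cu\d+$", local):
        return "cuda"
    return "unknown"


def installed_variant(package: str) -> str:
    """Look up an installed package and classify its build variant."""
    try:
        return wheel_variant(metadata.version(package))
    except metadata.PackageNotFoundError:
        return "not installed"
```

For example, installed_variant("torch-cluster") would report "cpu" for the 1.6.3+pt23cpu build listed in the original report. Conda builds often carry no local label, in which case the result is "unknown" and the build string from conda list is the thing to check.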

andreas-albers commented 4 months ago

Hi @zubatyuk, I tried different versions of pytorch, pytorch-cluster and CUDA (including installing pytorch-cluster from the pyg channel on Anaconda) and could not resolve the issue. However, since the issue is most likely related to my dependencies, I will close it.

zubatyuk commented 4 months ago

I did not notice that you use Windows. Try running inside WSL or another Linux.