ulissigroup / amptorch

AMPtorch: Atomistic Machine Learning Package (AMP) - PyTorch
GNU General Public License v3.0

Torch version issues #130

Open jparas-3 opened 1 year ago

jparas-3 commented 1 year ago

This PyTorch version incompatibility was already a known issue last year, but it was never documented permanently (only in Slack messages that have since been automatically deleted).

`(amptorch) [jparas7@login-ice-2 amptorch]$ python -m amptorch.tests.training_test
Traceback (most recent call last):
  File "/home/hice1/jparas7/.conda/envs/amptorch/lib/python3.9/runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/home/hice1/jparas7/.conda/envs/amptorch/lib/python3.9/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/home/hice1/jparas7/amptorch/amptorch/__init__.py", line 2, in <module>
    from .trainer import AtomsTrainer
  File "/home/hice1/jparas7/amptorch/amptorch/trainer.py", line 17, in <module>
    from amptorch.dataset import AtomsDataset, DataCollater, construct_descriptor
  File "/home/hice1/jparas7/amptorch/amptorch/dataset.py", line 2, in <module>
    from torch_geometric.data import Batch
  File "/home/hice1/jparas7/.conda/envs/amptorch/lib/python3.9/site-packages/torch_geometric/__init__.py", line 1, in <module>
    import torch_geometric.utils
  File "/home/hice1/jparas7/.conda/envs/amptorch/lib/python3.9/site-packages/torch_geometric/utils/__init__.py", line 3, in <module>
    from .scatter import scatter
  File "/home/hice1/jparas7/.conda/envs/amptorch/lib/python3.9/site-packages/torch_geometric/utils/scatter.py", line 7, in <module>
    import torch_geometric.typing
  File "/home/hice1/jparas7/.conda/envs/amptorch/lib/python3.9/site-packages/torch_geometric/typing.py", line 37, in <module>
    import torch_sparse  # noqa
  File "/home/hice1/jparas7/.conda/envs/amptorch/lib/python3.9/site-packages/torch_sparse/__init__.py", line 40, in <module>
    from .tensor import SparseTensor  # noqa
  File "/home/hice1/jparas7/.conda/envs/amptorch/lib/python3.9/site-packages/torch_sparse/tensor.py", line 13, in <module>
    class SparseTensor(object):
  File "/home/hice1/jparas7/.conda/envs/amptorch/lib/python3.9/site-packages/torch/jit/_script.py", line 1294, in script
    _compile_and_register_class(obj, _rcb, qualified_name)
  File "/home/hice1/jparas7/.conda/envs/amptorch/lib/python3.9/site-packages/torch/jit/_recursive.py", line 44, in _compile_and_register_class
    script_class = torch._C._jit_script_class_compile(qualified_name, ast, defaults, rcb)
RuntimeError:
object has no attribute sparse_csc_tensor:
  File "/home/hice1/jparas7/.conda/envs/amptorch/lib/python3.9/site-packages/torch_sparse/tensor.py", line 585
            value = torch.ones(self.nnz(), dtype=dtype, device=self.device())

        return torch.sparse_csc_tensor(colptr, row, value, self.sizes())
               ~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE`

nicoleyghu commented 1 year ago

Hello Jacob,

I've tested the env_cpu.yml file on my personal Linux workstation and on the PACE-HIVE cluster, and the unit tests run there without the PyTorch dependency issues. You could try pulling the current master branch and creating a new virtual environment for amptorch. The commands I used on PACE-HIVE are pasted after this message.

When we discussed this on Slack earlier, we concluded that it is a PyTorch dependency issue. I should have sent you a pip environment file with the versions of all packages pinned, including the PyTorch dependencies (sparse, geometric, etc.). This August I updated the environment files (env_cpu, env_gpu) and pinned the working versions of torch-sparse and torch-geometric, which should give better stability with respect to this issue. I would recommend pulling and installing again from scratch. Another possibility is that the PACE-HIVE cluster is configured differently from PACE-ICE, but I doubt that is the problem here.
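
The traceback above fails because the installed torch_sparse calls `torch.sparse_csc_tensor`, which the installed torch does not expose. As a rough way to see what an existing environment actually contains before recreating it, a small script along these lines may help. This is only a sketch: the helper names `installed_versions` and `torch_has_attr` are made up for illustration, and the distribution names are assumed to be the usual PyPI ones.

```python
from importlib import metadata, util

# Assumed PyPI distribution names; adjust if your environment differs.
PACKAGES = ("torch", "torch-sparse", "torch-geometric")


def installed_versions(packages=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


def torch_has_attr(attr="sparse_csc_tensor"):
    """Return True/False if torch is installed, or None if it is absent."""
    if util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the script also runs without torch

    return hasattr(torch, attr)


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
    print("torch.sparse_csc_tensor available:", torch_has_attr())
```

If the last line prints `False` while torch_sparse is installed, the two packages were most likely built for different torch releases, which is the situation the pinned environment files are meant to avoid.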

Commands I used on PACE-HIVE:

`module load anaconda3
module load gcc
conda info --envs
conda remove --name amptorch --all
cd data/
mkdir amptorch_20231019
cd amptorch_20231019/
git clone https://github.com/ulissigroup/amptorch.git
cd amptorch/
conda env create -f env_cpu.yml
conda activate amptorch
pip install -e .
python -m unittest`

With this setup, all tests should pass:

(amptorch) [yhu459@login-hive-2 amptorch]$ python -m unittest
training model with Cosine cutoff function
Results saved to ./checkpoints/2023-10-19-12-56-38-test
converting ASE atoms collection to Data objects: 100%|█| 100/100 [00:02<00:00, 4
Scaling Feature data (standardize): 100%|█| 100/100 [00:00<00:00, 4304.46 scalin
Scaling Target data: 100%|██████████| 100/100 [00:00<00:00, 71465.39 scalings/s]
Loading dataset: 100 images
torch.float64
Use Xavier initialization
torch.float64
Use Xavier initialization
torch.float64
Use Xavier initialization
Loading model: 5103 parameters
Loading skorch trainer
<__array_function__ internals>:5: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
Training completed in 14.211713552474976s
E_MAE: 0.001362, F_MAE: 0.006183
training model with Polynomial cutoff function (gamma = 2.0)
Results saved to ./checkpoints/2023-10-19-12-56-55-test
converting ASE atoms collection to Data objects: 100%|█| 100/100 [00:01<00:00, 5
Scaling Feature data (standardize): 100%|█| 100/100 [00:00<00:00, 1165.82 scalin
Scaling Target data: 100%|██████████| 100/100 [00:00<00:00, 82032.15 scalings/s]
...
----------------------------------------------------------------------
Ran 8 tests in 208.604s

OK

Hope this helps.

Best, Nicole

kavi9030 commented 7 months ago

Hello, I'm trying to install amptorch for CPU but am running into several issues. With pytorch 1.10, I get the error `module 'torch.cuda' has no attribute '_UntypedStorage'`. Apparently this problem doesn't exist in the newer pytorch 1.13, but with pytorch=1.13 I get a segmentation fault. How can I debug this error?

nicoleyghu commented 7 months ago

Hi, the amptorch-cpu release is meant for Python 3.9 and PyTorch 1.10, and that combination works without problems, as I've just tested. Your issue seems to come from installing a newer PyTorch version, which is a bit beyond the scope of this software. If your goal is to try out this package, I'd recommend staying with the default versions. If you wish to upgrade the associated dependencies, more testing is needed.
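
For narrowing down a segmentation fault like the one described, one generic approach (not something specific to amptorch) is to import each package in a fresh interpreter with Python's built-in `faulthandler` enabled and see which import crashes. The sketch below assumes the import chain from the traceback earlier in this thread; `probe_import` is a hypothetical helper name.

```python
import subprocess
import sys


def probe_import(module_name):
    """Import one module in a fresh interpreter and return its exit code.

    On POSIX, a negative code means the child was killed by a signal
    (for example -11 for SIGSEGV). With faulthandler enabled, the Python
    stack at the time of the crash is written to stderr.
    """
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        print(f"{module_name}: exit code {result.returncode}")
        print(result.stderr.strip())
    return result.returncode


if __name__ == "__main__":
    # Order follows the import chain in the traceback above (assumption:
    # the crash happens at import time rather than during training).
    for name in ("torch", "torch_sparse", "torch_geometric", "amptorch"):
        probe_import(name)
```

The first module that exits with a negative code is the likely place where the compiled extension and the installed torch build disagree.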

In my experience, solving the environment means first installing PyTorch with the most relevant Python version, trying different combinations, and then working out the torch dependencies such as geometric, sparse, etc. Any such change means the pinned versions in the amptorch-cpu environment file no longer apply as they are.
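
When trying combinations, it can help to confirm quickly whether the active environment still matches the versions the release targets (Python 3.9 and PyTorch 1.10, as stated above). The following is a minimal sketch under that assumption; `matches_prefix` and `check_environment` are illustrative names, not part of amptorch.

```python
import sys
from importlib import metadata

# Versions stated in this thread for the amptorch-cpu release.
EXPECTED_PYTHON = (3, 9)
EXPECTED_TORCH_PREFIX = "1.10"


def matches_prefix(version, prefix):
    """True if `version` is `prefix` itself or extends it with '.' or '+'.

    "1.10.2" and "1.10.0+cpu" match "1.10"; "1.100.0" and "1.13.1" do not.
    """
    return (
        version == prefix
        or version.startswith(prefix + ".")
        or version.startswith(prefix + "+")
    )


def check_environment():
    """Return a list of human-readable mismatches (empty if none found)."""
    problems = []
    if sys.version_info[:2] != EXPECTED_PYTHON:
        found = ".".join(str(part) for part in sys.version_info[:2])
        problems.append(f"Python is {found}, expected 3.9")
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("torch is not installed")
    else:
        if not matches_prefix(torch_version, EXPECTED_TORCH_PREFIX):
            problems.append(f"torch is {torch_version}, expected 1.10.x")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment matches the stated versions"]:
        print(line)
```

An empty result only says that Python and torch match; the torch-sparse and torch-geometric pins in the environment file still need to be checked separately.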

Hope this helps.

Best regards, Nicole