Open jparas-3 opened 1 year ago
Hello Jacob,
I've verified on my personal Linux workstation and on the PACE-HIVE cluster that the env_cpu.yml file works with unittest without the PyTorch dependency issues. You may try pulling from the current master branch and creating a new virtual environment for amptorch. I will paste the commands I used on PACE-HIVE after this message.
We discussed this on Slack a while ago and concluded it was a PyTorch dependency issue. I should have sent you a pip environment file with the versions of all packages pinned, including the PyTorch extensions (torch-sparse, torch-geometric, etc.). This August I updated the environment files (env_cpu, env_gpu) and pinned working versions of torch-sparse and torch-geometric, which should offer better stability with respect to the issue that occurred. I would recommend pulling and installing again from a fresh clone. Another possibility is that the PACE-HIVE cluster is configured differently than PACE-ICE, but I doubt that is the problem here.
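For reference, pinning PyTorch together with its extension packages is what keeps the install reproducible. The authoritative pins live in env_cpu.yml / env_gpu.yml in the amptorch repository; the sketch below only illustrates the idea, and the version numbers and wheel index shown are placeholders, not the project's actual pins.

```bash
# Illustrative only: install a pinned CPU build of PyTorch first, then the
# extension packages from the matching PyG wheel index. The versions here
# are placeholders; use the pins from env_cpu.yml for a real install.
pip install torch==1.10.2
pip install torch-scatter torch-sparse torch-geometric \
    -f https://data.pyg.org/whl/torch-1.10.2+cpu.html
```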
Commands I used on PACE-HIVE:
```bash
module load anaconda3
module load gcc
conda info --envs
conda remove --name amptorch --all
cd data/
mkdir amptorch_20231019
cd amptorch_20231019/
git clone https://github.com/ulissigroup/amptorch.git
cd amptorch/
conda env create -f env_cpu.yml
conda activate amptorch
pip install -e .
python -m unittest
```
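As an extra sanity check (my own suggestion, not part of the original instructions), you can confirm that torch and its extensions import cleanly inside the activated environment before running the full test suite:

```bash
# Optional sanity check: confirm torch and its extensions import and
# report their versions inside the activated amptorch environment.
python -c "import torch, torch_sparse, torch_geometric; print(torch.__version__, torch_sparse.__version__, torch_geometric.__version__)"
```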
The final `python -m unittest` step should result in all tests passing:
```
(amptorch) [yhu459@login-hive-2 amptorch]$ python -m unittest
training model with Cosine cutoff function
Results saved to ./checkpoints/2023-10-19-12-56-38-test
converting ASE atoms collection to Data objects: 100%|█| 100/100 [00:02<00:00, 4
Scaling Feature data (standardize): 100%|█| 100/100 [00:00<00:00, 4304.46 scalin
Scaling Target data: 100%|██████████| 100/100 [00:00<00:00, 71465.39 scalings/s]
Loading dataset: 100 images
torch.float64
Use Xavier initialization
torch.float64
Use Xavier initialization
torch.float64
Use Xavier initialization
Loading model: 5103 parameters
Loading skorch trainer
<__array_function__ internals>:5: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
Training completed in 14.211713552474976s
E_MAE: 0.001362, F_MAE: 0.006183
training model with Polynomial cutoff function (gamma = 2.0)
Results saved to ./checkpoints/2023-10-19-12-56-55-test
converting ASE atoms collection to Data objects: 100%|█| 100/100 [00:01<00:00, 5
Scaling Feature data (standardize): 100%|█| 100/100 [00:00<00:00, 1165.82 scalin
Scaling Target data: 100%|██████████| 100/100 [00:00<00:00, 82032.15 scalings/s]
...
----------------------------------------------------------------------
Ran 8 tests in 208.604s

OK
```
Hope this helps.
Best, Nicole
Hello,
I'm trying to install amptorch for CPU but am running into several issues. If I use PyTorch 1.10, I get the error `module 'torch.cuda' has no attribute '_UntypedStorage'`. It is suggested that this problem doesn't exist in the newer PyTorch 1.13, but with pytorch=1.13 I'm getting a segmentation fault instead. How can I debug this error?
Hi, the release of amptorch-cpu is meant for Python 3.9 and PyTorch 1.10, which shouldn't cause a problem, as I've just tested. Your issue seems to come from installing a higher PyTorch version, which is a bit beyond the scope of this piece of software. If your goal is to test this module, I'd recommend going with the default pinned versions. If you wish to upgrade the associated dependencies, more testing is needed.
My experience with solving the environment is to first install PyTorch against the most relevant Python version, trying different combinations, and then work out the torch extensions such as torch-geometric, torch-sparse, etc. All of this means the pinned versions in amptorch-cpu's environment file no longer apply in the same way.
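As a debugging aid (my own suggestion, not something from the amptorch documentation), it can also help to confirm which versions pip actually resolved and to re-run the tests with Python's built-in fault handler enabled, so a segmentation fault at least prints a traceback showing where it happened:

```bash
# Suggested debugging steps (not part of the amptorch documentation):
# 1. Check which torch / extension versions were actually installed.
pip list | grep -E "torch|skorch"
# 2. Re-run the tests with the fault handler enabled; on a segmentation
#    fault Python will dump the current traceback instead of exiting silently.
python -X faulthandler -m unittest
```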
Hope this helps.
Best regards, Nicole
This was a known PyTorch version issue last year, but it was never written down permanently (only in Slack messages that have since been automatically deleted).