Closed: QuantumMisaka closed this issue 1 year ago
Hi @QuantumMisaka ,
You should install PyTorch 1.11.0 with CUDA 11.*, see https://pytorch.org/get-started/previous-versions/#v1110.
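For reference, the linked previous-versions page gives pinned install commands; a plausible invocation for PyTorch 1.11.0 built against CUDA 11.3 (a sketch — verify the exact versions, CUDA tag, and channels against that page and your driver) looks like:

```shell
# Sketch, adapted from pytorch.org's previous-versions page -- check the
# CUDA tag (cu113 here) against your installed driver before running.

# pip wheel built against CUDA 11.3:
pip install torch==1.11.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113

# or the conda equivalent:
conda install pytorch==1.11.0 cudatoolkit=11.3 -c pytorch
```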
Hi @Linux-cpp-lisp
I know that the problem mainly lies in CUDA and that the CUDA version should be 11.*; however, there is still something strange. I directly used pip install <source-code>
to install from source code into a newly created conda-based Python env. From the output information, cudatoolkit-11.7 was installed, but the error was still there after I ran pip install --upgrade torch
. Could the problem lie in the pip install
process?
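One quick sanity check in this situation (a hypothetical, stdlib-only helper, not part of NequIP or PyTorch) is to look at the local version tag of `torch.__version__`: a wheel without a `+cuXXX` suffix was built without CUDA support, so having cudatoolkit-11.7 in the conda env would not help.

```python
# Hypothetical helper: inspect a torch version string such as "1.12.1+cu116".
# A wheel without a "+cu..." suffix is a CPU-only build, in which case the
# conda cudatoolkit package alone cannot make CUDA work.

def cuda_tag(torch_version: str):
    """Return the CUDA build tag (e.g. "cu113"), or None for CPU-only builds."""
    _, _, local = torch_version.partition("+")
    return local if local.startswith("cu") else None

def is_cuda11_build(torch_version: str) -> bool:
    """True when the wheel was built against some CUDA 11.x toolkit."""
    tag = cuda_tag(torch_version)
    return tag is not None and tag.startswith("cu11")

print(cuda_tag("1.11.0+cu113"))   # -> cu113
print(is_cuda11_build("1.11.0"))  # -> False (CPU-only wheel)
```

In a live environment one would pass `torch.__version__` to these helpers to see which build pip actually installed.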
For what it's worth, I always install PyTorch itself with conda
rather than pip
, but not necessarily because pip
is wrong...
PyTorch 1.12.1 with cudatoolkit-11.6 installed by conda
can run NequIP successfully.
The problem seems to be in the pip
installation.
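The working combination reported above corresponds to something like the following conda invocation (a sketch; the exact line on the previous-versions page also pins torchvision and torchaudio, so double-check it there):

```shell
# Sketch of the conda install for PyTorch 1.12.1 + CUDA 11.6 -- verify
# versions and channels against pytorch.org's previous-versions page.
conda install pytorch==1.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge
```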
Great, glad to hear!
Please note that we generally only recommend PyTorch 1.11 right now due to the PyTorch bug described in: https://github.com/mir-group/nequip/discussions/311#discussioncomment-5231630.
If you see behavior like this it can be resolved by switching to 1.11; if you don't please consider commenting on that issue so we can get a better sense of the scope of this problem and how it might be mitigated. Thanks!
Describe the bug
When I use nequip, which is installed by pip, on an A100 machine, an error occurs:
To Reproduce
Run
nequip-train config/minimal.yaml
on an A100 machine.

Expected behavior
Properly running config/minimal.yaml and config/example.yaml
Environment (please complete the following information):
Additional context
When I tried
pip install --upgrade torch
to update torch to 2.0.0, the problem seemed to be solved and the example ran properly.