Open mycprotein opened 1 year ago
I have checked that I use torch==1.12.0.
I have the exact same problem. I hope it will be fixed soon, thanks.
PyTorch release 1.12.0 comes in two builds: one tagged +cpu and one without the tag. You can download torch-1.12.0+cpu from https://download.pytorch.org/whl/cpu/torch/ and that build works.
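To check which of the two builds is actually installed, something like the following may help. It is only a sketch: it reads the installed package metadata and reports the text after the `+` in the version string. The helper names `local_build_tag` and `describe_torch_build` are made up for illustration, and the sketch assumes the wheel records its tag (for example `+cpu`) in its version metadata.

```python
from importlib import metadata


def local_build_tag(version: str) -> str:
    """Return the text after '+' in a version string, or '' if there is none."""
    _, _, tag = version.partition("+")
    return tag


def describe_torch_build() -> str:
    """Describe the installed torch distribution without importing torch itself."""
    try:
        version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return "torch is not installed in this environment"
    tag = local_build_tag(version)
    return f"torch {version} (build tag: {tag or 'none'})"


if __name__ == "__main__":
    print(describe_torch_build())
```

Because this does not import torch, it can be run even in an environment where importing the package crashes.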
Thanks for replying @zxd1997066
Actually, we are using the CPU build of PyTorch from the oneAPI Docker images, for example:
docker pull intel/oneapi-aikit:2022.3.0-devel-ubuntu18.04
So I can guarantee that we are using a CPU build of PyTorch.
In the pictures above, you show that it works fine on a single node, but I suggest you try running it in distributed mode, that is, launching it with the MPI tool set as illustrated on the GitHub homepage, using a command like:
mpirun -n <N> -ppn <PPN> -f <hostfile> python example.py
My colleagues and I got the same segmentation fault. Could you please give it a try?
Thank you very much!
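For reference, the kind of script launched by the mpirun command above can be sketched roughly as follows. This is not the exact example.py from the repository. It assumes the launcher exports `PMI_RANK` and `PMI_SIZE` (as Intel MPI does) and that the bindings are importable as `oneccl_bindings_for_pytorch`. The helper `rank_env_from_mpi` is a hypothetical name, and the default address and port are placeholders.

```python
import os


def rank_env_from_mpi(environ=None):
    """Map launcher variables to the names torch.distributed reads from the environment."""
    env = os.environ if environ is None else environ
    return {
        "MASTER_ADDR": env.get("MASTER_ADDR", "127.0.0.1"),  # placeholder default
        "MASTER_PORT": env.get("MASTER_PORT", "29500"),      # placeholder default
        "RANK": str(env.get("PMI_RANK", 0)),                 # assumes a PMI-style launcher
        "WORLD_SIZE": str(env.get("PMI_SIZE", 1)),
    }


def main():
    try:
        import torch
        import torch.distributed as dist
        import oneccl_bindings_for_pytorch  # noqa: F401  (assumed module name)
    except ImportError as exc:
        print(f"skipping distributed run: {exc}")
        return

    os.environ.update(rank_env_from_mpi())
    dist.init_process_group(backend="ccl")
    tensor = torch.ones(4) * (dist.get_rank() + 1)
    dist.all_reduce(tensor)  # sums the tensor across all ranks
    print(f"rank {dist.get_rank()}/{dist.get_world_size()}: {tensor.tolist()}")
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Run without mpirun, it falls back to rank 0 and world size 1. Under mpirun, each process should pick up its own rank from the launcher.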
I am sorry, I can not reproduce it.
Hello, I have tested with docker pull intel/oneapi-aikit:2022.3.1-devel-ubuntu18.04. The same error occurred.
I am sorry, I can not reproduce it.
Hi bro, which Docker base image are you using? I tried the oneapi-aikit base image, but it reported an error.
I just ran it in conda, so I guess there may be something wrong with that Docker image. This needs further investigation.
I have set up the environment like this:
sudo apt install openmpi-common openmpi-bin
pip install torch==1.12
pip install oneccl_bind_pt==1.12.0 -f https://developer.intel.com/ipex-whl-stable
source <oneccl_bindings_for_pytorch_path>/env/setvars.sh
Then I execute the demo.py file using python demo.py or using mpirun, but either way I get a segmentation fault. I think I may be misusing something, but I cannot find detailed documentation. How can I make it work?
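One way to narrow down where the segmentation fault happens is Python's built-in faulthandler, which prints the Python traceback when the interpreter crashes. The sketch below enables it and then imports the relevant modules one at a time, so the last "importing ..." line shows which step crashed. The function names are made up for illustration, and the module name `oneccl_bindings_for_pytorch` is an assumption.

```python
import faulthandler
import importlib
import sys


def try_import(name: str) -> bool:
    """Import one module and report the outcome on stderr."""
    print(f"importing {name} ...", file=sys.stderr, flush=True)
    try:
        importlib.import_module(name)
    except ImportError as exc:
        print(f"  failed: {exc}", file=sys.stderr, flush=True)
        return False
    print("  ok", file=sys.stderr, flush=True)
    return True


def run_steps(module_names) -> dict:
    """Enable crash tracebacks, then import each module in order."""
    # Use the original stderr stream, which is backed by a real file descriptor.
    faulthandler.enable(file=sys.__stderr__)
    return {name: try_import(name) for name in module_names}


if __name__ == "__main__":
    run_steps(["torch", "torch.distributed", "oneccl_bindings_for_pytorch"])
```

The same crash traceback can be obtained for the unmodified script with python -X faulthandler demo.py.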