isayev / ASE_ANI

ANI-1 neural net potential with python interface (ASE)
MIT License

Problem executing ani_quicktest.py with Geforce RTX 3080 #38

Open TeruoHIRAKAWA opened 3 years ago

TeruoHIRAKAWA commented 3 years ago

Hello!

Thanks very much for this open-source project. It has been a great experience.

I would like to ask you a favor. I recently bought a new PC with an Nvidia Geforce RTX 3080. When I ran ani_quicktest.py after installing ASE_ANI and the other required software, I got an error:

python ani_quicktest.py

ERROR: CUDA throw detected! Attempting to shut down nicely! CUDA Error -- "invalid device symbol" 13 in location -- /home/jujuman/Gits/NeuroChem/src-aevlib/cuda_aev/cuaev_compute.cu:1939 in function -- cuaev_compute_base()

Traceback (most recent call last):
  File "ani_quicktest.py", line 21, in <module>
    mol.set_calculator(ANIENS(aniensloader('../ani_models/ani-1ccx_8x.info',0)))
  File "/home/micro/local/ASE_ANI/lib/ase_interface.py", line 1038, in aniensloader
    return ensemblemolecule(cnstfile, saefile, nnfdir, Nn, gpu)
  File "/home/micro/local/ASE_ANI/lib/ase_interface.py", line 477, in __init__
    self.ncl = [pync.molecule(cnstfile, saefile, nnfprefix + str(i+net_start_id) + '/networks/', 1, gpuid, sinet) for i in
  File "/home/micro/local/ASE_ANI/lib/ase_interface.py", line 477, in <listcomp>
    self.ncl = [pync.molecule(cnstfile, saefile, nnfprefix + str(i+net_start_id) + '/networks/', 1, gpuid, sinet) for i in
RuntimeError: unidentifiable C++ exception

I assumed that this might be because the Nvidia Geforce RTX 3080 has a new device symbol, but I could not solve the problem by myself.

Would you have some ideas to solve it? Any help would be much appreciated.

Here is an overview of the PC:

Operating system: Ubuntu Desktop 18.04.5
Nvidia driver version: 460.32.03
CUDA toolkit version: 9.2
Python version: 3.8.5

CPU: Intel(R) Core(TM) i9-10900F CPU @ 2.80GHz
Memory: 64 GB
GPU: Nvidia Geforce RTX 3080 (10GB)

Best regards,

Teruo.

isayev commented 3 years ago

Dear Teruo: Thanks for reporting that. CUDA 9 is way too old for RTX 3080. Could you please try CUDA10/Python 3.6 branch: https://github.com/isayev/ASE_ANI/tree/centos_cuda10_py36 If not, we probably need to update the code. Unfortunately, I do not have a 3 series card to test real quick.
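To illustrate why the toolkit version matters, here is a minimal sketch in Python. It records the first CUDA toolkit release that can compile native code for a few well-known compute capabilities and checks whether a given toolkit can target the RTX 3080 (compute capability 8.6). The table is deliberately incomplete, and the names `FIRST_TOOLKIT_FOR_CAPABILITY` and `toolkit_can_target` are made up for this example; they are not part of ASE_ANI or CUDA.

```python
# First CUDA toolkit release able to compile native code for a compute capability.
# Only a few well-known entries are listed; this is not an exhaustive table.
FIRST_TOOLKIT_FOR_CAPABILITY = {
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (GeForce RTX 30 series)
}


def toolkit_can_target(toolkit, capability):
    """Return True if toolkit (major, minor) can compile for capability (major, minor).

    Hypothetical helper for illustration; raises KeyError for capabilities
    that are not in the small table above.
    """
    return toolkit >= FIRST_TOOLKIT_FOR_CAPABILITY[capability]


rtx_3080 = (8, 6)
for toolkit in [(9, 2), (10, 0), (10, 2), (11, 1)]:
    verdict = "can" if toolkit_can_target(toolkit, rtx_3080) else "cannot"
    print(f"CUDA {toolkit[0]}.{toolkit[1]} {verdict} compile for compute capability 8.6")
```

Under these assumptions, none of the 9.x or 10.x toolkits can produce native code for an 8.6 device, so switching between them would not by itself change the outcome for a prebuilt binary.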

Jussmith01 commented 3 years ago

This comes down to the compute architecture the binary was built for; the RTX 3080 is CUDA compute capability 8.6, I think. None of the current binaries are built for this, so it cannot be fixed without recompiling the code. I will try to upload a new set of binaries soon.
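The compatibility rule behind this can be sketched in a few lines of Python. Native GPU code (a cubin) built for capability X.y runs only on devices with the same major version X and a minor version of at least y, while embedded PTX can be JIT-compiled by the driver for any device of equal or newer capability. The function `binary_runs_on` and the architecture lists below are hypothetical; the thread does not state which architectures the ASE_ANI binaries were actually built for.

```python
def binary_runs_on(device, sass_archs, ptx_archs):
    """Model CUDA's compatibility rules for a compiled binary.

    device:     (major, minor) compute capability of the GPU
    sass_archs: capabilities for which native code (cubin) is embedded
    ptx_archs:  capabilities for which PTX is embedded
    """
    dev_major, dev_minor = device
    # Native code runs only within the same major version, on an equal or higher minor.
    native_ok = any(
        major == dev_major and minor <= dev_minor for major, minor in sass_archs
    )
    # PTX can be JIT-compiled by the driver for an equal or newer capability.
    jit_ok = any(arch <= device for arch in ptx_archs)
    return native_ok or jit_ok


rtx_3080 = (8, 6)

# Assumed example: a binary with native code for Pascal and Volta only, no PTX.
print(binary_runs_on(rtx_3080, sass_archs=[(6, 1), (7, 0)], ptx_archs=[]))

# The same binary rebuilt with native code for 8.6.
print(binary_runs_on(rtx_3080, sass_archs=[(6, 1), (7, 0), (8, 6)], ptx_archs=[]))
```

If the binaries ship native code for older architectures only and no PTX, as assumed in the first call, no kernel image can be loaded on the newer card, which is why a rebuild is needed rather than a different runtime setup.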

TeruoHIRAKAWA commented 3 years ago

Dear Prof. Isayev and Dr. Smith,

I truly appreciate your quick reply.

First, I tried ASE_ANI on the CUDA10/Python 3.6 branch, but I got the same error:

% python ani_quicktest.py

ERROR: CUDA throw detected! Attempting to shut down nicely! CUDA Error -- "invalid device symbol" 13 in location -- /home/jujuman/Gits/NeuroChem/src-aevlib/cuda_aev/cuaev_compute.cu:1939 in function -- cuaev_compute_base()

Traceback (most recent call last):
  File "ani_quicktest.py", line 21, in <module>
    mol.set_calculator(ANIENS(aniensloader('../ani_models/ani-1ccx_8x.info',0)))
  File "/home/micro/local/ASE_ANI/lib/ase_interface.py", line 1038, in aniensloader
    return ensemblemolecule(cnstfile, saefile, nnfdir, Nn, gpu)
  File "/home/micro/local/ASE_ANI/lib/ase_interface.py", line 477, in __init__
    self.ncl = [pync.molecule(cnstfile, saefile, nnfprefix + str(i+net_start_id) + '/networks/', 1, gpuid, sinet) for i in
  File "/home/micro/local/ASE_ANI/lib/ase_interface.py", line 477, in <listcomp>
    self.ncl = [pync.molecule(cnstfile, saefile, nnfprefix + str(i+net_start_id) + '/networks/', 1, gpuid, sinet) for i in
RuntimeError: unidentifiable C++ exception

Here is an overview of the PC:

Operating system: Ubuntu Desktop 18.04.5
Nvidia driver version: 460.32.03
CUDA toolkit version: 10.0
Python version: 3.8.5

CPU: Intel(R) Core(TM) i9-10900F CPU @ 2.80GHz
Memory: 64 GB
GPU: Nvidia Geforce RTX 3080 (10GB)

"This has to do with what compute architecture the binary was built for, which is CUDA compute 8.6 for the RTX 3080 I think. None of the current binaries are built for this, so it cannot be fixed without recompiling the code. I will try to upload a new set of binaries soon."

Thank you very much for your prompt attention to this matter. That would really help me if it’s not too much trouble for you.

If there is anything I can do to help, such as testing the recompiled ASE_ANI, please do ask. I have bought two PCs equipped with Geforce RTX 3080/3090 cards.

Best regards,

turboresearcher commented 2 years ago

Dear @TeruoHIRAKAWA, have you solved the problem you described (related to usage of RTX3080 with CUDA10)? If so, I'd be grateful if you could share the details about it since I'm having the same trouble. Best regards!

aydinmirac commented 2 years ago

Hello,

I'm also having the same issue. I tried to run the code with different CUDA versions (9.2, 10.0 and 10.2), but I was not able to solve it. It always throws "ERROR: CUDA throw detected! Attempting to shut down nicely!"

Do you have any suggestions for this problem?

Thanks for your help. Best regards

isayev commented 2 years ago

Hey @MiracAydin1, this code is now legacy, as it was custom-compiled for specific CUDA versions and Nvidia architectures. I strongly encourage you to use TorchANI https://github.com/aiqm/torchani instead.