Closed avilella closed 3 years ago
Following a Stack Overflow answer, I tried symlinking the library file and it now seems to work: https://stackoverflow.com/a/67642774/719016 In my case, where miniconda3 is installed in /data, I did:
ln -s /data/miniconda3/envs/alphafold/lib/libcusolver.so.10 /data/miniconda3/envs/alphafold/lib/libcusolver.so.11
Had the same problem. As per the linked Stack Overflow answer, the issue is a deficiency in cudatoolkit 11.0, which the instructions here have you install. The problem doesn't appear if there's a newer system-wide install of CUDA that includes a libcusolver.so.11. So if you have a system install of CUDA 11.3, as per the README, you won't have this problem. On the machine in question, the system install was CUDA 10.2, hence the missing libcusolver.so.11. The symlink solves this nicely.
So I think it would be good if this workaround could be added to the README.
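For anyone who wants to script the workaround, here is a minimal sketch of the same symlink step in Python. The function name ensure_cusolver_symlink is made up for illustration, and the default file names are simply the ones from the ln -s command above; it assumes the environment's lib directory already contains libcusolver.so.10.

```python
from pathlib import Path


def ensure_cusolver_symlink(lib_dir, have="libcusolver.so.10", want="libcusolver.so.11"):
    """Create lib_dir/want as a symlink to lib_dir/have if want is missing.

    Returns True if a link was created, False if want was already there.
    (Hypothetical helper; the names mirror the ln -s command in this thread.)
    """
    lib_dir = Path(lib_dir)
    target = lib_dir / have
    link = lib_dir / want
    # Leave existing files and existing (even dangling) links untouched.
    if link.exists() or link.is_symlink():
        return False
    if not target.exists():
        raise FileNotFoundError(f"{target} not found; nothing to link to")
    # Relative link inside the same directory, unlike the absolute path used above.
    link.symlink_to(target.name)
    return True
```

With the layout from this thread, the call would be ensure_cusolver_symlink("/data/miniconda3/envs/alphafold/lib"). It is safe to run twice, since the second call just returns False.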
Side note, to quickly test if you'll run into this problem, just run the following (in your alphafold conda env):
import tensorflow as tf
tf.test.is_gpu_available()
This will immediately report a failure to open libcusolver.so.11, if there is one, without having to wait for the jackhmmer search.
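If importing TensorFlow is too slow or noisy, a rougher check is to ask the dynamic loader directly. The sketch below uses only the standard library. The helper name can_load is made up, and this only approximates what TensorFlow does, since the library search path seen by ctypes may differ from the one TensorFlow ends up using. Passing the full path to the file inside the conda env avoids that ambiguity.

```python
import ctypes


def can_load(library_name="libcusolver.so.11"):
    """Return True if the dynamic loader can open library_name.

    library_name may be a bare name (searched on the loader's default path)
    or a full path to a specific file. Hypothetical helper for illustration.
    """
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the file is missing or cannot be loaded.
        return False
    return True
```

For example, can_load("/data/miniconda3/envs/alphafold/lib/libcusolver.so.11") checks the exact file that the symlink above is meant to provide.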
We've now installed alphafold_non_docker on a Linux system with an NVIDIA Quadro P1000 (4 GB), but the system also has a 2 GB NVIDIA card that appears as device 0 in nvidia-smi.

When attempting to use the bash script with -a 1, it actually used the smaller card and ran out of memory, which is expected for the input protein, which peaks at 3 GB of RAM on another computer where this works successfully.

When attempting without the -a flag, or with the -a 0 flag, it runs on the 4 GB device, which is listed as device 1 in nvidia-smi. It runs for a while, but at the prediction step it crashes with this error:

This is with the usual sudo apt-get install nvidia-drivers-460 plus sudo apt-get install nvidia-cuda-toolkit method. Rebooting and sorting out the 'Secure Boot' malarkey was needed for this laptop.

EDIT: just to make sure that the smaller card wasn't the problem, we removed the smaller card from the computer and rebooted. Only the larger 4 GB card appeared in the list in nvidia-smi; however, the issue remained as described above when trying to run alphafold.

Any ideas what this libcusolver issue could be due to?
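On the device numbering mismatch described above: by default CUDA enumerates GPUs fastest-first, while nvidia-smi lists them in PCI bus order, which would be consistent with -a 0 landing on the card that nvidia-smi calls device 1. Setting CUDA_DEVICE_ORDER=PCI_BUS_ID makes the two numberings agree. A minimal sketch of building such an environment follows. The helper name gpu_env is made up, and it assumes the -a flag ultimately controls CUDA_VISIBLE_DEVICES, which this thread does not confirm.

```python
import os


def gpu_env(device_index, base_env=None):
    """Return a copy of the environment that pins CUDA to a single GPU,
    numbered the way nvidia-smi numbers it (PCI bus order).

    Hypothetical helper; base_env defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Make CUDA's device indices follow PCI bus order, as nvidia-smi does.
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Expose only the chosen device to the child process.
    env["CUDA_VISIBLE_DEVICES"] = str(device_index)
    return env
```

The returned dict can be passed as env= to subprocess.run when launching the prediction. If the launched script sets CUDA_VISIBLE_DEVICES itself, that value wins, but CUDA_DEVICE_ORDER is still inherited, so the index given to the script should then match nvidia-smi. This only addresses which card gets used, not the libcusolver error itself.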