rakeshr10 opened 3 weeks ago
If the --gpu flag is not present, then most likely CUDA was not recognized properly during compilation.
Could you please post the output of cmake (not make, that will mostly just be spam).
I am attaching the cmake build folder here. The command I used for compilation was
Whoops, my instructions in the release notes were missing one part: -DENABLE_CUDA=1
cmake -DCMAKE_BUILD_TYPE=Release -DENABLE_CUDA=1 -DCUDAToolkit_ROOT=Path-To-Cuda-Toolkit
I was able to compile and the GPU flag is available now. When I use the --gpu 1 flag I get Segmentation fault (core dumped).
If I don't use the GPU flag it seems to run fine. I ran the foldseek createdb command this way.
foldseek createdb fastafile dbname --prostt5-model weights --gpu 1
Could you please give some more information regarding the system you are running it on?
It is CUDA Version: 12.0 with A100 GPU and Driver Version: 525.105.17. OS is Ubuntu 18.04.6 LTS and x86_64 architecture.
Could you run the following please and paste the crash backtrace here:
cd build
cmake -DCMAKE_BUILD_TYPE=Debug -DENABLE_CUDA=1 -DCUDAToolkit_ROOT=Path-To-Cuda-Toolkit ..
make -j32
gdb --args ./src/foldseek createdb fastafile dbname --prostt5-model weights --gpu 1
# wait for a prompt to appear
r
# wait for the crash
bt
MMseqs Version: GITDIR-NOTFOUND
Use GPU 1
Path to ProstT5 weights/
Chain name mode 0
Write mapping file 0
Mask b-factor threshold 0
Coord store mode 2
Write lookup file 1
Input format 0
File Inclusion Regex .*
File Exclusion Regex ^$
Threads 128
Verbosity 3
Converting sequences
[1112] 13s 0ms
Time for merging to db_h: 0h 0m 16s 499ms
Time for merging to db: 0h 0m 16s 399ms
Database type: Aminoacid
[New Thread 0x7fffc5e3c700 (LWP 20672)]
[New Thread 0x7fffc563b700 (LWP 20673)]
[New Thread 0x7fffc4e3a700 (LWP 20674)]
[New Thread 0x7ffebe358700 (LWP 20675)]
[New Thread 0x7ffebdb57700 (LWP 20676)]
Thread 1 "foldseek" received signal SIGSEGV, Segmentation fault.
0x0000555555c05524 in core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once ()
(gdb) bt
#0 0x0000555555c05524 in core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once ()
#1 0x000055555600c73a in <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter ()
#2 0x0000555555c083a9 in prostt5::ProstT5::predict ()
#3 0x0000555555c02f16 in prostt5_predict_slice ()
#4 0x00005555555d96c3 in structcreatedb(int, char const**, Command const&) [clone ._omp_fn.1] ()
#5 0x00007fffe444fedf in GOMP_parallel () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#6 0x00005555555dd05a in structcreatedb(int, char const**, Command const&) ()
#7 0x000055555578da8b in runCommand(Command const*, int, char const**) ()
#8 0x00005555555b2872 in main ()
That's not a crash in cuda/gpu code. You said the same FASTA file was working using CPU, right?
It looks like something is wrong with the sequences/fasta file. Can you share the fasta file?
Edit: actually thinking more about it, it might be a crash in cuda.
Could you please install CUDA through conda to make sure that versions are not the issue:
conda create -n foldseek-prostt5 -c conda-forge cmake cuda-nvcc libcurand-dev libcublas-dev cuda-nvrtc-dev cuda-version=12.4
conda activate foldseek-prostt5
cmake -DCMAKE_BUILD_TYPE=Debug -DENABLE_CUDA=1 -DCUDAToolkit_ROOT=$(dirname $(which nvcc))/../targets/x86_64-linux ..
These are the files in the model directory.
Same error as before with the conda-compiled Foldseek GPU version and the test file you provided. I also noticed via nvidia-smi that GPU utilization and memory usage increase briefly before the crash.
I pushed changes that should immediately fail and print an error message if prostt5 cannot be loaded. Could you please pull the latest change and rerun? Hopefully that will help me find out what's going on.
I am getting an error while compiling now.
/opt/conda/envs/foldseek-prostt5/bin/../lib/gcc/x86_64-conda-linux-gnu/12.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: cannot find -lblock-aligner-c: No such file or directory
collect2: error: ld returned 1 exit status
src/CMakeFiles/foldseek.dir/build.make:121: recipe for target 'src/foldseek' failed
make[2]: *** [src/foldseek] Error 1
CMakeFiles/Makefile2:1427: recipe for target 'src/CMakeFiles/foldseek.dir/all' failed
make[1]: *** [src/CMakeFiles/foldseek.dir/all] Error 2
Makefile:135: recipe for target 'all' failed
make: *** [all] Error 2
That looks like an unrelated error. Please delete the build folder and try again.
I am getting a very similar error during the make step when I compile from source.
/usr/bin/ld: cannot find -lblock-aligner-c
collect2: error: ld returned 1 exit status
make[2]: *** [src/CMakeFiles/foldseek.dir/build.make:123: src/foldseek] Error 1
make[1]: *** [CMakeFiles/Makefile2:1428: src/CMakeFiles/foldseek.dir/all] Error 2
make: *** [Makefile:136: all] Error 2
@szimmerman92 @rakeshr10 The new linking error you got (ld: cannot find -lblock-aligner-c) was not related to the current issue. It's actually introduced by Rust 1.79 -- it replaces dashes with underscores in library names, and unfortunately block-aligner-c is a library with dashes in its name. This is probably something that should be discussed in another issue.
A temporary workaround could be changing block-aligner-c to block_aligner_c in line 16 of src/CMakeLists.txt, or downgrading your Rust to < 1.79.
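If it helps, the suggested rename can be applied with a one-line sed. The demo below runs on a scratch copy so it is self-contained; the CMake line shown is only an illustrative stand-in for the real file, and against a real checkout you would run the same sed line on src/CMakeLists.txt directly:

```shell
# Demo of the rename on a scratch copy (the file content here is illustrative).
mkdir -p /tmp/foldseek-demo/src
printf 'target_link_libraries(foldseek block-aligner-c)\n' > /tmp/foldseek-demo/src/CMakeLists.txt

# The actual workaround: replace the dashed library name with the underscored one.
sed -i 's/block-aligner-c/block_aligner_c/g' /tmp/foldseek-demo/src/CMakeLists.txt

cat /tmp/foldseek-demo/src/CMakeLists.txt
```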
I tried your suggestion of editing the CMakeLists file in the src folder. It didn't work though; I got the same error.
/opt/conda/envs/foldseek-prostt5/bin/../lib/gcc/x86_64-conda-linux-gnu/12.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: cannot find -lblock_alinger_c: No such file or directory
collect2: error: ld returned 1 exit status
src/CMakeFiles/foldseek.dir/build.make:121: recipe for target 'src/foldseek' failed
make[2]: *** [src/foldseek] Error 1
CMakeFiles/Makefile2:1427: recipe for target 'src/CMakeFiles/foldseek.dir/all' failed
make[1]: *** [src/CMakeFiles/foldseek.dir/all] Error 2
Makefile:135: recipe for target 'all' failed
make: *** [all] Error 2
Interesting, @matchy233 corrosion is pinned (as part of a subtree). So I guess I should update to the latest 0.5.0 release to be consistent?
@rakeshr10 sorry I made a typo in my previous comment (edited just now) 😂 please modify "block-aligner-c" into "block_aligner_c" instead of "block_alinger_c" and see if it works this time.
@milot-mirdita I think so, and src/CMakeLists.txt also needs to be modified accordingly.
This is the current error.
Reading symbols from ../../foldseek/buildnew/src/foldseek...done.
(gdb) r
Starting program: foldseek/buildnew/src/foldseek createdb test.fasta test_db --prostt5-model weights --gpu 1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
createdb test.fasta test_db --prostt5-model weights --gpu 1
MMseqs Version: 7cd893937965e5bafffb29dc0048bf939a6eb5b0
Use GPU 1
Path to ProstT5 weights
Chain name mode 0
Write mapping file 0
Mask b-factor threshold 0
Coord store mode 2
Write lookup file 1
Input format 0
File Inclusion Regex .*
File Exclusion Regex ^$
Threads 128
Verbosity 3
Converting sequences
[405] 9s 660ms
Time for merging to test_db_h: 0h 0m 11s 876ms
Time for merging to test_db: 0h 0m 11s 814ms
Database type: Aminoacid
[New Thread 0x7fffc6683700 (LWP 29796)]
[New Thread 0x7fffc5e82700 (LWP 29797)]
[New Thread 0x7fffc5681700 (LWP 29798)]
[New Thread 0x7ffebeb9f700 (LWP 29799)]
[New Thread 0x7ffebe39e700 (LWP 29800)]
Error loading ProstT5: DriverError(CUDA_ERROR_UNSUPPORTED_PTX_VERSION, "the provided PTX was compiled with an unsupported toolchain.") when loading cast_f32_f16
Time for processing: 0h 1m 5s 602ms
[Thread 0x7ffebe39e700 (LWP 29800) exited]
[Thread 0x7ffebeb9f700 (LWP 29799) exited]
[Thread 0x7fffc5681700 (LWP 29798) exited]
[Thread 0x7fffc5e82700 (LWP 29797) exited]
[Thread 0x7fffc6683700 (LWP 29796) exited]
[Inferior 1 (process 29792) exited with code 01]
(gdb) bt
No stack.
Error loading ProstT5: DriverError(CUDA_ERROR_UNSUPPORTED_PTX_VERSION, "the provided PTX was compiled with an unsupported toolchain.") when loading cast_f32_f16
Okay this is interesting. This sounds like you need to update the GPU driver (not CUDA). I'll try to get some Docker images compiled with a variety of CUDA versions that should still be backward compatible with older driver versions.
I'll try to get the other unrelated rust issue also fixed tomorrow.
Thanks. I was running this in a pod using an Argo workflow, so I cannot update the driver. I was also trying to make a Docker image, but it throws this error when running through the CI/CD pipeline.
This is the docker file I was using.
FROM nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04
LABEL org.opencontainers.image.source = "https://github.com/steineggerlab/foldseek"
RUN apt-get update && apt-get install -y wget cmake curl
RUN wget -P /tmp \
"https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh" \
&& bash /tmp/Miniforge3-Linux-x86_64.sh -b -p /opt/conda \
&& rm /tmp/Miniforge3-Linux-x86_64.sh
ENV PATH /opt/conda/bin:$PATH
# installing into the base environment since the docker container won't do anything other than run openfold
RUN mamba create -n foldseek-prostt5 -c conda-forge cmake cuda-nvcc libcurand-dev libcublas-dev cuda-nvrtc-dev cuda-version=12.4
RUN export LD_LIBRARY_PATH=${CONDA_PREFIX}/lib:${LD_LIBRARY_PATH}
RUN echo "source activate foldseek-prostt5" > ~/.bashrc
RUN curl https://sh.rustup.rs -sSf | sh -s -- -y
ENV PATH="/root/.cargo/bin:${PATH}"
# Copy project files into container and set working directory
COPY . /app/foldseek
WORKDIR /app/foldseek
RUN mkdir build
WORKDIR /app/foldseek/build
RUN cmake -DCMAKE_BUILD_TYPE=Debug -DENABLE_CUDA=1 -DCUDAToolkit_ROOT=$(dirname $(which nvcc))/../targets/x86_64-linux ..
RUN make install -j 64
Error
#17 85.24 [ 85%] Building CXX object lib/mmseqs/src/CMakeFiles/mmseqs-framework.dir/workflow/EasyCluster.cpp.o
#17 85.55 error: failed to run custom build command for `candle-kernels v0.5.0`
#17 85.55
#17 85.55 Caused by:
#17 85.55 process didn't exit successfully: `/app/foldseek/build/./cargo/build/debug/build/candle-kernels-60f843d602579bb9/build-script-build` (exit status: 101)
#17 85.55 --- stdout
#17 85.55 cargo:rerun-if-changed=build.rs
#17 85.55 cargo:rerun-if-changed=src/compatibility.cuh
#17 85.55 cargo:rerun-if-changed=src/cuda_utils.cuh
#17 85.55 cargo:rerun-if-changed=src/binary_op_macros.cuh
#17 85.55 cargo:info=["/usr", "/usr/local/cuda", "/opt/cuda", "/usr/lib/cuda", "C:/Program Files/NVIDIA GPU Computing Toolkit", "C:/CUDA"]
#17 85.55 cargo:rerun-if-env-changed=CUDA_COMPUTE_CAP
#17 85.55
#17 85.55 --- stderr
#17 85.55 thread 'main' panicked at /app/foldseek/lib/prostt5/c/vendor/bindgen_cuda/src/lib.rs:489:18:
#17 85.55 `nvidia-smi` failed. Ensure that you have CUDA installed and that `nvidia-smi` is in your PATH.: Os { code: 2, kind: NotFound, message: "No such file or directory" }
#17 85.55 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
#17 85.56 warning: build failed, waiting for other jobs to finish...
#17 89.93 [ 85%] Building CXX object lib/mmseqs/src/CMakeFiles/mmseqs-framework.dir/workflow/EasyLinclust.cpp.o
#17 90.55 [ 85%] Building CXX object lib/mmseqs/src/CMakeFiles/mmseqs-framework.dir/workflow/Enrich.cpp.o
#17 91.62 [ 85%] Building CXX object lib/mmseqs/src/CMakeFiles/mmseqs-framework.dir/workflow/Linsearch.cpp.o
#17 92.04 [ 86%] Building CXX object lib/mmseqs/src/CMakeFiles/mmseqs-framework.dir/workflow/Map.cpp.o
#17 92.15 [ 86%] Building CXX object lib/mmseqs/src/CMakeFiles/mmseqs-framework.dir/workflow/Rbh.cpp.o
#17 92.19 [ 86%] Building CXX object lib/mmseqs/src/CMakeFiles/mmseqs-framework.dir/workflow/Search.cpp.o
#17 93.37 [ 87%] Building CXX object lib/mmseqs/src/CMakeFiles/mmseqs-framework.dir/workflow/Taxonomy.cpp.o
#17 93.84 [ 87%] Building CXX object lib/mmseqs/src/CMakeFiles/mmseqs-framework.dir/workflow/EasyTaxonomy.cpp.o
#17 94.25 [ 87%] Building CXX object lib/mmseqs/src/CMakeFiles/mmseqs-framework.dir/workflow/CreateIndex.cpp.o
#17 94.62 [ 88%] Building CXX object lib/mmseqs/src/CMakeFiles/mmseqs-framework.dir/MMseqsBase.cpp.o
#17 96.73 make[2]: *** [CMakeFiles/_cargo-build_cprostt5.dir/build.make:70: CMakeFiles/_cargo-build_cprostt5] Error 101
#17 96.73 make[1]: *** [CMakeFiles/Makefile2:834: CMakeFiles/_cargo-build_cprostt5.dir/all] Error 2
#17 96.73 make[1]: *** Waiting for unfinished jobs....
#17 115.9 [ 88%] Linking CXX static library libgemmiwrapper.a
#17 116.2 [ 88%] Built target gemmiwrapper
#17 121.0 [ 88%] Linking CXX static library libmmseqs-framework.a
#17 122.6 [ 88%] Built target mmseqs-framework
#17 122.6 make: *** [Makefile:136: all] Error 2
#17 ERROR: process "/bin/sh -c make install -j 64" did not complete successfully: exit code: 2
------
> importing cache manifest from registry.kf.research.corteva.com/rakesh/foldseek:
------
------
> [13/13] RUN make install -j 64:
------
process "/bin/sh -c make install -j 64" did not complete successfully: exit code: 2
Running after_script
00:00
Running after script...
$ docker logout ${REPO}
Removing login credentials for registry.kf.research.corteva.com
Cleaning up project directory and file based variables
00:00
ERROR: Job failed: exit code 1
That's the unrelated Rust error from above; you need to NOT use Rust 1.79.
How do I specify a Rust version < 1.79 when installing cargo and rust?
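For reference, one common way to pin an older toolchain (assuming rustup manages the Rust install) is a rust-toolchain.toml file at the project root; 1.78.0 is just an example of a pre-1.79 release:

```toml
# rust-toolchain.toml -- rustup picks this up automatically inside the project directory
[toolchain]
channel = "1.78.0"
```

Alternatively, `rustup default 1.78.0` sets it globally rather than per project.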
I fixed the block-aligner issue that was broken by Rust 1.79, so you should be able to compile again. I also made a Dockerfile that should create a minimum-sized container. However, we found another big issue with our candle-based implementation: you need to set the compute capability of your GPU in the Dockerfile.
We are evaluating if we can move to a different inference framework and drop candle again.
In the meantime, please feel free to play with the Dockerfile below: Dockerfile.gpu.txt
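For anyone adapting that Dockerfile: the candle-kernels build script reads the CUDA_COMPUTE_CAP environment variable (visible in the cargo output above), so a sketch of the relevant line might look like the following. The values are illustrative; check your GPU's compute capability:

```dockerfile
# Compute capability must be set at build time for the candle CUDA kernels:
# e.g. 80 for A100, 70 for V100 (illustrative values; verify for your GPU)
ENV CUDA_COMPUTE_CAP=80
```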
Thanks @milot-mirdita. The Dockerfile-based installation worked, though as you mentioned I had to set the compute capability for different GPUs like A100 or V100. I also didn't have any issues with non-standard amino acid characters like X.
Can you elaborate on "We are evaluating if we can move to a different inference framework and drop candle again"? Is there any issue with the candle implementation in terms of the accuracy of predictions and results?
Another question: is there a way the weights could be pre-loaded before runtime while making the Docker image? That would save compute time when running on a large number of files across different pods, so the weights need not be loaded every time a new pod is initialized.
The current implementation works correctly.
However, this thread opened my eyes to how much of a maintenance dead-end the current candle-based version is.
We can't do static CUDA builds with the current candle based implementation and we need to do separate builds for each compute capability string.
I knew already that the former would continue to be an issue for us, but I only realized today that one has to set compute capability at compile time.
Additionally, this will be even worse since we likely want to build for multiple CUDA versions to target different driver versions (i.e. 11.8 and 12.x).
This will basically make it impossible to package foldseek with GPU support within conda-forge/bioconda in the future.
So yeah, I am looking at a different ML framework that we can integrate for inference that should hopefully have neither issue.
You can do the foldseek databases call within the dockerfile to download the weights into the container.
I personally would prefer to have small container images and mount weights into the container, but you can do whatever you prefer.
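As a sketch, baking the weights into the image could look like this; the `ProstT5` database name and the paths are assumptions, so check `foldseek databases --help` for the exact spelling:

```dockerfile
# Download the ProstT5 weights at image build time so pods don't re-download them
# (database name and paths are illustrative)
RUN foldseek databases ProstT5 /opt/prostt5/weights /tmp/prostt5-dl \
    && rm -rf /tmp/prostt5-dl
```

The mount-based alternative keeps images small, e.g. something like `docker run -v /path/to/weights:/opt/prostt5/weights ...` and pointing --prostt5-model at the mounted path.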
@milot-mirdita Is there a limit on the size of sequence files, in terms of the number of sequences that can be used for prediction? I get this error after running for some time. The sequence file had approximately 42 million sequences.
ProstT5 prediction error: DriverError(CUDA_ERROR_OUT_OF_MEMORY, "out of memory") Prediction failed
@milot-mirdita Regarding the ProstT5 invocation createdb/easy-search --prostt5-model weights --gpu 1: the createdb and easy-search commands don't seem to recognize the --gpu 1 flag.
I have compiled foldseek with the CUDA library and the foldseek executable was generated, but the GPU flag does not seem to be recognized.