CUDA error: out of memory

bestz123 commented 3 days ago

Expected Behavior

Hello, when I run mmseq_gpu on my own dataset, I get the error shown below. Is there an option I can set to prevent it?

Context

stdout: search /tmp/tmpzvdqhge1/query_DB /home/inspur/zyz/alphafold3/database/uniprot_all_2021_04_gpu/uni /tmp/tmpzvdqhge1/resultDB /tmp/tmp4r6v014n -a --alignment-mode 2 --min-aln-len 10 -s 8 -e 0.1 --max-seqs 10000 --gpu 1

MMseqs Version: 562a47f3a276721e40e63715474adf27747f1bfc
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace true
Alignment mode 2
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.1
Seq. id. threshold 0
Min alignment length 10
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0
Coverage mode 0
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Threads 128
Compressed 0
Verbosity 3
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 8
k-mer length 0
Target search mode 0
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max results per query 10000
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Minimum diagonal score 15
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Use GPU 1
Use GPU server 0
Prefilter mode 0
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.1
Global sequence weighting false
Allow deletions false
Filter MSA 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count mode 0
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 0
Search iterations 1
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files false

ungappedprefilter /tmp/tmpzvdqhge1/query_DB /home/inspur/zyz/alphafold3/database/uniprot_all_2021_04_gpu/uni /tmp/tmpzvdqhge1/resultDB --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -c 0 -e 0.1 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 1 --min-ungapped-score 15 --max-seqs 10000 --db-load-mode 0 --gpu 1 --gpu-server 0 --prefilter-mode 3 --threads 128 --compressed 0 -v 3

CUDA error: out of memory : /home/vsts/work/1/s/lib/libmarv/src/cudasw4.cuh, line 1276
Error: Alignment died

stderr:

Your Environment

Include as many relevant details about the environment you experienced the bug in.

Git commit used (The string after "MMseqs Version:" when you execute MMseqs without any parameters): 562a47f3a276721e40e63715474adf27747f1bfc
Which MMseqs version was used (Statically-compiled, self-compiled, Homebrew, etc.): Statically-compiled
Server specifications (especially CPU support for AVX2/SSE and amount of system memory): AMD 9654, 768 GB, A100 (80 GB)
Operating system and version: Ubuntu 20.04

milot-mirdita commented 3 days ago

What GPU are you using and what size is the target database? Can you list all commands you ran please?

bestz123 commented 1 day ago

Hi @milot-mirdita, I have updated the issue with the relevant information. I am using an A100 (80 GB) to build MSAs against the uniprot_all_2021_04.fa (101 GB) database from AlphaFold 3.

milot-mirdita commented 1 day ago

> CUDA error: out of memory : /home/vsts/work/1/s/lib/libmarv/src/cudasw4.cuh, line 1276

This is a very odd place for a crash. How did you compile MMseqs2? Can you try with the precompiled binary instead please: https://github.com/soedinglab/MMseqs2/releases/download/16-747c6/mmseqs-linux-gpu.tar.gz

bestz123 commented 8 hours ago

> This is a very odd place for a crash. How did you compile MMseqs2? Can you try with the precompiled binary instead please: https://github.com/soedinglab/MMseqs2/releases/download/16-747c6/mmseqs-linux-gpu.tar.gz

Yes, I am already using this precompiled binary. The error is intermittent: sometimes it does not appear when I re-run.

milot-mirdita commented 3 hours ago

Are the GPUs already in use by other processes? Can you try to explicitly set CUDA_VISIBLE_DEVICES to the GPU you want to use?
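For reference, a minimal sketch of pinning a run to a single GPU. `run_on_gpu` is a hypothetical helper, not part of MMseqs2, and the database paths in the usage comment are placeholders rather than the ones from this issue. The `nvidia-smi` check is skipped if the tool is not installed.

```shell
# Hypothetical helper (not part of MMseqs2): run a command with one GPU visible.
run_on_gpu() {
    gpu_id="$1"
    shift
    # CUDA applications only see the devices listed in CUDA_VISIBLE_DEVICES.
    CUDA_VISIBLE_DEVICES="$gpu_id" "$@"
}

# Show per-GPU memory use before starting, if nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv || true
fi

# Usage (placeholder paths, assumed for illustration):
# run_on_gpu 0 mmseqs search queryDB targetDB resultDB tmp --gpu 1
```

Setting the variable per command, instead of exporting it, keeps the restriction from leaking into later commands in the same shell.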

milot-mirdita commented 3 hours ago

Also can you check if you previously started a MMseqs2 GPU server and didn't clean it up (ps aux | grep gpuserver)?
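A slightly more defensive version of that check might look like the sketch below. `list_gpuserver` is a hypothetical helper name, and the `nvidia-smi` cross-check only runs if the tool is installed.

```shell
# Hypothetical helper: list processes whose command line mentions "gpuserver".
# The [g] bracket keeps grep from matching its own command line.
list_gpuserver() {
    ps aux | grep '[g]puserver'
}

if list_gpuserver; then
    echo "A gpuserver process is still running; stop it before re-running the search."
else
    echo "No gpuserver process found."
fi

# Cross-check which PIDs currently hold GPU memory, if nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv || true
fi
```

Any PID reported by the second command that is not expected to be there is worth investigating before the next run.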

bestz123 commented 32 minutes ago

> Are the GPUs already in use by other processes? Can you try to explicitly set CUDA_VISIBLE_DEVICES to the GPU you want to use?

I have checked that the GPU is not being used by other tasks. I set CUDA_VISIBLE_DEVICES=0.

> Also can you check if you previously started a MMseqs2 GPU server and didn't clean it up (ps aux | grep gpuserver)?

Before I start the MMseqs2 GPU server, the GPU memory usage is normal.

Could it be related to the length of the sequences I am aligning? The longer the sequence, the more likely the CUDA error seems to be. Or could it be related to the search settings? The error often occurs when I set --num-iterations 3.
