Open art-egorov opened 1 year ago
UPD: it seems all the failed proteins are forms of titin (e.g. XP_035030256.2, XP_035030222.2, XP_045336327.1).
Does the crash also happen with a smaller max-seqs (currently it's set to --max-seqs 1000000)?
Are the failed proteins on the query side? Do these queries also crash against a small DB (e.g. the DB.fasta in the examples folder)?
Yep, it also crashes without the --max-seqs parameter, and a search with these proteins does not crash against DB.fasta.
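To rerun only the suspect queries against a small DB, they first have to be pulled out of the larger FASTA file. A minimal sketch of that step in Python follows. The function name `extract_records` is hypothetical, and it assumes the record ID is the first whitespace-delimited token of each header line.

```python
def extract_records(fasta_text, accessions):
    """Return FASTA text with only the records whose ID is in `accessions`.

    Assumption: the record ID is the first whitespace-delimited token
    after '>' in the header line (e.g. 'XP_035030256.2').
    """
    wanted = set(accessions)
    kept_lines = []
    keep = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # A new record starts here; decide whether to keep it.
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in wanted
        if keep:
            kept_lines.append(line)
    # Terminate the output with a newline only if something was kept.
    return "\n".join(kept_lines) + ("\n" if kept_lines else "")


# Tiny illustrative input (made-up sequences, not real titin records).
demo = ">XP_035030256.2 titin-like\nMKLV\nAAGG\n>OTHER_1 something\nMMMM\n"
subset = extract_records(demo, ["XP_035030256.2"])
```

The resulting text can be written to a file and passed as the query to easy-search in place of the full FASTA.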
I'm running easy-search on a set of FASTA files. For the majority of files everything is fine, but for a small subset I'm getting the same error after prefiltering.
That's an example of my command:
../bin/mmseqs/bin/mmseqs easy-search s01_complete_refseq_representative_fasta_DEVIDED/mmseqs_rep_d_2.fa mmseqs/mmseqs_clu_rep_db/DB mmseqs_test.tsv tmp1 --format-mode 4 --num-iterations 5 -e 1e-5 --format-output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits --max-seqs 1000000 -s 6
MMseqs Output (for bugs)
Context
I thought it might be due to special symbols in the sequences of the failed FASTA files or to the larger size of the proteins. That seems not to be the case, since "X" symbols were present in the completed FASTA files as well, as were proteins of length ~30K as well as short ones. Dividing these FASTA files into sets of smaller files solves the problem for a subset of the new files, but still leaves some with the same error. I can send an example FASTA file if needed.
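The splitting step described above can be sketched as follows, as one way to narrow a failing file down to the individual failing records. This is only an illustration in Python, not the script actually used. The function name `split_fasta` and the parameter `records_per_chunk` are hypothetical.

```python
def split_fasta(fasta_text, records_per_chunk):
    """Split FASTA text into a list of FASTA strings of at most
    `records_per_chunk` records each, preserving record order."""
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be >= 1")

    # Group lines into records: each record is a header plus its sequence lines.
    records = []
    current = []
    for line in fasta_text.splitlines():
        if line.startswith(">") and current:
            records.append(current)
            current = []
        if line.strip():  # skip blank lines
            current.append(line)
    if current:
        records.append(current)

    # Emit consecutive groups of records as separate FASTA strings.
    chunks = []
    for start in range(0, len(records), records_per_chunk):
        group = records[start:start + records_per_chunk]
        chunks.append("".join(l + "\n" for rec in group for l in rec))
    return chunks


# Illustrative input with made-up sequences.
parts = split_fasta(">a\nAA\n>b\nBB\n>c\nCC\n", 2)
```

Setting `records_per_chunk` to 1 gives one file per protein, so each failing query can be identified directly.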
Your Environment
Which MMseqs version was used: MMseqs2 Version: b22d5f6d02cb27ebc2cd931d8d20fe92ff54b8a8, got with wget, avx2
Node info:
2 x NVIDIA Tesla V100 SXM2 GPU with 32GB RAM, connected by nvlink
2 x 8 core Intel(R) Xeon(R) Gold 6244 CPU @ 3.60GHz (total 16 cores)
768GB DDR4 RAM
387GB SSD scratch disk
(However, I ran it on different machines and got the same problem.)