manock opened 4 years ago
I have never tried to cluster such long sequences. Can you isolate the issue?
The error occurs in the call to ksw_extz2_sse in BandedNucleotideAligner::align.
I ran a few tests with an increasing number of sequences in the database. I tested up to 50,000 sequences and it worked fine.
I also ran a test with the longest sequence plus about 5,000 other sequences, and it also worked fine.
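The isolation strategy described here (take the longest sequence plus a few thousand others and check whether the crash reproduces) can be scripted. The following is a minimal sketch in Python, assuming a plain FASTA input; `read_fasta`, `subset_with_longest` and `to_fasta` are hypothetical helpers, not part of MMseqs2. It holds every record in memory, which may be impractical for a file with sequences of tens of millions of bases.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def subset_with_longest(records: Iterable[Record], n_others: int) -> List[Record]:
    """Return the longest record followed by the first n_others remaining records."""
    records = list(records)
    longest_idx = max(range(len(records)), key=lambda i: len(records[i][1]))
    others = [r for i, r in enumerate(records) if i != longest_idx]
    return [records[longest_idx]] + others[:n_others]


def to_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Format records as FASTA text with fixed-width sequence lines."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

For example, `to_fasta(subset_with_longest(read_fasta(open("input.fasta")), 5000))` would produce a test database analogous to the one described above; varying `n_others` gives a simple way to bisect toward the smallest input that still crashes.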
I also encountered a segmentation fault when clustering long nucleotide sequences (up to 99 million bases). Has anyone had any luck with long sequences?
Invalid database read for id=4294967295, database index=dump/9317603370475534640/input_step_redundancy.index
getSeqLen: local id (4294967295) >= db size (8247802)
Error: Offset step died
dump/16153251853230858118/linclust/13629425479186879042/linclust.sh: line 76: 195145 Segmentation fault (core dumped) $RUNNER "$MMSEQS" "${ALIGN_MODULE}" "$INPUT" "$INPUT" "$RESULTDB" "${TMP_PATH}/aln" ${ALIGNMENT_PAR}
Error: Alignment step died
Error: linclust died
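The id in this log, 4294967295, is exactly 2^32 - 1, the bit pattern of -1 stored in an unsigned 32-bit integer. It therefore looks more like a "not found" sentinel than a genuine index into a database of 8,247,802 entries. That MMseqs2 returns such a sentinel from its id lookup is an assumption here, not something confirmed in this thread. The arithmetic can be checked with a short Python sketch; the helper names are hypothetical.

```python
UINT32_MAX = 2**32 - 1  # 4294967295, the largest unsigned 32-bit value


def as_uint32(value: int) -> int:
    """Reinterpret a (possibly negative) integer as an unsigned 32-bit value."""
    return value & UINT32_MAX


def looks_like_missing_id(local_id: int) -> bool:
    """True if the id is the all-ones 32-bit pattern, i.e. -1 stored unsigned.

    Treating this value as a "lookup failed" marker is an assumption.
    """
    return local_id == UINT32_MAX


def is_out_of_range(local_id: int, db_size: int) -> bool:
    """Mirror the condition quoted in the log: local id >= db size."""
    return local_id >= db_size


if __name__ == "__main__":
    bad_id = as_uint32(-1)
    print(bad_id, looks_like_missing_id(bad_id), is_out_of_range(bad_id, 8247802))
```

Running it prints `4294967295 True True`, matching both the id and the failed bounds check in the log.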
Hello,
Expected Behavior
Output clustering results.
Current Behavior
Segmentation fault in linclust.sh
Steps to Reproduce (for bugs)
MMseqs Output (for bugs)
Context
I have a FASTA file with about 140,000 sequences ranging from a few thousand nucleotides to about 20 million. Memory consumption stays within limits throughout the MMseqs steps, yet at some point during the alignment phase a segmentation fault is thrown, so it does not seem to be a memory problem. I tried both the easy-cluster workflow and the cluster module, and both fail at the same point.
Your Environment