Open ifiddes opened 3 months ago
The offending alignment is this:
113676 45 0.829 6.410E-05 39 0 40 527 566 585 39 0 0 584 27M1I12M1D
Which does appear to be walking off the end to me.
Could you please post the mmseqs command line and terminal output too? Ideally, also include the sequences needed to reproduce the crash.
I am having a hard time creating a minimal reference sequence to reproduce the crash. If I reduce the target database down to only the aligned sequence, it doesn't happen.
The command line in question is
mmseqs convertali querydb targetdb --format-output query,target,qstart,qend,tstart,tend,raw,cigar,qaln,taln,qlen --search-type 3
I will continue to try and make a minimal reproducible example. I did notice that adding an N to the start of my query sequence solves the issue.
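For anyone wanting to try the same workaround, here is a minimal sketch of padding a query with a leading N before searching. The helper name pad_fasta is hypothetical and not part of MMseqs2; it only rewrites FASTA text. Note that the padding shifts every query coordinate by one and makes the reported query length 41 instead of 40.

```python
def pad_fasta(text, pad="N"):
    """Prepend `pad` to the first sequence line of every FASTA record.

    Hypothetical helper for illustration; it is not an MMseqs2 function.
    """
    out = []
    new_record = False
    for line in text.splitlines():
        if line.startswith(">"):
            # Header line: copy it and remember that the sequence starts next.
            out.append(line)
            new_record = True
        elif new_record and line:
            # First sequence line of the record gets the padding base.
            out.append(pad + line)
            new_record = False
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# The query from this thread.
record = (
    ">GRCh38_ENSG00000103042.3491.40\n"
    "TATTTTATTTTGTGTAGAGATGGGGTCTCACTAGGTTGCC\n"
)
padded = pad_fasta(record)
print(padded, end="")
```

This only hides the crash for this query; it does not explain why the unpadded alignment runs past the sequence boundary.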
I was unable to make a minimal ref, so I uploaded the ref to Box. It is a human and mouse transcriptome. I had to break it into three parts; just concatenate them.
Here is the query:
>GRCh38_ENSG00000103042.3491.40
TATTTTATTTTGTGTAGAGATGGGGTCTCACTAGGTTGCC
You should be able to reproduce the crash with
mmseqs easy-search tmp.fasta full_ref.fa aln.out $TMPDIR --format-output query,target,qstart,qend,tstart,tend,raw,qaln,taln,qlen --search-type 3
https://app.box.com/s/bx5y7s5gpa7ybyc6xera4hujwojagphe https://app.box.com/s/w86ynfly4gi2zt09wb0adqc3g05ox7ok https://app.box.com/s/g50mq3skkaimb8ggunwlqwgbdz5psb6t
GDB showed me I get a segmentation fault here
With offset = 39, seqPos = 40, and isReverseStrand = true, the line of code is walking off the start of this 40bp long sequence. This seems to be because the backtrace has a length of 41:
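As a sanity check on those numbers, here is a small sketch that sums the CIGAR string from the offending alignment (27M1I12M1D). The function name cigar_lengths is made up for illustration, and the assignment of I and D to query or target follows the SAM convention, which is an assumption about how MMseqs2 encodes its backtrace. With one I and one D the totals come out the same either way.

```python
import re


def cigar_lengths(cigar):
    """Return (query_bases, target_bases, alignment_columns) for an M/I/D CIGAR.

    Assumes the SAM convention: M consumes both sequences, I consumes only
    the query, D consumes only the target. Hypothetical helper, not MMseqs2 code.
    """
    query = target = columns = 0
    for count, op in re.findall(r"(\d+)([MID])", cigar):
        n = int(count)
        columns += n          # every operation adds alignment columns
        if op in "MI":
            query += n        # match or insertion uses query bases
        if op in "MD":
            target += n       # match or deletion uses target bases
    return query, target, columns


q, t, cols = cigar_lengths("27M1I12M1D")
print(q, t, cols)  # 40 40 41
```

So the alignment has 41 columns but uses only 40 bases from each sequence. Any index that advances once per backtrace column, rather than once per consumed base, would step one position outside a 40bp sequence, which fits the behavior described above. Whether that is what the crashing line actually does is not confirmed here.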
I have not yet been able to figure out what the target sequence is to make a minimal reproducible example, but I wanted to see if you had any ideas on what would be causing this walk off the edge behavior.