maickrau / GraphAligner

MIT License

length_error #46

Closed: ptrebert closed this issue 2 years ago

ptrebert commented 3 years ago

Hi Mikko,

What does this error mean?

Align
terminate called after throwing an instance of 'std::length_error'
  what():  basic_string::_M_replace_aux
GraphAligner Branch master commit 02c8e2628bba16425dc58cdf67199319f0a7a304 2021-04-23 09:06:36 -0400

+Peter

maickrau commented 3 years ago

Can you post the command you used?

ptrebert commented 3 years ago

sure...

GraphAligner -g GRAPH -f READS \
-x vg -t 12 \
--min-alignment-score 5000 --multimap-score-fraction 1 \
--corrected-out READS_OUT \
-a ALN_OUT &> LOG

maickrau commented 3 years ago

Could you post the graph and the reads as well? The error is some kind of C++ standard library string error.

ptrebert commented 3 years ago

OK, I am going to add this to the Globus share (same as last time); I'll let you know in this issue when the data are available.

ptrebert commented 3 years ago

@maickrau I have finally added the data behind this error message to the Globus share...

ptrebert commented 2 years ago

@maickrau This error hit me again with commit fd70355c335ea9e99e364b189a9420a8ca271a50 on a small (~single chromosome) graph and a correspondingly small ONT read set. I added this small data set to Globus for debugging. The relevant parameters from the pipeline:

'--min-alignment-score 5000 --multimap-score-fraction 0.99 '
'--precise-clipping 0.7 '
'--seeds-mxm-length 30 --seeds-mem-count 10000 '
'-b 15 --discard-cigar '

maickrau commented 2 years ago

Could you post the exact command and log file? I ran

GraphAligner -g AFR-GWD-GB40-M_HG02666.AFMQ0YRAW.gfa -f AFR-GWD-GB40-M_HG02666_ONTUL.chrY-reads.mq00.fasta.gz -a alns.gaf --min-alignment-score 5000 --multimap-score-fraction 0.99 --precise-clipping 0.7 --seeds-mxm-length 30 --seeds-mem-count 10000 -b 15 --discard-cigar -t 4

with commit fd70355 and stopped it after about an hour because it had not crashed.

ptrebert commented 2 years ago

Hm... the call is correct; I just did not paste the output file for the corrected reads. Maybe it's related to dumping the corrected reads via --corrected-out (see log):

GraphAligner Branch MultiseedClusters commit fd70355c335ea9e99e364b189a9420a8ca271a50 2021-05-07 09:06:29 -0400
GraphAligner Branch MultiseedClusters commit fd70355c335ea9e99e364b189a9420a8ca271a50 2021-05-07 09:06:29 -0400
Load graph from output/clean_graphs/AFR-GWD-GB40-M_HG02666.AFMQ0YRAW.gfa
Build MUM/MEM seeder from the graph
Build alignment graph
MEM seeds, min length 30, max count 10000
Seed cluster size 1
Extend up to 5 seed clusters
Alignment bandwidth 15
Clip alignment ends with identity < 70%
X-drop DP score cutoff 16666
write alignments to output/hybrid/ont_to_graph/AFR-GWD-GB40-M_HG02666.ONTUL.AFMQ0YRAW.gaf
write corrected reads to output/hybrid/ont_to_graph/AFR-GWD-GB40-M_HG02666.ONTUL.AFMQ0YRAW.ONTHY.fasta
Align
terminate called after throwing an instance of 'std::length_error'
  what():  basic_string::_M_replace_aux

Edit: for the record, I am running this with 24 threads.

maickrau commented 2 years ago

I still didn't manage to reproduce this with

GraphAligner -g AFR-GWD-GB40-M_HG02666.AFMQ0YRAW.gfa -f AFR-GWD-GB40-M_HG02666_ONTUL.chrY-reads.mq00.fasta.gz -a alns.gaf --min-alignment-score 5000 --multimap-score-fraction 0.99 --precise-clipping 0.7 --seeds-mxm-length 30 --seeds-mem-count 10000 -b 15 --discard-cigar -t 16 --corrected-out corrected.fa

Could you post the exact GraphAligner command you used, with all parameters?

ptrebert commented 2 years ago

There is nothing more to the command that I could post here. It's part of a larger pipeline, so all parameters are hard-coded as stated above and identical to your call (with I/O as listed in the log). I made an isolated test run of the command with the test data, and this time it did not fail for me either. I have now restarted the pipeline with all jobs (N=10) of that type. If none of them fails this time, maybe Singularity is sporadically causing this problem (I do not remember whether the run back in August was also inside a container, but I assume so...).

ptrebert commented 2 years ago

All the jobs failed, but not with this error; some jobs just died after ~2 hours. I think it's machine- or Singularity-related, and I first have to deal with our IT support before I can try again...

ptrebert commented 2 years ago

@maickrau I identified the problem: an error with mounting/binding the cluster file system into the Singularity container. Although this is clearly not a GA bug, I would strongly suggest a simple usability improvement: check that the input files exist and raise an explicit error message if they can't be read.
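Until such a check exists in GraphAligner itself, a pipeline step could fail fast by checking its inputs before launching the aligner. The following is a minimal sketch, assuming a POSIX shell; `require_readable` is a hypothetical helper name, not part of GraphAligner or Singularity. It only catches inputs that are missing, unreadable, or empty when the job starts (as can happen when a bind mount inside a container fails); it cannot detect I/O errors that occur later during the run.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not part of GraphAligner): stop early with an
# explicit message if any input path is missing, unreadable, or empty.
require_readable() {
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "error: input file does not exist: $f" >&2
            return 1
        fi
        if [ ! -r "$f" ]; then
            echo "error: input file is not readable: $f" >&2
            return 1
        fi
        if [ ! -s "$f" ]; then
            echo "error: input file is empty: $f" >&2
            return 1
        fi
    done
    # All given paths passed the checks.
    return 0
}
```

For example, `require_readable GRAPH READS || exit 1` placed directly before the `GraphAligner -g GRAPH -f READS ...` call would stop the job with a readable message instead of letting GraphAligner start on inputs it cannot read.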