waveygang / wfmash

base-accurate DNA sequence alignments using WFA and mashmap2
MIT License

queryLen <= querySize failed of wfmash #237

Closed baozg closed 3 months ago

baozg commented 3 months ago

I am trying to use pggb 0.6.0 to build a subgraph from ~50 kb sequences, but wfmash cannot finish the alignment step. I checked the PAF plot and it looks fine to me. What exactly does this error mean?

wfmash -s 5000 -l 25000 -p 80 -n 1 -k 19 -H 0.001 -Y # -t 48 --tmp-base tmp-p80k19n253s5000 tmp.subgraph.fa.gz --lower-triangular --hg-filter-ani-diff 30 --approx-map
314.18s user 1.20s system 3893% cpu 8.10s total 1114080Kb max memory
[mashmap] Skipping self mappings for single file all-vs-all mapping.
[wfmash::align] Reference = [tmp.subgraph.fa.gz]
[wfmash::align] Query = [tmp.subgraph.fa.gz]
[wfmash::align] Mapping file = tmp-p80k19n253s5000/wfmash-vlanRl
[wfmash::align] Alignment identity cutoff = 64.00%
[wfmash::align] Alignment output file = /dev/stdout
[wfmash::align] time spent loading the reference index: 0.01 sec
[wfmash::align::computeAlignments] aligned  0.00% @ 0.00e+00 bp/s elapsed: 00:00:00:00 remain: 00:00:00:00
wfmash: /wfmash/src/align/include/computeAlignments.hpp:510: void align::Aligner::doAlignment(std::stringstream&, align::MappingBoundaryRow&, const string&, const std::shared_ptr<std::__cxx11::basic_string<char> >&, uint64_t): Assertion `queryLen <= querySize' failed.
Command terminated by signal 6
wfmash -s 5000 -l 25000 -p 80 -n 1 -k 19 -H 0.001 -Y # -t 48 --tmp-base tmp-p80k19n253s5000 tmp.subgraph.fa.gz --lower-triangular --hg-filter-ani-diff 30 -i tmp-p80k19n253s5000/tmp.subgraph.fa.gz.304cbb4.mappings.wfmash.paf --invert-filtering
0.28s user 0.01s system 100% cpu 0.29s total 24064Kb max memory

[Attachment: tmp.subgraph.fa.gz.304cbb4.mappings.wfmash.paf]

AndreaGuarracino commented 3 months ago

Can you confirm that it was a problem with the input FASTA file?

samtools faidx tmp.subgraph.fa 
[W::fai_insert_index] Ignoring duplicate sequence "Col-CC#1#Chr3:4850434-4913275" at byte offset 15709628

baozg commented 3 months ago

After removing the duplicate sequence, wfmash finished running.
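
For anyone hitting the same assertion, a check along these lines may help confirm whether duplicate sequence names are present before running wfmash. This is only a minimal sketch, not part of wfmash or pggb: the helper names `open_fasta`, `duplicate_names`, and `write_deduplicated` are hypothetical, a sequence name is assumed to be the header text up to the first whitespace, and keeping the first record for each name is an assumption that may not suit every dataset.

```python
import gzip
from collections import Counter


def open_fasta(path):
    """Open a FASTA file as text, transparently handling .gz files."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def header_name(line):
    """Return the sequence name: header text up to the first whitespace."""
    fields = line[1:].split()
    return fields[0] if fields else ""


def duplicate_names(path):
    """Return sequence names that occur more than once, in first-seen order."""
    counts = Counter()
    with open_fasta(path) as handle:
        for line in handle:
            if line.startswith(">"):
                counts[header_name(line)] += 1
    return [name for name, n in counts.items() if n > 1]


def write_deduplicated(src, dst):
    """Copy src to dst (uncompressed), keeping only the first record per name.

    Assumption: later records with an already-seen name are dropped,
    even if their sequences differ from the first one.
    """
    seen = set()
    keep = False
    with open_fasta(src) as fin, open(dst, "wt") as fout:
        for line in fin:
            if line.startswith(">"):
                name = header_name(line)
                keep = name not in seen
                seen.add(name)
            if keep:
                fout.write(line)
```

Calling `duplicate_names("tmp.subgraph.fa.gz")` returns a list that is empty when every name is unique. `write_deduplicated` writes plain text, so the output would need to be recompressed and reindexed (for example with `bgzip` and `samtools faidx`) before passing it back to pggb.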