waveygang / wfmash

base-accurate DNA sequence alignments using WFA and mashmap3
MIT License
175 stars 18 forks source link

v0.17.0 never complete #262

Open jiadong324 opened 2 months ago

jiadong324 commented 2 months ago

Hi,

I am using both v0.13.0 and v0.17.0 to align a same set of query and reference sequences with same paramers. The size of ref.fa is about 12G and query.fa is 46G.

wfmash -s 50k -l 150k -p 90 -n 1 -H 0.001 -t 30 ref.fa query.fa > wfmash.paf

The v0.13.0 finished in about 1.5 hours. The *.paf of v0.17.0 is approximately the same size as the output generated by v0.13.0. It seems that v0.17.0 stop to write but the program is still running. It stuck at the 99.94% completeness of sequence alignment.

[mashmap] MashMap v3.1.1
[mashmap] Reference = [../g2_fa/g2_asm.fa]
[mashmap] Query = [../g3_fa/g3_asm.fa]
[mashmap] Kmer size = 19
[mashmap] Sketch size = 2998
[mashmap] Segment length = 50000 (read split allowed)
[mashmap] Block length min = 150000
[mashmap] Chaining gap max = 30000
[mashmap] Mappings per segment = 1
[mashmap] Percentage identity threshold = 90%
[mashmap] Do not skip self mappings
[mashmap] Hypergeometric filter w/ delta = 0 and confidence 0.999
[mashmap] Mapping output file = /net/eichler/vol28/projects/medical_reference/nobackups/human_SDR/DP_Platinum/wfmash_g2g3/wfmash-9VW7x7
[mashmap] Filter mode = 1 (1 = map, 2 = one-to-one, 3 = none)
[mashmap] Execution threads  = 30
[mashmap::skch::Sketch::build] Unique minmer hashes before pruning = 138308619
[mashmap::skch::Sketch::build] Total minmer windows before pruning = 1390109507
[mashmap::skch::Sketch::computeFreqHist] Frequency histogram of minmer interval points = (2, 4297961) ... (3863160, 1)
[mashmap::skch::Sketch::computeFreqHist] With threshold 0.001%, ignore minmers with more than >= 30412 interval points during mapping.
[mashmap::skch::Sketch] Unique minmer hashes after pruning = 138307236
[mashmap::skch::Sketch] Total minmer windows after pruning = 1307505552
[wfmash::map] time spent computing the reference index: 427.234 sec
[mashmap::skch::Map::mapQuery] mapped  -nan% @ 0.00e+00 bp/s elapsed: 00:01:38:23 remain: 00:00:00:00
[mashmap::skch::Map::mapQuery] count of mapped reads = 1847, reads qualified for mapping = 2132, total input reads = 2132, total input bp = 0
[wfmash::map] time spent mapping the query: 5.91e+03 sec
[wfmash::map] mapping results saved in: /net/eichler/vol28/projects/medical_reference/nobackups/human_SDR/DP_Platinum/wfmash_g2g3/wfmash-9VW7x7
[wfmash::align] Reference = [../g2_fa/g2_asm.fa]
[wfmash::align] Query = [../g3_fa/g3_asm.fa]
[wfmash::align] Mapping file = /net/eichler/vol28/projects/medical_reference/nobackups/human_SDR/DP_Platinum/wfmash_g2g3/wfmash-9VW7x7
[wfmash::align] Alignment identity cutoff = 0.00%
[wfmash::align] Alignment output file = /dev/stdout
[wfmash::align] time spent loading the reference index: 0.08 sec
[wfmash::align::computeAlignments] aligned 99.94% @ 2.58e+05 bp/s elapsed: 01:20:46:26 remain: 00:00:01:36
miliasot commented 6 days ago

Hello, I get some of my jobs stuck with v0.21.0 too.

ekg commented 5 days ago

Update coming which will tend to reduce runtime in alignment.

Il mer 16 ott 2024, 01:23 Sotiria Milia @.***> ha scritto:

Hello, I get some of my jobs stuck with v0.21.0 too.

— Reply to this email directly, view it on GitHub https://github.com/waveygang/wfmash/issues/262#issuecomment-2415833149, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABDQEK5Y57GVA75YZLJTVTZ3YA5PAVCNFSM6AAAAABMBCUHJGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMJVHAZTGMJUHE . You are receiving this because you are subscribed to this thread.Message ID: @.***>

ASLeonard commented 4 days ago

The issue is probably a bit deeper than that. I can provide a small-ish example that always triggers the freeze.

The first thing to note was changing --erode-match-mismatch allowed wfmash to finish. It froze in the default setting (the erode_k=127), froze for erode_k down to around 45, but would finish if erode_k < 40.

After chasing lambdas, it is getting stuck in an infinite loop in do_progressive_wfa_patch_alignment(), where the updated alignment seems to be identical to the original alignment, and so never triggers the break statements. For simplicity, this is just two iterations.

WFA fwd alignment: Alignment: Query(2663892-2665099/1207) Target(2685261-2686470/1209) Score=2147483647 Rev=No Status=NotOK Keep=No CIGAR= Indices(i,j)=(2685261,2663892)
WFA rev alignment: Alignment: Query(0-0/0) Target(0-0/0) Score=2147483647 Rev=Yes Status=NotOK Keep=No CIGAR= Indices(i,j)=(0,0)
bounds: 1207 1219 1209 1221
left_query_size: 1207 left_target_size: 1209
right_query_size: 0 right_target_size: 0
max_left_size: 1209 max_right_size: 0
WFA fwd alignment: Alignment: Query(2663892-2665099/1207) Target(2685261-2686470/1209) Score=2147483647 Rev=No Status=NotOK Keep=No CIGAR= Indices(i,j)=(2685261,2663892)
WFA rev alignment: Alignment: Query(0-0/0) Target(0-0/0) Score=2147483647 Rev=Yes Status=NotOK Keep=No CIGAR= Indices(i,j)=(0,0)
bounds: 1207 1219 1209 1221
left_query_size: 1207 left_target_size: 1209
right_query_size: 0 right_target_size: 0
max_left_size: 1209 max_right_size: 0

Since both alignments are not okay and the scores are identical, it never triggers either branch https://github.com/waveygang/wfmash/blob/1460a988cd42b712fd276698b4603c79e41faa78/src/common/wflign/src/wflign_patch.cpp#L617-L624

Nothing seems to get updated later on, and so the while loop just continues through forever. I'm not sure if the alignment score is coincidentally the INT32 max, but when using a small erode_k that did finish, all the scores looked normal.

The actual sequences being processed in do_wfa_patch_alignment() are

query NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGTT 
target CCTTCTTTATGGTCAATTTCTCACATGGACAGGGAGGCCTCGCGTGCATGGGATCGCAAAGAGTCGGACACGACTGAGCGACTGATCTGATCTCACATGGATACATGATTACTGGAAAAAAACGCAGCTTTAACTATAAGGACCTTTGTCAGCAAAGTGATATGTCTGCTTTTTAACACATTGTGTCTAGGTTTGTCATAGCTCCTCTTCCAAGGAGCAAGTGTCTTTTAATTTCATGGCTGCAGTCACCATCCACAGTGCTTTTGAAGCCCAAGAAAATAAAATTTGTCAGTTTCCACTTTTTCCCCATCTATTTACCATGAAGTGATGGTACCGGATGCCATGATCTCAGTTTTTTTGTGAATGTTGAGTTTAAAGCCAGGTTTTTCACTCTCCTGTTGCACTCTCATTAGTGCCTCTTTAGTTCCTCTTCACTTTCTGGTATTAGAGTGGTGTCAGCCTGCATAGCCATGAAGTGAAAGTTGTTCAGTCATGTCCAACTCTTTGTGACACCATGAACTATACAGTCCATGGAATTTTCCAGGCCAGAATACTACAGTGGATAGCCTTTCCCTTCTCAAGGGAATCTTCCCAACCTAGCGATCGAACCCAGGTCTCCCATATTGCAGGTAGATCCTTTACCAGCTGAGCTGCAAGGGCAGCATATCTGTGGTTGTCCATATTTCTCCTGGCAATTTTAATTCCAGCTTGTGATTTATCTGGACATTTCACATGACTTATTCTGCATATAAGTTAAATAAGCAGGGTGACAATATACAGCTTTCATGTACTCCTTTTCGAAATTGAATCAGTCCATTGTTTCTTGTAAGGTTCTAACTGTTGCTTCTTGACCTGCACACAGGTTTTTTCAGGAGACAGGTAAAGTATTCTGGTATTCCCATCTTTTTTTAAATTCCAAGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGAGAGAGAGAGAGAGAGAGAGATCCACACAGTCAAAGGATTTAGTGCAGTCATTGAAACAGAAATAGATTTTTTTTTTTTTTTTTTGGAATTCCCTTGATTTTTCTATGATCCAGTGGATGTTGGCAATTTGATCTCTGGTTTCTCTGCCTTTTCTAAATCCAGCTTGTACATCTGGCAATTCCTGGTTCACATAATGCTGAAGCCTAGCTTCAAGGATTTTGAGCATAATCTTAACAGCATCTGAAATGAGTGCAATTG

Interestingly, the reported forward score (aln.score) here is -2147483648, so maybe there is also some underflow error since the score is reported as 2147483647 later on.

I guess this is just some painful edge case where a patch has just enough non-N sequence to progress but too much N to give a real score?

ASLeonard commented 4 days ago

After some more digging, the very first iteration for that patch looks different. The score seems to have underflowed for the forward alignment, but the reverse looks okay. But because the rev score is below the MAX_INT32 forward score, it gets in the bad infinite cycle.

score is -2147483648
WFA fwd alignment: Alignment: Query(2663892-2665111/1219) Target(2685261-2686482/1221) Score=2147483647 Rev=No Status=NotOK Keep=No CIGAR= Indices(i,j)=(2685261,2663892)
WFA rev alignment: Alignment: Query(2663892-2665111/1219) Target(2685261-2686482/1221) Score=2480 Rev=Yes Status=OK Keep=No CIGAR=MXMMDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDMMMMMXMMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII Indices(i,j)=(2685261,2663892)
ASLeonard commented 3 days ago

It seems like there was a check against the forward score being -2147483648, but that was removed in b663abe during #255.