maickrau / GraphAligner

assembly to graph alignment: assertion fail #33

Closed ptrebert closed 3 years ago

ptrebert commented 3 years ago

Hi Mikko, with a current version of GraphAligner (master commit 48143daff7b771a7e15ef8e500ccb10011155940 2021-02-19 11:23:24), I find the following failed assertion in the log during an assembly to graph alignment (HPRC / CHM13):

src/GraphAlignerBitvectorBanded.h:581: Assertion '!params.preciseClipping || (seedHits.size() != 0 && seedhitEnd == seedhitStart) || sliceResult.maxExactEndposScore >= -((ScoreType)slice.j + WordConfiguration<Word>::WordSize) * params.XscoreErrorCost' failed. Read: HG002#2#h2tg000009l. Seed: 144346+,33093741,15,1065

The precise clipping parameter is set to 0.98, and so far I have not encountered any other problem (e.g., as described in #24). As far as I can tell, GraphAligner is still running with all threads at ~100%, though it could also be stuck in some undetermined state. How bad is the above? Thanks for your help!

+Peter

maickrau commented 3 years ago

This means that HG002#2#h2tg000009 was not aligned and will not appear in the output, but GraphAligner will still try to align the remaining sequences. Can you send me the graph and the HG002#2#h2tg000009 sequence?

ptrebert commented 3 years ago

The graph is the minigraph CHM13 freeze 1: ftp://ftp.dfci.harvard.edu/pub/hli/minigraph/HPP/CHM13-freeze1.gfa.gz

Regarding the tig sequence, it's a bit too large to be sent via email directly, so I'll send you a download link via mail.

maickrau commented 3 years ago

Fixed in f1b574b. This was caused by a combination of high --precise-clipping and very long sequences leading to an integer underflow.
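For readers wondering how sequence length can trigger this, here is a minimal sketch of the general mechanism, not the actual code changed in the fix. It assumes a 32-bit signed score type and an error cost of 100; both are illustrative guesses, and the helper names `wideLowerBound` and `fitsScoreType` are hypothetical. The bound has the same shape as the one in the failed assertion, -(j + WordSize) * XscoreErrorCost, but is computed in 64 bits so that the true value can be compared against the range of the narrower type.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumption: a 32-bit signed score type. GraphAligner's real ScoreType may differ.
using ScoreType = std::int32_t;

// Lower bound of the same shape as in the assertion:
//   -(j + wordSize) * xScoreErrorCost
// Computed in 64 bits so the mathematically correct value is representable here.
std::int64_t wideLowerBound(std::int64_t j, std::int64_t wordSize, std::int64_t xScoreErrorCost) {
    return -(j + wordSize) * xScoreErrorCost;
}

// Returns true if the value can be stored in ScoreType without wrapping.
bool fitsScoreType(std::int64_t value) {
    return value >= std::numeric_limits<ScoreType>::min()
        && value <= std::numeric_limits<ScoreType>::max();
}
```

With these assumed values, a position of j = 10,000 gives a bound of -1,006,400, which fits comfortably. A position of j = 33,000,000 (an illustrative offset for a multi-megabase contig) gives -3,300,006,400, which is below the 32-bit minimum of -2,147,483,648, so a 32-bit computation could no longer represent it and any comparison against it would be unreliable.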

ptrebert commented 3 years ago

Thanks for fixing. I don't have the resources to rerun and test the fix right now, and this is mostly due to the fact that the original alignment run hasn't finished yet (~2700 CPU hours and counting), so feel free to close.

ptrebert commented 3 years ago

@maickrau so this would be a possible test case for a whole-genome to graph alignment that did not finish within several weeks. I think this was the original command line:

GraphAligner -g CHM13-freeze1.gfa -f HG002.maternal.f1_assembly_v1.fa.gz -a HG002.maternal.f1_assembly_v1.MAP-TO.CHM13-freeze1.gaf -t 4 -x vg --precise-clipping 0.98 --X-drop 10000

subwaystation commented 3 years ago

@ptrebert Maybe try with massively more threads? The 3 samples I am aligning to our pggb graph in dbg mode took ~5 days using 28 threads. I would throw as many as you have.

ptrebert commented 3 years ago

Yeah, back then, I was told that too many threads would too easily exhaust the memory... For me, this test case is no longer relevant. Nevertheless, in absolute numbers, Mikko needs to judge whether 4800+ CPU hours is reasonable for this alignment task or deserves some optimization ;-)