pmelsted / bifrost

Bifrost: Highly parallel construction and indexing of colored and compacted de Bruijn graphs
BSD 2-Clause "Simplified" License

writing graph index hangs in 1.3.0 upward #80

Closed vpbrendel closed 5 months ago

vpbrendel commented 5 months ago

Great program. Thanks. I was testing a different program (https://github.com/tischulz1/plast) that makes use of bifrost and ended up with an odd error. Could you please take a look?

Here is my odd sequence:

oddseq GAATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATTG

Here is how I use it:

mkdir fa
cd fa
wget 'ftp://hgdownload.cse.ucsc.edu/goldenPath/eboVir3/bigZips/160sequences.tar.gz'
tar xzf *.gz
\rm *.gz
cd ..

ls fa | sed -e "s#^#fa/#" > slst

../bifrost-1.2.1/build/src/Bifrost build -o mygfa-1.2.1 -r slst -t 10 -n -v >& err-1.2.1
../bifrost-1.3.0/build/src/Bifrost build -o mygfa-1.3.0 -r slst -t 10 -n -v >& err-1.3.0

cat slst | sed -e '$a gnm.fa' > slstF
../bifrost-1.2.1/build/src/Bifrost build -o mygfa-1.2.1F -r slstF -t 10 -n -v >& err-1.2.1F
../bifrost-1.3.0/build/src/Bifrost build -o mygfa-1.3.0F -r slstF -t 10 -n -v >& err-1.3.0F

Outcome: The plast example (ebola sequences) works fine with 1.2.1 and 1.3.0. Then I add my odd sequence to the fa directory. 1.2.1 is still happy. But 1.3.0 hangs on writing the index.
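
A hang like this can be told apart from a slow run by putting a time limit on the command. The following is a minimal sketch, not part of Bifrost or plast: the helper name `finishes_within` and the timeout value are assumptions chosen for illustration, and the command in the usage comment simply repeats the 1.3.0 invocation from above.

```python
import subprocess
import sys


def finishes_within(cmd, timeout_s):
    """Return True if `cmd` exits within `timeout_s` seconds.

    Returns False if the time limit is hit; subprocess.run kills the
    child process before raising TimeoutExpired.
    """
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
        return True
    except subprocess.TimeoutExpired:
        return False


# Hypothetical usage, reusing the command line from this report
# (the 600-second limit is an arbitrary assumption):
# cmd = ["../bifrost-1.3.0/build/src/Bifrost", "build", "-o", "mygfa-1.3.0F",
#        "-r", "slstF", "-t", "10", "-n", "-v"]
# print("finished" if finishes_within(cmd, 600) else "hung or too slow")
```

The helper only reports whether the process ended in time; it does not distinguish a true infinite loop from a run that merely needs longer than the chosen limit.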

Multiple testing of similar cases on Fedora 39 Linux. I think this happens in IO.tcc, but you will know better. Thanks, Volker

GuillaumeHolley commented 5 months ago

Hi @vpbrendel,

Thank you for reporting this issue. I also noticed this recently and have already made a fix for it. The issue happens exclusively when working on small input data that results in a very low number of minimizers and/or k-mers in the graph. This currently causes a rounding issue in the MinimizerIndex and KmerHashTable when they try to reallocate their tables to larger sizes, which in turn causes the program to hang as it indefinitely tries and fails to reallocate the table to a larger size. This concerns only v1.3.0 and v1.3.1, because these versions feature the new robin-hood hashing I implemented, which contains the bug.
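
To illustrate the general failure mode described above, here is a minimal sketch of how a rounding step in a growth rule can stall on very small tables. It is not Bifrost's code: the growth factor of 1.2, the function names, and the stall guard are all assumptions made for illustration.

```python
def grow_truncating(capacity: int, factor: float = 1.2) -> int:
    """Hypothetical growth rule: scale the capacity, then truncate to an integer.

    For small capacities the truncation can return the same value,
    e.g. int(4 * 1.2) == 4, so the table never gets larger.
    """
    return int(capacity * factor)


def grow_safe(capacity: int, factor: float = 1.2) -> int:
    """Same rule, but always adds at least one slot."""
    return max(capacity + 1, int(capacity * factor))


def resize_until(target: int, capacity: int, grow, max_steps: int = 1000):
    """Grow `capacity` until it reaches `target`.

    Returns the number of growth steps, or None if growth stalls.
    The stall check and step limit stand in for what would otherwise
    be an endless loop.
    """
    steps = 0
    while capacity < target:
        new_capacity = grow(capacity)
        if new_capacity <= capacity or steps >= max_steps:
            return None
        capacity = new_capacity
        steps += 1
    return steps


if __name__ == "__main__":
    print(resize_until(16, 4, grow_truncating))       # stalls on a tiny table
    print(resize_until(16, 4, grow_safe))             # makes progress
    print(resize_until(2000, 1000, grow_truncating))  # large tables are unaffected
```

In this sketch the truncating rule only misbehaves when the capacity is small enough that the scaled value rounds back down to the current size, which matches the observation that only small inputs trigger the hang.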

Anyway, I have integrated the fix in the new version I have been working on which I am gonna release asap. I'll let you know as soon as the release is there.

Guillaume

GuillaumeHolley commented 5 months ago

The branch bifrost-fileAsQuery should have the fixes needed if you wanna try it now. I need to update the README, the Changelog and I still have to do a few more tests so the official release won't come before tomorrow.

GuillaumeHolley commented 5 months ago

README for that branch should be fairly up to date now.

GuillaumeHolley commented 5 months ago

v1.3.5 has been released, and I am confident it solves your issue. I am closing this issue, but feel free to reopen it if you encounter additional problems.

Guillaume

vpbrendel commented 5 months ago

Thanks Guillaume. I confirm that the current version works as intended. Volker