pmelsted / bifrost

Bifrost: Highly parallel construction and indexing of colored and compacted de Bruijn graphs
BSD 2-Clause "Simplified" License

Adding Many Sequences to Graph Crashes with Memory Error #86

Open tischulz1 opened 2 months ago

tischulz1 commented 2 months ago

Dear Bifrost Team,

I recently discovered a bug while using the Bifrost API in one of my own programs. It leads to a memory error and a crash of the program. The bug appears when iteratively calling the add function of a ColoredCDBG object.

I have attached a minimal code example and an input file (hard coded inside my code) to reproduce the issue. Unfortunately, the issue is not reproducible on all machines on which I tested.

Do you have a clue what is going on here? Is it really a bug in Bifrost or am I doing something wrong?

Any help is greatly appreciated.

Best, Tizian

tischulz1 commented 1 month ago

The bug does not seem to occur with the older Bifrost version 1.2.1.

GuillaumeHolley commented 1 month ago

Hi Tizian,

Sorry for the late answer. I was on holiday when you first reached out and I haven't managed to find the time since I came back. I appreciate that you reported this issue and provided a minimal example for it. To be accurate: this (sometimes) does not work with the latest version of Bifrost (currently 1.3.5) but seems to work with 1.2.1? Did you try running Valgrind on this minimal example to see whether there are any incorrect memory accesses? It should be super fast to run given how short the minimal code/input is. You'll have to compile Bifrost in Debug mode for that. I am currently rather busy at work and I don't know when I'll be able to take care of this issue. Hopefully soon. There is also a branch named bifrost-fastNegativeQueries, which is in theory the next version of Bifrost and which should work just fine. I haven't been able to work on it for some time, but it is possible that it solves some of the issues introduced in 1.3.5, if there were any.

Guillaume

tischulz1 commented 1 month ago

Hi Guillaume,

Welcome back! I hope you had a nice holiday. :)

Yes, correct. I have tested the minimal example on different Linux servers and some virtual machines in our cloud. It does not crash on all machines, but on machines where it crashes once, it always crashes. I have also seen other strange behavior on machines where the minimal example runs through: for example, a graph written to disk using the API that cannot be read back afterwards, or $k$-mers extracted from a graph that cannot be found in it when querying for them using CompactedDBG<Unitig_data_t, Graph_data_t>::find. It looks like Bifrost's internal data structures are getting corrupted somehow.

I have checked out branch bifrost-fastNegativeQueries, but the issue seems to persist. The minimal code example crashes here as well.

I also used branch bifrost-fastNegativeQueries for a run of Valgrind. This is what the program says.

Tizian