pmelsted / bifrost

Bifrost: Highly parallel construction and indexing of colored and compacted de Bruijn graphs
BSD 2-Clause "Simplified" License

overflow gfa link output at k128 #41

Closed kevyin closed 2 years ago

kevyin commented 3 years ago

Hi, thank you for Bifrost.

I was wondering if you could help with this strange error. I built Bifrost as follows:

cmake .. -DENABLE_AVX2=OFF -DCOMPILATION_ARCH=OFF -DMAX_KMER_SIZE=128 && make -j 8 && sudo make install && sudo ldconfig

When I run it on a dataset, some vertex IDs in the link (L) output appear to have overflowed. The value 94537891117328 changes between runs and does not occur in any segment (S) line of the GFA.

Bifrost build --colors -s chunk_2_1793.fq.gz -o chunk_2_1793.fq.gz  &&  cat chunk_2_1793.fq.gz.gfa | grep ^L

L   1   -   16  -   30M
L   1   -   18  +   30M
L   1   +   6   +   30M
L   1   +   5   +   30M
L   2   -   17  -   30M
L   2   +   10  +   30M
L   3   +   15  +   30M
L   3   +   18  -   30M
L   4   -   12  -   30M
L   4   +   5   -   30M
L   4   +   6   -   30M
L   5   -   1   -   30M
L   5   +   4   -   30M
L   6   -   1   -   30M
L   6   +   4   -   30M
L   7   -   9   -   30M
L   7   -   14  -   30M
L   8   -   17  -   30M
L   8   +   10  +   30M
L   9   -   12  +   30M
L   9   +   7   +   30M
L   10  -   2   -   30M
L   10  -   8   -   30M
L   12  -   14  +   30M
L   12  -   9   +   30M
L   12  +   4   +   30M
L   12  +   13  +   30M
L   13  -   12  -   30M
L   14  -   12  +   30M
L   14  +   7   +   30M
L   15  -   3   -   30M
L   16  -   94537891117328  -   30M
L   16  +   1   +   30M
L   17  +   2   +   30M
L   17  +   8   +   30M
L   18  -   1   +   30M
L   18  +   3   -   30M
L   20  -   94537891117328  -   30M
L   20  +   94537891117328  +   30M
L   20  +   16  +   30M
L   19  +   94537891117328  +   30M
L   19  +   16  +   30M
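
For anyone who wants to check a GFA file for this kind of dangling link, here is a minimal sketch. It assumes GFA1-style records where an `S` line carries the segment name in its second field and an `L` line carries the two linked segment names in its second and fourth fields. The function name `find_dangling_links` is hypothetical and is not part of Bifrost.

```python
def find_dangling_links(gfa_lines):
    """Return the L records that reference a segment with no S record.

    Assumes whitespace- or tab-separated GFA1 records:
      S <name> <sequence> ...
      L <from> <from_orient> <to> <to_orient> <overlap>
    """
    segments = set()
    links = []
    for line in gfa_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "S" and len(fields) >= 2:
            segments.add(fields[1])
        elif fields[0] == "L" and len(fields) >= 5:
            # Keep (from, from_orient, to, to_orient) for reporting.
            links.append((fields[1], fields[2], fields[3], fields[4]))
    # Links are checked only after every S record has been collected,
    # so the order of S and L lines in the file does not matter.
    return [
        link for link in links
        if link[0] not in segments or link[2] not in segments
    ]
```

Passing an open file handle for the `.gfa` output (for example `find_dangling_links(open("chunk_2_1793.fq.gz.gfa"))`) should list every link whose endpoint is missing from the segment set, which would include the 94537891117328 entries above if that ID really has no `S` line.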

I get more normal-looking output with other values of K; I've tried 32, 96, and 192. (It could still be happening with other datasets, though.)

L   1   -   16  -   30M
L   1   -   18  +   30M
L   1   +   6   +   30M
L   1   +   5   +   30M
L   2   -   17  -   30M
L   2   +   10  +   30M
L   3   +   15  +   30M
L   3   +   18  -   30M
L   4   -   12  -   30M
L   4   +   5   -   30M
L   4   +   6   -   30M
L   5   -   1   -   30M
L   5   +   4   -   30M
L   6   -   1   -   30M
L   6   +   4   -   30M
L   7   -   9   -   30M
L   7   -   14  -   30M
L   8   -   17  -   30M
L   8   +   10  +   30M
L   9   -   12  +   30M
L   9   +   7   +   30M
L   10  -   8   -   30M
L   10  -   2   -   30M
L   12  -   14  +   30M
L   12  -   9   +   30M
L   12  +   4   +   30M
L   12  +   13  +   30M
L   13  -   12  -   30M
L   14  -   12  +   30M
L   14  +   7   +   30M
L   15  -   3   -   30M
L   16  -   19  -   30M
L   16  +   1   +   30M
L   17  +   8   +   30M
L   17  +   2   +   30M
L   18  -   1   +   30M
L   18  +   3   -   30M
L   19  -   19  -   30M
L   19  +   19  +   30M
L   19  +   16  +   30M
L   20  +   19  +   30M
L   20  +   16  +   30M

kevyin commented 3 years ago

Here is the dataset, in case it's helpful: chunk_2_1793.fq.gz

GuillaumeHolley commented 2 years ago

Hi @kevyin,

Bifrost has changed quite a bit since you made us aware of this issue, and I haven't been able to reproduce it in our latest version. I am closing this for now, but feel free to reopen it if you encounter the same problem again.

Thank you, Guillaume