pmelsted / bifrost

Bifrost: Highly parallel construction and indexing of colored and compacted de Bruijn graphs
BSD 2-Clause "Simplified" License

Invalid edge reported for self-complemental nodes #48

Open sebschmi opened 3 years ago

sebschmi commented 3 years ago

Consider the following input reference file:

a.fa

>a
CGCGG

Using the command line Bifrost build -k 4 -r a.fa -o a on the current master of this repo yields the following result:

a.gfa

H   VN:Z:1.0    BV:Z:1.0.5  KL:Z:4  ML:Z:2
S   1   CGCGG
L   1   -   1   +   3M

The reverse complement of segment 1 (CGCGG) is CCGCG. Its last three characters (GCG) do not match the first three characters of CGCGG (CGC), so the link L 1 - 1 + 3M must be wrong.

Note that the node CGCG is self-complemental.
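
The overlap check can be reproduced with a few lines of Python. This is only a minimal sketch and does not use Bifrost's code. The helper names reverse_complement, oriented and link_is_valid are hypothetical. It assumes that a GFA link with overlap nM requires the last n characters of the oriented "from" segment to equal the first n characters of the oriented "to" segment.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def oriented(seq: str, orientation: str) -> str:
    """Return seq as-is for '+', or its reverse complement for '-'."""
    return seq if orientation == "+" else reverse_complement(seq)


def link_is_valid(from_seq: str, from_orient: str,
                  to_seq: str, to_orient: str, overlap: int) -> bool:
    """Check that the suffix of the 'from' segment equals the prefix of the
    'to' segment over `overlap` characters (assumed meaning of e.g. 3M)."""
    a = oriented(from_seq, from_orient)
    b = oriented(to_seq, to_orient)
    return a[-overlap:] == b[:overlap]


unitig = "CGCGG"
print(reverse_complement(unitig))                   # CCGCG
print(link_is_valid(unitig, "-", unitig, "+", 3))   # False: GCG != CGC
print(reverse_complement("CGCG") == "CGCG")         # True: self-complemental
```

Under this reading of the link line, the reported edge fails the check, while the first 4-mer of the segment is confirmed to be its own reverse complement.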

GuillaumeHolley commented 3 years ago

Thanks for reporting this issue, I'll investigate asap. I wouldn't be surprised if the issue is related to the very small k-mer size, I don't remember testing that edge case. Do you have the same issue with larger k-mers (k=5,6)?

sebschmi commented 3 years ago

Thank you for the quick answer.

I get the same issue for k=30 with the following input reference: CCCCCCCCCCCCCCGCGGGGGGGGGGGGGGG (13 Cs attached to the front and 13 Gs attached to the back of the original string)

It is not possible to create the same test case with odd k, as there are no self-complemental nodes then: the middle base of an odd-length k-mer would have to be its own complement.
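
Both claims can be checked by brute force. The following is a rough sketch, not part of Bifrost. The function names are hypothetical, and the k=30 reference is rebuilt from the description (13 Cs, then CGCGG, then 13 Gs) rather than copied from the string above.

```python
from itertools import product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def count_self_complemental(k: int) -> int:
    """Count the k-mers that equal their own reverse complement by
    enumerating all 4**k k-mers (only feasible for small k)."""
    return sum(
        1
        for kmer in map("".join, product("ACGT", repeat=k))
        if kmer == reverse_complement(kmer)
    )


# Odd k gives 0; even k gives 4**(k/2), since the first half of the
# k-mer determines the second half.
for k in range(1, 8):
    print(k, count_self_complemental(k))

# Reference for k=30 as described: 13 Cs + CGCGG + 13 Gs (31 characters).
reference = "C" * 13 + "CGCGG" + "G" * 13
first_kmer = reference[:30]
print(len(reference), first_kmer == reverse_complement(first_kmer))  # 31 True
```

The enumeration only covers small k, but the counting argument in the comment applies to any k: an odd-length k-mer has a middle base that would need to equal its own complement, which no base does.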