pmelsted / bifrost

Bifrost: Highly parallel construction and indexing of colored and compacted de Bruijn graphs
BSD 2-Clause "Simplified" License

Houston, I have a problem #53

Closed: LucasBarbosaRocha closed this issue 3 years ago

LucasBarbosaRocha commented 3 years ago

My friends, I have a question.

A) I created a graph containing only the k-mer TTT and searched for TTT with the find() function, which reported that the k-mer was found in the graph.

B) Then I created a graph containing only the k-mer AAA and searched for TTT with the find() function, which again reported that the k-mer was found in the graph (because of the reverse complement).

In both cases, I used the referenceUnitigToString() function to see what is stored in the forward direction, and both times I got AAA. But I wanted TTT for case A and AAA for case B.
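A minimal sketch of why the two graphs look identical, assuming (as the find() results in this thread suggest) that a k-mer and its reverse complement are treated as the same graph entry. The helper names `reverse_complement`, `canonical` and `canonical_kmers` are hypothetical, and picking the lexicographically smaller strand as "canonical" is an illustrative convention only; Bifrost does not promise which direction it stores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Reverse complement of a DNA string; characters other than A/C/G/T are left unchanged.
std::string reverse_complement(const std::string& s) {
    std::string rc(s.rbegin(), s.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default: break;
        }
    }
    return rc;
}

// Assumed convention for this sketch: the canonical form of a k-mer is the
// lexicographically smaller of the k-mer and its reverse complement.
std::string canonical(const std::string& kmer) {
    return std::min(kmer, reverse_complement(kmer));
}

// Set of canonical k-mers contained in a sequence (what a strand-agnostic graph "sees").
std::set<std::string> canonical_kmers(const std::string& seq, std::size_t k) {
    std::set<std::string> out;
    for (std::size_t i = 0; i + k <= seq.size(); ++i)
        out.insert(canonical(seq.substr(i, k)));
    return out;
}
```

With k = 3, `canonical_kmers("TTT", 3)` and `canonical_kmers("AAA", 3)` are the same set, so once the input is inserted nothing in the k-mer content records which of the two strands was originally added.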

Is this possible? I want to know exactly what I added to the graph.

Best regards, Lucas B. Rocha

GuillaumeHolley commented 3 years ago

Hi @LucasBarbosaRocha,

The function referenceUnitigToString() returns the unitig in an arbitrary direction. Also, note that if you had created a graph using ATTTG and queried for TTT (case A in your example), using referenceUnitigToString() would return the whole unitig ATTTG and not just TTT.

I think what you are looking for is the function mappedSequenceToString(), which:
1 - takes the direction into account;
2 - returns only the queried sequence (not the whole unitig).
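A rough sketch of the behaviour described for mappedSequenceToString(): cut the matched k-mers out of the stored unitig and reverse-complement them when the match is on the opposite strand. The struct `MappedKmerSketch` and the function `mapped_sequence` are hypothetical; the field names `dist`, `len` and `strand` loosely follow Bifrost's UnitigMap, but this is an illustration under those assumptions, not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Reverse complement of a DNA string; characters other than A/C/G/T are left unchanged.
std::string reverse_complement(const std::string& s) {
    std::string rc(s.rbegin(), s.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default: break;
        }
    }
    return rc;
}

// Hypothetical stand-in for the result of a k-mer lookup (not the Bifrost type).
struct MappedKmerSketch {
    std::string unitig; // unitig as stored, in an arbitrary direction
    std::size_t dist;   // offset of the first matched k-mer within `unitig`
    std::size_t len;    // number of consecutive matched k-mers
    bool strand;        // true: query matches `unitig` as stored; false: its reverse complement
};

// Return only the matched part of the unitig, oriented like the query.
std::string mapped_sequence(const MappedKmerSketch& m, std::size_t k) {
    const std::string sub = m.unitig.substr(m.dist, m.len + k - 1);
    return m.strand ? sub : reverse_complement(sub);
}
```

For example, if a graph built from ATTTG happened to store its unitig as CAAAT, a query for TTT would match AAA at offset 1 on the reverse strand, and `mapped_sequence({"CAAAT", 1, 1, false}, 3)` would give back TTT rather than the whole unitig.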

That being said, if you want to get the whole unitig associated with your queried subsequence but in the correct direction, you can do um.strand ? um.referenceUnitigToString() : reverse_complement(um.referenceUnitigToString()) assuming um is the UnitigMap object returned by find().
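The one-liner above can be sketched as a standalone function. Here `oriented_unitig` is a hypothetical helper that takes the stored unitig string and the strand flag directly instead of a real UnitigMap, and `reverse_complement` is written out so the example compiles on its own.

```cpp
#include <cassert>
#include <string>

// Reverse complement of a DNA string; characters other than A/C/G/T are left unchanged.
std::string reverse_complement(const std::string& s) {
    std::string rc(s.rbegin(), s.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default: break;
        }
    }
    return rc;
}

// Whole unitig, flipped so that it reads in the same direction as the query:
// keep it as stored on a forward match, reverse-complement it otherwise.
std::string oriented_unitig(const std::string& reference_unitig, bool strand) {
    return strand ? reference_unitig : reverse_complement(reference_unitig);
}
```

Note that the orientation follows the query, not the original input: in both of the cases described earlier in this thread, querying TTT against a stored unitig AAA gives `strand == false`, so this returns TTT whether TTT or AAA was inserted.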

Hope this helps :)

Guillaume

LucasBarbosaRocha commented 3 years ago

Yep.

It helps, but only in the single k-mer case. With the cases below I'm still confused and have a question:

Case A)

My graph is TTTTT. I searched for TTT and AAA, and this was the output:

Kmer TTT was found in the reverse-complement direction of unitig AAA (but I want TTT, or TTTTT, in this case)

Kmer AAA was found in the forward direction of unitig AAA

Case B)

My graph is AAAAA. I searched for TTT and AAA, and this was the output:

Kmer TTT was found in the reverse-complement direction of unitig AAA

Kmer AAA was found in the forward direction of unitig AAA

But I would like to get exactly TTT (or TTTTT) for case A and AAA (or AAAAA) for case B, not AAA in both cases.

Could you tell me what's going on?

Thanks, Lucas B. Rocha