Closed lrvdijk closed 4 years ago
Hey Lucas,
Thank you for your message. The modification is necessary and your test case failing is actually normal.
In your test case, when you use find()
, the UnitigMap
object returned is the mapping of a single k-mer, your query, on a unitig of the graph. If you call UnitigMap::mappedSequenceToString()
on that object, it is going to print a single k-mer (conditioned to UnitigMap::isEmpty == false
) because UnitigMap::len
is 1. Now, when you use getSuccessors()
, the UnitigMap
object returned is always a mapping of the entire successor's unitig on... itself. It makes sense because you want to get the entire sequence of the successor and not the first or last k-mer of that unitig. Hence, if you call UnitigMap::mappedSequenceToString()
on it, it is going to print the complete unitig of the successor in the right "direction" (forward or reverse-complement) because the graph is bi-directed. Your test case is failing because you are comparing the mapping of a single k-mer to the mapping of an entire unitig. Yet, if you compare both UnitigMap::referenceUnitigToString()
, you will see that both mappings are to the same unitig.
The code located here https://github.com/pmelsted/bifrost/blob/master/src/NeighborIterator.tcc#L37 shows that internally, when we query for the successors or predecessors of a unitig, we actually query for k-mers in the graph (each unitig has only 4 possible successors and predecessors). And because the UnitigMap
objects returned are mapping of a single k-mer, we must modify them such that the mapping matches the entire unitig.
I hope this is a little bit more clear but let me know if it is not and I will try to explain this a little bit better ;)
Guillaume
Aha, thanks, seems my understanding was incomplete! Thanks for the explanation, this clears a lot up!
No problem ;) I'm closing this issue then.
In NeighborIterator.tcc, here: https://github.com/pmelsted/bifrost/blob/master/src/NeighborIterator.tcc#L37 the
UnitigMap
object is modified at each iteration (if it exists in the graph). It's unclear to me, however, why this modification is done? You'd expect the data fromfind
should be correct right?The reason I'm asking is because the following test case now fails: https://github.com/broadinstitute/pyfrost/blob/master/tests/test_mccortex.cpp#L63
If you iterate over the successors of a unitig, and compare the UnitigMap object obtained through the iterator to a UnitigMap object from a known successor of the same unitig obtained earlier using
find
, all==
comparisons will return false because thelen
attribute is different.If I comment out line 37 and 38 in
NeighborIterator.tcc
my test case succeeds.Is there a reason why the
len
anddist
attributes get modified?Thanks!