Closed AustinHartman closed 3 years ago
Hi,
it is not an edge case, but the intended implementation. Pile::FindValidRegion
searches for the longest region with coverage above 4.
Best regards, Robert
Thanks for clarifying the intended implementation. Under the current implementation I don't think Pile::FindValidRegion
finds the longest region with coverage above 4 when the longest region includes the last value in data_. Is this a bug?
In the example above, the reason why end!=9 is because of the value 3 at position 4. The first and last value in data_ should always equal to 0 (I am +-1 begin/end of each overlap at https://github.com/lbcb-sci/raven/blob/master/src/pile.cpp#L42-L43).
I see - thanks for pointing me to those lines. Still, I don't understand how the implementation would find the longest region in an example such as data=[0, 5, 0, 5, 5, 5, 5] with coverage=4. I would expected (begin, end) to be (3, 7) (or (3, 6) if the final position should always equal 0). It seems the current implementation would return (1, 2) since the region must be looped over completely before updating begin and end.
data_
will always look like [0, x, x, x, ..., x, x, 0]
. Your example would be [0, 5, 0, 5, 5, 5, 5, 0], and begin,end would equal to (3,7). In the case where the last value would be bigger than 4, the implementation would not properly find the last end. But as I stated, the array will always begin with 0 and end with 0.
Ah I see now, thank you for taking the time to explain
No worries :)
Hi,
I noticed the function FindValidRegions seems to not pick up the longest valid region if the region continues up to the final position in data. For example, if data = [0, 5, 6, 5, 3, 5, 6, 6, 5] and coverage = 4, the current function would set begin=1 and end=4 whereas this proposed change would set begin=5 and end=9.
I ran the same E. coli data on a baseline build and a build using the modification here, and didn't observe any changes in the final assembly, but perhaps there are datasets where this edge case would apply.
Let me know what you think.