Closed jeromekelleher closed 3 years ago
@gtsambos - I think this is what's hitting us over in #1809. Any quick ideas on what might be the issue here?
Hmm. There was one Python IBD test that I was marking as an expected failure bc I couldn't get to the bottom of it at the time, and I think it was this one, but I've just seen that it seems to have been updated with all your recent changes. Let me go back and see if I can find it in the history.
Note that the corresponding Python test doesn't actually test for this set of pairs (it only looks for parent-child pairs), so this may well fail in both implementations.
My guess is there's a bug to do with when the parent_should_be_added
boolean is invoked and modified. It's perhaps a little easier to see what it does in the Python code, here in lines 219-237
This was always perhaps not the best way to implement this edge case -- basically, what it does is add an extra segment to the list of ancestral segments if the parent is also a sample, which allows the calculate_ibd_segs
function to process it as usual. Basically, calculate_ibd_segs
assumes that the MRCA will be a polytomous node, so by default it won't do anything if there is just one edge underneath (as there is in this example). By adding an extra segment in, we 'trick' it into thinking there are two.
It might be simpler to just call add_segment
directly in this loop, because this would avoid some of the extra bookkeeping needed to stop lots of duplicate segments being added into the lists (which is why parent_should_be_added
is needed in the first place)
I'll be working tonight if you want to have more of a chat about it then @jeromekelleher! I need to do some thesis work now, but can explain in more detail exactly what parts of the code might need to be changed, and to what
Ah, sounds about right, thanks @gtsambos. I'll see if I can get my head around it. FWIW, internal samples has always been a major pain in these simplify-esque algorithms.
The C test test_ibd_finder_samples_are_descendants is interesting, as it checks for IBD relationships in the following case:
Now, we if we specify the set
within = {0, 1, 2, 3, 4, 5}
we get:So, it's only finding in the relationships between samples with parent/child relationships and not also the ones between, e.g., 0 and 4.
I'm not sure this test ever worked btw - some of the loops in the C tests were preventing code from being run in a non-obvious way.
Any thoughts here @gtsambos? There's no point in looking at this in detail until #1660 is merged, but good to get any immediate inputs.