Closed jeromekelleher closed 3 years ago
This also sounds good to me, though I'll probably have a few more questions about the semantics too. I'm not totally clear on how you could have store_segments=True
with store_pairs=False
-- don't you need to know the pair to lookup the segments?
Yes, you're right, store_segments implies store_pairs.
Ah sorry, I think I meant this the other way around -- but i can actually see some applications when just the pairs and not the segments might be wanted.
If store_segments
is False and store_pairs
is True, then we keep a segment_list for each pair (we find an IBD segment for), but we only update the running totals of num_segments
and total_span
instead of actually storing the segments themselves. This would save a lot of memory.
Sounds good to me! Since it'll be qutie easy to get a very big object, I'd suggest that store_segments
default to False
, so that people can first run the method to see how many segments they'd get at a given threshold and then turn on storing segments to actually get them.
Great, that's exactly what I was thinking.
Can we guesstimate (or give a sensible lower bond) for the size the object based on the number of pairs and segments? That might be more useful than just the number of segments since I imagine that's the constraint that users will be thinking about.
That seems like a very tricky thing to do in general...
That's easy I think - we know how many segments and pairs there are and we know how much memory each one takes. We could do this in Python as part of the __str__
implementation. Probably best do it as a follow up though.
See #1708 for more background on the
ibd_segments
method (currentlyfind_ibd
)The idea here is that the IbdSegments gives us access to useful simple summaries of the IBD segments, which may be all we need for some purposes. Then, storing the segments (or indeed, even the pairs) would be unnecessary and we'd save a lot of time an memory by not storing them.
If store_pairs is false, store_segments is also necessarily false. We'd have to be careful about the semantics of the IdbSegments class in these cases, but I think the simplest and most logical thing to do would be to raise an error if someone tries to access information that hasn't been stored.
Then, we could argue about what the defaults should be.
Any thoughts @gtsambos @petrelharp ?