ivan-krukov / aligning-genealogies

The genealogy-coalescent alignment project
3 stars 0 forks source link

Re-write `get_probands_under` method #22

Closed shz9 closed 4 years ago

shz9 commented 4 years ago

After discussion with @LukeAndersonTrocme, I found that the get_probands_under method of the Genealogical class is inefficient for large genealogies, as it repeats lots of redundant computations. There's a possibility to re-write it so that it runs in O(n) time.

Basic idea: Start from the probands, and progressively pass the information up the pedigree. Base case: n is a proband --> Set their probands_under value to set(n) General case: n is not a proband --> Set their probands_under value to set.union(*[child.probands_under for child in self.successors(n)])

Plus: Keep track of the number of paths between a given proband and the ancestor.

shz9 commented 4 years ago

Idea has been implemented in get_num_paths_to_target method. I opted to keeping the get_probands_under() method as a separate functionality for now because I don't want to break Aligner objects. Maybe we can unify both methods in the future.

When you call get_num_paths_to_target on a Genealogical object, it returns a dictionary with the nodes of the pedigree as keys and the value is a dictionary of the targets and the number of paths connecting the node to the target. Example is available in the Jupyter notebook. Figure below shows a simple example where you specify which target you want. If target isn't specified, the method uses the probands by default.

@LukeAndersonTrocme I tested this method on a large simulated genealogy with >40,000 individuals. It took about a minute to run. Let me know if you'd like to use it for your BALSAC study.

I'll close this issue for now.

Screen Shot 2020-06-26 at 5 25 41 PM