Closed shz9 closed 4 years ago
Idea has been implemented in get_num_paths_to_target
method.
I opted to keeping the get_probands_under()
method as a separate functionality for now because I don't want to break Aligner objects. Maybe we can unify both methods in the future.
When you call get_num_paths_to_target
on a Genealogical object, it returns a dictionary with the nodes of the pedigree as keys and the value is a dictionary of the targets and the number of paths connecting the node to the target. Example is available in the Jupyter notebook. Figure below shows a simple example where you specify which target you want. If target isn't specified, the method uses the probands by default.
@LukeAndersonTrocme I tested this method on a large simulated genealogy with >40,000 individuals. It took about a minute to run. Let me know if you'd like to use it for your BALSAC study.
I'll close this issue for now.
After discussion with @LukeAndersonTrocme, I found that the
get_probands_under
method of the Genealogical class is inefficient for large genealogies, as it repeats lots of redundant computations. There's a possibility to re-write it so that it runs inO(n)
time.Basic idea: Start from the probands, and progressively pass the information up the pedigree. Base case:
n
is a proband --> Set theirprobands_under
value toset(n)
General case:n
is not a proband --> Set theirprobands_under
value toset.union(*[child.probands_under for child in self.successors(n)])
Plus: Keep track of the number of paths between a given proband and the ancestor.