Open jeromekelleher opened 2 years ago
# Get the deepest generation for each internal node
maximum_genealogical_depth <- function(pedigree, list_of_probands) {
# probands are individuals at generation 0
individuals <- data.frame(ind = list_of_probands, generation = 0)
# initialize output to append to downstream
gen_depth <- individuals
# for loop to iterate through generations
for(k in 1:18) {
individuals_parents <-
# start from the entire pedigree
pedigree %>%
# keep parents of individuals at generation 'k'
filter(ind %in% individuals$ind) %>%
# if parent is missing, flag individual as founder
mutate(founder = ifelse(is.na(father) | is.na(mother),T, F))
# stop climbing when there are only founders left
if(nrow(filter(individuals_parents, founder == F)) == 0){break}
# to ascend genealogy, parents become reference individuals
mothers <- unique(individuals_parents$mother)
fathers <- unique(individuals_parents$father)
# recursion step reassigns reference individuals
individuals <-
data.frame(ind = c(mothers, fathers), generation = k) %>%
filter(!is.na(ind)) # removes missing IDs
# output max depth per individual
gen_depth <-
# individuals already visited by climbing algorithm
gen_depth %>%
# remove duplicate individuals at 'k-1'
filter(!ind %in% individuals$ind) %>%
# append updated individuals 'k' generations deep
bind_rows(individuals)
}
return(gen_depth)
}
I'll note here that I knew in advance that the French-Canadian genealogy was less than 18 generations hence the for loop, but this could be done with a while loop for a more general solution
It would be very useful to be able to support pedigrees that have either no time associated, or partial time values. We avoided doing this in the initial implementation as it's not totally obvious how to do this well in general, and so it's better to leave the functionality out rather than have to make breaking changes later. There's also the problem of rigorously testing it, which takes time to do well.
The prototype pedigree interface had an implementation of this here which looked like this:
@LukeAndersonTrocme you said you have an R script for doing something similar in https://github.com/tskit-dev/msprime/issues/1915#issuecomment-968361068, would you mind posting here please?