ivan-krukov / aligning-genealogies

The genealogy-coalescent alignment project

Add harmonization step to Aligner objects #16

Open shz9 opened 4 years ago

shz9 commented 4 years ago

Aligner objects as currently implemented only map nodes in the tree sequence to nodes in the pedigree. This can be done either iteratively or in one step. One thing that's missing is "harmonization" or "sanity checking" for greedy algorithms. The goal of the harmonization step is to make sure that the mappings make sense and are consistent with other information that we have, e.g.:

After these sanity checks are implemented and used to correct the mapping (if possible), then one final thing that can be added is the pedigree node to tree-sequence edge mapping:

shz9 commented 4 years ago

Other cases to check in the harmonization step:

shz9 commented 4 years ago

I added basic code for the harmonization step. Implemented two checks:

[1] If a node in the tree sequence n_ts is mapped to a founder node in the pedigree n_ped, then set all of the ancestors of n_ts to be out-of-pedigree nodes (i.e. those ancestors map to None).

[2] If 2 sibling nodes in the tree sequence are mapped to disconnected nodes in the pedigree (i.e. nodes with no common ancestor in the pedigree), then set all of their ancestors to be out-of-pedigree nodes.
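For readers following along, here is a minimal sketch of the two checks, not the code in the repository. It assumes a single tree stored as a child-to-parent dict (`ts_parent`), a pedigree stored as a dict from each individual to a tuple of its parents (`ped_parents`, empty for founders), and a mapping dict from tree-sequence nodes to pedigree nodes or None. The function names `harmonize`, `ts_ancestors` and `ped_ancestors` are hypothetical, and "common ancestor" is taken to mean a shared strict ancestor.

```python
from itertools import combinations


def ts_ancestors(node, ts_parent):
    """Yield every ancestor of `node` in a single tree (child -> parent dict)."""
    parent = ts_parent.get(node)
    while parent is not None:
        yield parent
        parent = ts_parent.get(parent)


def ped_ancestors(node, ped_parents):
    """Return the set of strict ancestors of `node` in the pedigree."""
    seen, stack = set(), list(ped_parents.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(ped_parents.get(current, ()))
    return seen


def harmonize(mapping, ts_parent, ped_parents):
    """Return a copy of `mapping` with inconsistent ancestors set to None."""
    out_of_pedigree = set()

    # Check [1]: a node mapped to a founder cannot have in-pedigree ancestors.
    for ts_node, ped_node in mapping.items():
        if ped_node is not None and not ped_parents.get(ped_node):
            out_of_pedigree.update(ts_ancestors(ts_node, ts_parent))

    # Group tree nodes by parent so that siblings can be compared pairwise.
    children = {}
    for child, parent in ts_parent.items():
        if parent is not None:
            children.setdefault(parent, []).append(child)

    # Check [2]: siblings mapped to pedigree nodes with no shared ancestor.
    for siblings in children.values():
        for a, b in combinations(siblings, 2):
            ped_a, ped_b = mapping.get(a), mapping.get(b)
            if ped_a is None or ped_b is None:
                continue
            shared = ped_ancestors(ped_a, ped_parents) & ped_ancestors(ped_b, ped_parents)
            if not shared:
                # Siblings share a parent, so they share all tree ancestors.
                out_of_pedigree.update(ts_ancestors(a, ts_parent))

    harmonized = dict(mapping)
    for node in out_of_pedigree:
        harmonized[node] = None
    return harmonized
```

A real tree sequence contains many trees, and a node can have different parents in different trees, so an actual implementation would have to repeat or merge these checks across trees.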

These 2 checks improved the accuracy of the greedy algorithms by 5-12 percentage points. The accuracy for the best configuration is now hovering around 90%.

@kobrica will take it from here and implement the rest of the checks.

andjelatodorovic commented 4 years ago

I have implemented all of the checks, but they need to be tested along with @shz9's functions.

shz9 commented 4 years ago

@kobrica Sorry, I haven't had a lot of time to look at your implementation in detail. But it seems there are a couple of runtime errors:

UnboundLocalError: local variable 'ts_n_succ' referenced before assignment
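For context, Python raises this error at run time when a function reads a local variable on a path where it was never assigned. The sketch below shows one common way that can happen and a fix. Only the name `ts_n_succ` comes from the traceback; both functions are hypothetical and are not taken from the repository, so the actual cause may differ.

```python
def first_successor_buggy(successors):
    # ts_n_succ is only bound if the loop body runs at least once.
    for candidate in successors:
        ts_n_succ = candidate
        break
    # Raises UnboundLocalError when `successors` is empty.
    return ts_n_succ


def first_successor_fixed(successors):
    # Bind the name before the loop so every path has a value.
    ts_n_succ = None
    for candidate in successors:
        ts_n_succ = candidate
        break
    return ts_n_succ
```

Initializing the variable before the loop or branch that may skip the assignment, and handling the None case downstream, is usually enough to remove the error.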

Can you please test your code in a jupyter notebook for now? You can use the template in Experiments.ipynb.

andjelatodorovic commented 4 years ago

@shz9 I will try to test everything out in a Jupyter notebook. I think I know what the error is here.

shz9 commented 4 years ago

I re-arranged the code a bit. It doesn't work very well for the diploid case. We can discuss the remaining steps soon.