shz9 opened 4 years ago
Other cases to check in the harmonization step:
I added basic code for the harmonization step. I implemented two checks:
[1] If a node in the tree sequence `n_ts` is mapped to a founder node in the pedigree `n_ped`, then set all of its ancestors to be out-of-pedigree nodes (i.e., they map to `None`).
[2] If two sibling nodes in the tree sequence are mapped to disconnected nodes in the pedigree (i.e., nodes with no common ancestor in the pedigree), then set all of their ancestors to be out-of-pedigree nodes.
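For illustration only, here is a minimal sketch of checks [1] and [2]. It assumes a single haploid tree given as a hypothetical `ts_parent` dict (child → parent, `None` at the root), a pedigree given as a hypothetical `ped_parents` dict (node → list of parents, empty for founders), and a `mapping` dict from tree-sequence nodes to pedigree nodes (`None` meaning out-of-pedigree). None of these names or data layouts come from the project's code, and a real tree sequence contains many trees.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy inputs (not the project's data structures):
#   ts_parent:   tree-sequence node -> parent (None at the root), one tree only
#   ped_parents: pedigree node -> list of parents ([] for founders)
#   mapping:     tree-sequence node -> pedigree node, or None if out-of-pedigree


def ts_ancestors(node, ts_parent):
    """Ancestors of `node` in the tree, nearest first."""
    out = []
    parent = ts_parent.get(node)
    while parent is not None:
        out.append(parent)
        parent = ts_parent.get(parent)
    return out


def ped_lineage(node, ped_parents):
    """`node` together with all of its ancestors in the pedigree."""
    seen, stack = {node}, [node]
    while stack:
        for parent in ped_parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def harmonize(mapping, ts_parent, ped_parents):
    """Apply checks [1] and [2]. Decisions read the input mapping; edits go to a copy."""
    result = dict(mapping)

    # Check [1]: every ancestor of a node mapped to a founder is out-of-pedigree.
    # (Pedigree nodes missing from ped_parents are treated as founders here.)
    for n_ts, n_ped in mapping.items():
        if n_ped is not None and not ped_parents.get(n_ped):
            for anc in ts_ancestors(n_ts, ts_parent):
                result[anc] = None

    # Check [2]: siblings mapped to pedigree nodes with no common ancestor
    # (lineages include the nodes themselves) push their parent and above out.
    children = defaultdict(list)
    for child, parent in ts_parent.items():
        if parent is not None:
            children[parent].append(child)
    for parent, kids in children.items():
        for a, b in combinations(kids, 2):
            pa, pb = mapping.get(a), mapping.get(b)
            if pa is None or pb is None:
                continue
            if ped_lineage(pa, ped_parents).isdisjoint(ped_lineage(pb, ped_parents)):
                for anc in [parent] + ts_ancestors(parent, ts_parent):
                    result[anc] = None
    return result


ts_parent = {0: 2, 1: 2, 2: 4, 3: 4, 4: None}
ped_parents = {"a": ["c"], "b": ["d"], "e": ["c"], "c": [], "d": []}
mapping = {0: "a", 1: "b", 2: "c", 3: "e", 4: "d"}
print(harmonize(mapping, ts_parent, ped_parents))
# {0: 'a', 1: 'b', 2: None, 3: 'e', 4: None}
```

In the toy example, node 4 is cleared by check [1] (its child 2 maps to the founder `"c"`), and nodes 2 and 4 are cleared by check [2] (siblings 0 and 1 map to `"a"` and `"b"`, whose pedigree lineages are disjoint). Basing every decision on the input mapping keeps the result independent of iteration order.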
These 2 checks improved the accuracy of the greedy algorithms by 5-12 points. Now the accuracy for the best configuration is hovering around 90%.
@kobrica will take it from here and implement the rest of the checks.
I have implemented all of the checks, but they still need to be tested together with @shz9's functions.
@kobrica Sorry, I haven't had much time to look at your implementation in detail, but it seems there are a couple of bugs, e.g.:
`UnboundLocalError: local variable 'ts_n_succ' referenced before assignment`
Can you please test your code in a Jupyter notebook for now? You can use the template in `Experiments.ipynb`.
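The code that raised this error isn't shown here, but it usually means a local name is assigned only on some code paths (inside an `if` or a loop body) and then read on a path where that assignment never ran. The hypothetical functions below reuse the name `ts_n_succ` purely to illustrate the pattern and one common fix; they are not the project's code, and the exact error wording differs between Python versions.

```python
# Hypothetical illustration of how this error typically arises; not the project's code.


def n_successors_buggy(node, children):
    if node in children:
        ts_n_succ = len(children[node])
    # If `node` has no entry, `ts_n_succ` was never assigned on this path,
    # so reading it raises UnboundLocalError.
    return ts_n_succ


def n_successors_fixed(node, children):
    ts_n_succ = 0  # bind the name on every code path before it is read
    if node in children:
        ts_n_succ = len(children[node])
    return ts_n_succ


print(n_successors_fixed(5, {}))           # 0
print(n_successors_fixed(1, {1: [2, 3]}))  # 2
```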
@shz9 I will test everything in a Jupyter notebook. I think I know what the error is.
I rearranged the code a bit. It doesn't work very well for the diploid case. We can discuss the remaining steps soon.
Aligner objects as currently implemented only map nodes in the tree sequence to nodes in the pedigree. This can be done either iteratively or in one step. One thing that's missing is "harmonization" or "sanity checking" for greedy algorithms. The goal of the harmonization step is to make sure that the mappings make sense and are consistent with other information that we have, e.g.:
- If a node `n` in the tree sequence is matched to a founder node in the pedigree, then it's likely that all or most of the predecessors of `n` are out-of-pedigree nodes.
- For each pair of matched nodes (`n_ts`, `n_ped`), make sure that their successors and predecessors preserve their time-ordering. For example, no predecessor of `n_ts` should map to a successor of `n_ped`, and vice versa.

After these sanity checks are implemented and used to correct the mapping (if possible), one final thing that can be added is the pedigree node to tree-sequence edge mapping:
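Returning to the time-ordering check from the list of sanity checks: a rough sketch of how violating nodes could be flagged for one matched pair. It assumes that "predecessor" and "successor" mean all ancestors and descendants (not only parents and children), and it uses hypothetical toy inputs: a single tree as a `ts_parent` dict (child → parent, `None` at the root), a pedigree as a `ped_parents` dict (node → list of parents), and a `mapping` dict from tree-sequence nodes to pedigree nodes. The function names are illustrative only.

```python
from collections import defaultdict

# Hypothetical toy inputs (not the project's data structures):
#   ts_parent:   tree-sequence node -> parent (None at the root), one tree only
#   ped_parents: pedigree node -> list of parents ([] for founders)
#   mapping:     tree-sequence node -> pedigree node, or None if out-of-pedigree


def reachable(start, edges):
    """Nodes reachable from `start` via `edges`; `start` itself is excluded (DAG assumed)."""
    seen, stack = set(), [start]
    while stack:
        for nxt in edges.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def time_order_violations(n_ts, mapping, ts_parent, ped_parents):
    """Tree-sequence nodes whose mapping contradicts the ordering around (n_ts, mapping[n_ts])."""
    n_ped = mapping[n_ts]
    # Adjacency lists pointing up (towards parents) and down (towards children).
    ts_up = {c: [p] for c, p in ts_parent.items() if p is not None}
    ts_down = defaultdict(list)
    for c, p in ts_parent.items():
        if p is not None:
            ts_down[p].append(c)
    ped_down = defaultdict(list)
    for c, parents in ped_parents.items():
        for p in parents:
            ped_down[p].append(c)

    ped_pred = reachable(n_ped, ped_parents)  # pedigree ancestors of n_ped
    ped_succ = reachable(n_ped, ped_down)     # pedigree descendants of n_ped

    bad = []
    # No tree-sequence ancestor of n_ts may map to a pedigree descendant of n_ped ...
    for anc in reachable(n_ts, ts_up):
        if mapping.get(anc) in ped_succ:
            bad.append(anc)
    # ... and no tree-sequence descendant may map to a pedigree ancestor.
    for desc in reachable(n_ts, ts_down):
        if mapping.get(desc) in ped_pred:
            bad.append(desc)
    return sorted(bad)


ts_parent = {0: 1, 1: 2, 2: None}
ped_parents = {"kid": ["mom"], "mom": ["gran"], "gran": []}
mapping = {0: "kid", 1: "mom", 2: "kid"}
print(time_order_violations(1, mapping, ts_parent, ped_parents))  # [2]
```

Here node 2 is an ancestor of node 1 in the tree but maps to `"kid"`, a pedigree descendant of `"mom"`, so it is flagged. Whether flagged nodes should then be reset to `None` or re-mapped is left open.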