tskit-dev / tskit

Population-scale genomics
MIT License

Implement state machine for tsk_variant_t and keep reference to ``tree_sequence`` in restricted_copy. #2436

Open jeromekelleher opened 2 years ago

jeromekelleher commented 2 years ago

We are currently using the tree_sequence attribute as a way of determining whether a variant is a frozen copy or not. We also use the variant->site.position attribute as a way of determining whether the variant has been decoded. It would be simpler if we had a single state machine, which supported the following transitions:

VARIANT_STATE_NEW -> VARIANT_STATE_DECODED
VARIANT_STATE_DECODED -> VARIANT_STATE_DECODED
VARIANT_STATE_DECODED -> VARIANT_STATE_FROZEN_COPY
VARIANT_STATE_FROZEN_COPY -> VARIANT_STATE_FROZEN_COPY
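
As a rough illustration of the four transitions listed above, here is a minimal standalone sketch in C. It is not tskit code: the type `variant_state_t` and the helpers `variant_state_transition_allowed` and `variant_state_set` are hypothetical names invented for this example, and only the three `VARIANT_STATE_*` names come from the list. It also assumes that any transition not listed is an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical enum: the state names follow the list above, but this type
 * does not exist in tskit. */
typedef enum {
    VARIANT_STATE_NEW = 0,
    VARIANT_STATE_DECODED,
    VARIANT_STATE_FROZEN_COPY
} variant_state_t;

/* Return true only for the four transitions listed above.
 * Assumption: everything else (e.g. NEW -> FROZEN_COPY, or
 * FROZEN_COPY -> DECODED) is rejected. */
bool
variant_state_transition_allowed(variant_state_t from, variant_state_t to)
{
    switch (from) {
        case VARIANT_STATE_NEW:
            return to == VARIANT_STATE_DECODED;
        case VARIANT_STATE_DECODED:
            return to == VARIANT_STATE_DECODED || to == VARIANT_STATE_FROZEN_COPY;
        case VARIANT_STATE_FROZEN_COPY:
            return to == VARIANT_STATE_FROZEN_COPY;
    }
    return false;
}

/* Hypothetical helper: move *state to next if the transition is allowed.
 * Returns 0 on success, or -1 (leaving *state unchanged) otherwise. */
int
variant_state_set(variant_state_t *state, variant_state_t next)
{
    assert(state != NULL);
    if (!variant_state_transition_allowed(*state, next)) {
        return -1;
    }
    *state = next;
    return 0;
}
```

With a single state field like this, a check such as "has this variant been decoded?" or "is this a frozen copy?" becomes a comparison against one value rather than an inference from other attributes.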

Thus,

The current approach of using the tree_sequence is problematic because

I think we can also remove some complexity in tsk_variant_restricted_copy because we can then avoid taking copies of the alleles in the user_alleles memory.

jeromekelleher commented 2 years ago

Also, the Python testing of the variant state needs beefing up. We need to test taking copies of copies, among other things.

jeromekelleher commented 2 years ago

We could consider renaming this to tsk_variant_frozen_copy (keeping the current name as an alias).