jeromekelleher / sc2ts

Infer a succinct tree sequence from SARS-CoV-2 variation data
MIT License

Replacement terminology for "is_consistent" #103

Closed by hyanwong 1 year ago

hyanwong commented 1 year ago

We currently define a "consistent" recombinant in the following way:

def is_consistent(self):
    return len(self.matches[0].parents) == len(self.matches[1].parents) and (
        self.matches[0].mutations == self.matches[1].mutations)

The word "consistent" is confusing here. We should come up with a better term.

jeromekelleher commented 1 year ago

The verbose description is something like "robust to HMM coordinate reflection". What it's saying is that the HMM has to choose sequence-identical parents whether we run it with mirrored or unmirrored coordinates: given that we have the same number of parents and the mutations are the same, those parents must be identical in state. We don't insist on the parent nodes themselves being the same because that would be overly restrictive (when there are identical parents to choose from, the HMM will choose arbitrarily).

Does that help?
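To make the description above concrete, here is a minimal, self-contained sketch of the check. It is not the sc2ts implementation: the `Mutation`, `HmmMatch` and `Recombinant` classes are hypothetical stand-ins, the method name `is_hmm_consistent` is illustrative only, and it assumes that `matches[0]` holds the unmirrored HMM run and `matches[1]` the mirrored one.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Mutation:
    # Hypothetical stand-in for a mutation record: a site and its derived state.
    position: int
    derived_state: str


@dataclass
class HmmMatch:
    # Hypothetical stand-in for the result of one HMM run: the parent node IDs
    # along the copying path and the mutations needed to explain the sample.
    parents: List[int]
    mutations: List[Mutation]


@dataclass
class Recombinant:
    # Assumption: matches[0] is the unmirrored run, matches[1] the mirrored run.
    matches: Tuple[HmmMatch, HmmMatch]

    def is_hmm_consistent(self) -> bool:
        unmirrored, mirrored = self.matches
        # Same number of parents and identical mutations in both orientations.
        # Parent node IDs are deliberately not compared, because the HMM may
        # pick arbitrarily among sequence-identical parents.
        return (
            len(unmirrored.parents) == len(mirrored.parents)
            and unmirrored.mutations == mirrored.mutations
        )


shared = [Mutation(23403, "G")]

# Different parent node IDs (20 vs 21), but same parent count and mutations.
same_state = Recombinant(
    (HmmMatch([10, 20], list(shared)), HmmMatch([10, 21], list(shared)))
)
# The mirrored run needs an extra parent, so the two orientations disagree.
extra_parent = Recombinant(
    (HmmMatch([10, 20], list(shared)), HmmMatch([10, 20, 30], list(shared)))
)

print(same_state.is_hmm_consistent())    # True
print(extra_parent.is_hmm_consistent())  # False
```

The first case shows why node identity is left out of the comparison: the two runs pick different node IDs for the second parent, yet the recombinant still passes because the parent count and the mutations agree.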

hyanwong commented 1 year ago

I like "robust" (much preferable to "simple"), but it may give the wrong impression that these are "good" (or "True") recombinants. I think @szhan was thinking about good terminology to use.

szhan commented 1 year ago

I also prefer "robust". I have been calling them "robustly identified" in the MM section.

jeromekelleher commented 1 year ago

I worry that "robust" will be interpreted as "good" or "confident". We could say something like "HMM orientation consistent"? We probably don't say it that much, so something precise might be preferable?

szhan commented 1 year ago

Let's just go with "HMM-consistent" in the preprint for now.

jeromekelleher commented 1 year ago

I'll use this term in the code also when implementing #108

hyanwong commented 1 year ago

I think we have fixed on "HMM-consistent" now