jeromekelleher / sc2ts

Infer a succinct tree sequence from SARS-COV-2 variation data
MIT License

Recombinant tables #108

Closed: szhan closed this issue 1 year ago

szhan commented 1 year ago

I think it would be nice to have supplementary tables in the preprint listing the recombination nodes, one for the wide ARG and one for the long ARG. In previous versions of the tables, we had the following columns:

  1. node ID
  2. imputed Pango lineage status
  3. date of insertion
  4. number of sample descendants
  5. imputed Pango lineage status of the left parent
  6. imputed Pango lineage status of the right parent
  7. zero-based left coordinate of the breakpoint interval
  8. zero-based right coordinate of the breakpoint interval
  9. number of mutations explaining the recombinant (from the parents)
  10. list of the mutations counted in (9)
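
As a rough illustration of the column layout above, here is a minimal sketch of one table row and a tab-separated writer. It is not the sc2ts implementation: the `RecombinantRow` class, its field names, the `write_table` helper, and the example values are all hypothetical and made up for illustration.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields


@dataclass
class RecombinantRow:
    """One row of the table; field names are hypothetical, one per column above."""
    node_id: int                 # (1) node ID
    pango: str                   # (2) imputed Pango lineage
    date_added: str              # (3) date of insertion
    num_sample_descendants: int  # (4) number of sample descendants
    left_parent_pango: str       # (5) imputed Pango lineage of the left parent
    right_parent_pango: str      # (6) imputed Pango lineage of the right parent
    interval_left: int           # (7) zero-based left coordinate of the interval
    interval_right: int          # (8) zero-based right coordinate of the interval
    num_mutations: int           # (9) number of mutations counted
    mutations: str               # (10) the mutations counted in (9), joined by ";"


def write_table(rows, out):
    """Write a header line and then one tab-separated line per row."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([f.name for f in fields(RecombinantRow)])
    for row in rows:
        writer.writerow(astuple(row))


# Placeholder values only, not real results.
example = RecombinantRow(1, "X", "2021-01-01", 3, "A", "B", 100, 200, 2, "C100T;G200A")
buffer = io.StringIO()
write_table([example], buffer)
print(buffer.getvalue())
```

Keeping the list of mutations in a single delimited string keeps the table flat, so it can be pasted into a supplementary spreadsheet without nested cells.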

I think "number of sample descendants" should include the causal recombinant sequence. We just have to explain this in the text.
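
To make the counting convention concrete, here is a small sketch on a toy graph. It assumes a plain child map rather than the real ARG data structures, and the function name, its parameters, and the node IDs are hypothetical.

```python
def count_sample_descendants(children, samples, root,
                             causal_sample=None, include_causal=True):
    """Count the sample nodes reachable from `root` (including `root` itself).

    `children` maps a node ID to a list of child node IDs. A visited set is
    used because a node in an ARG can be reached along more than one path.
    """
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children.get(node, []))
    found = seen & set(samples)
    if not include_causal:
        # Drop the sequence that triggered the recombination node.
        found.discard(causal_sample)
    return len(found)


# Toy example: node 10 is the recombination node, 11 the causal sample.
children = {10: [11, 12], 12: [13]}
samples = {11, 13}
with_causal = count_sample_descendants(children, samples, 10, causal_sample=11)
without_causal = count_sample_descendants(
    children, samples, 10, causal_sample=11, include_causal=False
)
print(with_causal, without_causal)
```

Under the convention proposed here, the table would report the first number, so a recombination node with no other descendants still has a count of one.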

jeromekelleher commented 1 year ago

I'm on it. Note that the format necessarily means 2 parents; is that OK?

szhan commented 1 year ago

Yes, that's perfect. I thought we had decided to focus on HMM-consistent recombinants.

jeromekelleher commented 1 year ago

Well, you can imagine defining HMM consistency with more than 2 parents, but I didn't bother. So, the table will be of recombinants that have 2 parents and are HMM consistent.

szhan commented 1 year ago

Ah, yes, "same number of parents". We use different subsets of recombinants in different sections of the preprint, so I'm getting the definitions mixed up. Two-parent HMM-consistent recombinants, please!
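
For reference, the subset agreed on here could be selected along these lines. This is only a sketch: the record layout, the `parents` list, and the precomputed `hmm_consistent` flag are hypothetical, and how HMM consistency is actually evaluated is not specified in this thread.

```python
def select_table_recombinants(records):
    """Keep recombinants that have exactly two parents and are flagged HMM consistent.

    Each record is assumed to be a dict with a `parents` list and a boolean
    `hmm_consistent` flag computed elsewhere (both are hypothetical names).
    """
    return [
        record
        for record in records
        if len(record["parents"]) == 2 and record["hmm_consistent"]
    ]


# Made-up records for illustration only.
records = [
    {"node_id": 1, "parents": [5, 6], "hmm_consistent": True},
    {"node_id": 2, "parents": [5, 6, 7], "hmm_consistent": True},
    {"node_id": 3, "parents": [8, 9], "hmm_consistent": False},
]
selected = select_table_recombinants(records)
print([record["node_id"] for record in selected])
```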