Open szhan opened 1 month ago
Here is the sampling frequency when looking only at samples with `Viridian_cons_het == 0` and `Viridian_cons_het != .`. There are 2,040,650 samples.
Since the sampling is quite thin before March 1st, we can probably relax the filter on hets for that part of the ARG. We can impose the het = 0 filter on the samples from then onwards. The early Delta and closely related samples crop up in March/April.
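A minimal sketch of that date-dependent filter, using only the standard library. The record layout (`strain`, `date`, `Viridian_cons_het` as strings), the ISO date format, and the 2021 cutoff year are assumptions made for illustration; the function names are hypothetical and this is not the pipeline's actual code.

```python
from datetime import date


def parse_het(value):
    """Return the het count as an int, or None when it is missing ('.')."""
    value = str(value).strip()
    if value in (".", ""):
        return None
    return int(value)


def keep_sample(row, cutoff=date(2021, 3, 1)):
    """Keep every sample before the cutoff; from the cutoff on, require het == 0."""
    collected = date.fromisoformat(row["date"])  # assumes full ISO dates (YYYY-MM-DD)
    if collected < cutoff:
        return True  # relaxed regime: sampling is thin, so keep everything
    return parse_het(row["Viridian_cons_het"]) == 0  # missing ('.') is rejected


# Made-up records, for illustration only.
samples = [
    {"strain": "A", "date": "2021-01-15", "Viridian_cons_het": "4"},
    {"strain": "B", "date": "2021-02-28", "Viridian_cons_het": "."},
    {"strain": "C", "date": "2021-03-01", "Viridian_cons_het": "0"},
    {"strain": "D", "date": "2021-03-20", "Viridian_cons_het": "2"},
    {"strain": "E", "date": "2021-04-02", "Viridian_cons_het": "."},
]
kept = [row["strain"] for row in samples if keep_sample(row)]
print(kept)
```

Here samples A and B pass because they predate the cutoff, C passes the strict filter, and D and E are dropped.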
`Viridian_cons_het == 0` is too strict. Lots of samples have at least 1 het.
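One way to pick a less strict threshold is to tabulate how many samples survive at each maximum het count. The sketch below assumes the het values arrive as strings with `.` meaning missing; the function name and the input values are hypothetical.

```python
from collections import Counter


def retained_by_threshold(het_values, max_threshold=5):
    """Map each threshold t to the number of samples with a het count <= t."""
    counts = Counter()
    for value in het_values:
        value = str(value).strip()
        if value in (".", ""):
            continue  # missing het count: excluded from every threshold
        counts[int(value)] += 1

    retained = {}
    running = 0
    for t in range(max_threshold + 1):
        running += counts[t]  # Counter returns 0 for absent keys
        retained[t] = running
    return retained


# Made-up values, for illustration only.
het_values = ["0", "1", ".", "0", "3", "2", "1", "7"]
table = retained_by_threshold(het_values, max_threshold=3)
print(table)
```

Running this over the real metadata column would show how much each step up in the threshold gains in sample count.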
What do you suggest instead? I might run this over the weekend.
Just noting that we don't have any good samples till March 2021 for Delta B.1.617.2.
While looking at the earliest HMM group of samples attached in `long_arg_v7_clustloc-mrm_2-rw_10-mgs_10-2021-06-30.ts.tsz` (md5sum: `6cde6e2c00624a505aa00063973368f2`), I noticed that the samples have a suspiciously high number of ambiguous characters (specifically K). Also, these samples have a mix of Viridian Pango labels: B.1.617.2, n = 3; B.1.617, n = 2; B.1.617.1, n = 7.
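For reference, K is the IUPAC ambiguity code for "G or T". A rough way to quantify ambiguous characters per sample is to count everything that is not A, C, G, T, or a gap. This sketch assumes each consensus sequence is available as a plain string; the function name and the example sequence are hypothetical.

```python
from collections import Counter

UNAMBIGUOUS = set("ACGT")
GAP = "-"


def ambiguity_counts(sequence):
    """Count each character that is neither an unambiguous base nor a gap."""
    counts = Counter()
    for base in sequence.upper():
        if base not in UNAMBIGUOUS and base != GAP:
            counts[base] += 1  # e.g. K (G/T), R (A/G), N (any base)
    return counts


# Made-up sequence, for illustration only.
example = ambiguity_counts("ACGTKKNR-acgk")
print(example)
```

Comparing the count for `K` across samples in the group would make the unusually high values easy to spot.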
These samples may be making it harder to build a good local tree around the start of the Delta wave. By being strict on the number of ambiguous characters (`Viridian_cons_het == 0`, ignoring '.'), we may be able to do better here.