ftrebien opened this issue 3 years ago.
Hello @ftrebien, it seems that one of the sequences from Spain has the wrong date metadata (collection date: 2020-02-*). We expect this to be corrected on GISAID soon, after which the change will be reflected on outbreak.info.
I would disallow the oldest date from moving back by more than 2 months when new sequences are added, at least for lineages with more than 5000 sequences. This should filter out many of these metadata and lineage-assignment errors. It might also help to exclude sequences that fail some molecular-clock constraint when computing the oldest sample date. @gkarthik
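A minimal sketch of the update guard suggested above, assuming Python, a hypothetical helper name `accept_new_oldest_date`, and "2 months" read as 60 days. It is an illustration of the idea, not outbreak.info's implementation, and it does not cover the molecular-clock filter.

```python
from datetime import date, timedelta

# Assumed thresholds taken from the suggestion above; both are tunable.
MIN_SEQUENCES = 5000
MAX_BACKSHIFT = timedelta(days=60)  # "2 months", approximated as 60 days


def accept_new_oldest_date(
    previous_oldest: date,
    candidate_oldest: date,
    n_sequences: int,
) -> bool:
    """Return True if the lineage's oldest date may be updated to candidate_oldest.

    Hypothetical helper: lineages with at most MIN_SEQUENCES sequences are never
    filtered; larger lineages reject an update that moves the oldest date back
    by more than MAX_BACKSHIFT. A later candidate date is always accepted.
    """
    if n_sequences <= MIN_SEQUENCES:
        return True
    return previous_oldest - candidate_oldest <= MAX_BACKSHIFT
```

With illustrative dates, `accept_new_oldest_date(date(2020, 9, 20), date(2020, 2, 15), 50000)` returns `False` because the oldest date would jump back 218 days. The same call with `n_sequences=100` returns `True`, since small lineages are left unfiltered.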
Thanks @babarlelephant, we've been thinking for a long time about how to better identify and filter out erroneous date metadata. To limit the compute time of the date check, we've been planning a simple check that compares the earliest and second-earliest dates, as you suggest, and applying it only to lineages above a certain number of sequences is a good idea. We'll keep you posted; it seems like every month there's a mislabeled B.1.1.7 sequence.
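A minimal sketch of the earliest vs. second-earliest date check described above, assuming Python, a hypothetical helper `earliest_plausible_date`, and the same assumed thresholds (more than 5000 sequences, a 60-day gap). It uses `heapq.nsmallest` so only the two smallest dates are tracked instead of sorting every date, in keeping with the goal of limiting compute time.

```python
import heapq
from datetime import date, timedelta
from typing import Optional, Sequence

# Assumed thresholds; the actual values were not specified in the thread.
MIN_SEQUENCES = 5000
MAX_GAP = timedelta(days=60)


def earliest_plausible_date(dates: Sequence[date]) -> Optional[date]:
    """Return the lineage's earliest collection date, skipping one likely outlier.

    Hypothetical helper: for lineages with more than MIN_SEQUENCES sequences,
    if the earliest date precedes the second-earliest by more than MAX_GAP,
    the earliest is treated as a probable metadata error and the
    second-earliest date is returned instead.
    """
    if not dates:
        return None
    if len(dates) <= MIN_SEQUENCES:
        return min(dates)
    # Only the two smallest dates are needed for the comparison.
    first, second = heapq.nsmallest(2, dates)
    return second if second - first > MAX_GAP else first
```

This sketch discards at most one outlier. Two mislabeled sequences with similar bogus dates would pass the check, which is one reason to combine it with other filters such as the molecular-clock constraint mentioned earlier in the thread.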
The earliest date shown for lineage B.1.1.7 in Spain differs from the earliest sample date for Spain in PANGO. Why is there a difference? The same discrepancy exists for B.1.351 in Qatar (PANGO) and P.1 in the US (PANGO).