hodcroftlab / covariants

Real-time updates and information about key SARS-CoV-2 variants, plus the scripts that generate this information.
https://covariants.org/
GNU Affero General Public License v3.0

high number of 'non-greek' variants #347

Closed (willgilks closed 2 years ago)

willgilks commented 2 years ago

Hi, thanks for this great resource. I notice that the proportion of variants that aren't assigned a Greek letter is very high in the data (all_tables.tsv), whereas I would expect the proportions to be dominated by Alpha, Omicron and Delta. Examples include 21K.21L, 20A.EU2, 20A/S:126A etc. I'm wondering if these can be standardised further, e.g. the 'K' in the first example standing for Kappa, and 20A for Alpha.

emmahodcroft commented 2 years ago

Hi @willgilks - so glad to hear you're finding CoVariants useful!

CoVariants designates variants by their Greek letter whenever one applies. If a variant doesn't have a Greek letter, it is because it was never assigned one, often because it predates the Greek letter naming system (which only came into play at the end of 2020 with Alpha/Beta/Gamma). The letters after the numbers (e.g. 20A) are part of the Nextstrain clade nomenclature system, which is separate from the Greek letters: clades are named by the year they are identified, then alphabetically.

You can see the image & table on the front page of CoVariants, which show how the variants map to Greek letters (or not) and how they relate to each other. I hope that clarifies!
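
To make the "year, then alphabetical" scheme concrete, here is a minimal Python sketch that splits the leading Nextstrain part of a label into its year and letter. It assumes a label starts with two digits followed by one capital letter, as in the examples from this thread; the function name `parse_clade_prefix` is hypothetical and not part of the CoVariants scripts, and anything after the prefix (such as `.EU2` or `/S:126A`) is simply ignored.

```python
import re
from typing import Optional, Tuple

# Assumption: a Nextstrain-style label begins with a two-digit year and a
# single capital letter (e.g. "20A", "21K"). Suffixes are not interpreted.
_CLADE_PREFIX = re.compile(r"^(\d{2})([A-Z])")


def parse_clade_prefix(label: str) -> Optional[Tuple[int, str, int]]:
    """Return (year, letter, position of the letter in the alphabet), or None.

    The letter only records alphabetical order within the year; it is not an
    abbreviation of a Greek letter name.
    """
    match = _CLADE_PREFIX.match(label)
    if match is None:
        return None  # e.g. a plain Greek-letter name such as "Alpha"
    year = 2000 + int(match.group(1))
    letter = match.group(2)
    position = ord(letter) - ord("A") + 1
    return year, letter, position


if __name__ == "__main__":
    for label in ["21K.21L", "20A.EU2", "20A/S:126A", "Alpha"]:
        print(label, "->", parse_clade_prefix(label))
```

Read this way, the 'K' in 21K.21L only marks the eleventh letter used for a clade identified in 2021, and the 'A' in 20A the first letter used for 2020; neither is an initial for Kappa or Alpha.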