sars-cov-2-variants / lineage-proposals

Repository to propose and discuss lineages

Illumina-induced artefact of Orf1a:E4388K in JN.1 (50+ branches) #1313

aviczhl2 closed this issue 6 months ago

aviczhl2 commented 7 months ago

There seem to be multiple branches of Orf1a:E4388K in JN.1.

General Query: G13427, T3565, T22926 (total=1012 seqs)

Although the total is 1012, each separate branch has at most ~60 seqs; no branch has more than 100. There seem to be at least 50 different branches, which disrupts the tree.
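The "many small branches, none large" pattern can be checked mechanically by listing every branch that carries the mutation together with the number of samples below it. This is a minimal sketch on a toy tree; the `Node` class and the tree itself are hypothetical and are not UShER's data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: mutations sit on the branch leading to this node."""
    name: str
    mutations: list[str] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)


def count_samples(node: Node) -> int:
    """Number of leaf samples at or below `node`."""
    if not node.children:
        return 1
    return sum(count_samples(child) for child in node.children)


def branches_with(root: Node, mutation: str) -> list[tuple[str, int]]:
    """(node name, descendant sample count) for every branch that carries `mutation`.

    Each hit is a separate placement of the mutation on the tree.
    """
    hits = []
    stack = [root]
    while stack:
        node = stack.pop()
        if mutation in node.mutations:
            hits.append((node.name, count_samples(node)))
        stack.extend(node.children)
    return hits


# Toy tree (made up for illustration): G13427A placed on three separate branches.
root = Node("JN.1", children=[
    Node("b1", ["G13427A"], [Node("s1"), Node("s2")]),
    Node("b2", ["T18453C"], [Node("b3", ["G13427A"], [Node("s3")]), Node("s4")]),
    Node("b4", ["G13427A"], [Node("s5"), Node("s6"), Node("s7")]),
])

hits = branches_with(root, "G13427A")
print(f"{len(hits)} branches, largest has {max(n for _, n in hits)} samples")
```

If a hit were nested inside another hit, its samples would be counted under both, so summing the counts is only safe when the branches are disjoint.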

Countries: Brazil, Canada, Denmark, Singapore, USA, UK. That is too few countries for a mutation with 1000+ seqs.

Further investigation shows that 967/1012 are labeled Illumina. This looks like an Illumina-induced artefact, but the reason is unknown @ryhisner @FedeGueli
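The 967/1012 figure is a platform share of about 95.6%. Here is a small sketch of that tally from metadata records. The field name `sequencing_technology` is an assumption for illustration (real metadata exports may name the column differently), and the records are synthetic, built only to reproduce the counts quoted above.

```python
def platform_share(records: list[dict], platform: str = "Illumina") -> tuple[int, int, float]:
    """Count records whose (hypothetical) 'sequencing_technology' field mentions `platform`."""
    hits = sum(
        1 for r in records
        if platform.lower() in r.get("sequencing_technology", "").lower()
    )
    return hits, len(records), hits / len(records)


# Synthetic records reproducing the counts in the thread: 967 Illumina out of 1012.
records = (
    [{"sequencing_technology": "Illumina"}] * 967
    + [{"sequencing_technology": "other"}] * 45
)
hits, total, share = platform_share(records)
print(f"{hits}/{total} = {share:.1%}")  # 967/1012 = 95.6%
```

On its own, a high share is only suggestive. It becomes stronger evidence when compared with the Illumina share among all JN.1 sequences from the same period and countries, a baseline not given here.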

I suggest adding a branch-specific mask at this position @AngieHinrichs
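A branch-specific mask means ignoring a site only within one part of the tree (here, under JN.1) while leaving it intact elsewhere. The sketch below illustrates that idea by dropping mutations at one position from every branch in a subtree. The `Node` class, `mask_subtree` helper, and toy tree are hypothetical, and this is not claimed to match how the UShER build actually applies its masks.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: mutations sit on the branch leading to this node."""
    name: str
    mutations: list[str] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)


def position(mutation: str) -> int:
    """Parse the 1-based genome position from a mutation like 'G13427A'."""
    return int(mutation[1:-1])


def mask_subtree(node: Node, pos: int) -> int:
    """Remove every mutation at `pos` from branches inside `node`'s subtree.

    Branches outside the subtree keep their mutations at `pos`, which is what makes
    the mask branch-specific rather than genome-wide. Returns how many were removed.
    """
    removed = 0
    stack = [node]
    while stack:
        current = stack.pop()
        kept = [m for m in current.mutations if position(m) != pos]
        removed += len(current.mutations) - len(kept)
        current.mutations = kept
        stack.extend(current.children)
    return removed


# Toy example (made up): G13427A under JN.1 is masked, the same change elsewhere is kept.
jn1 = Node("JN.1", children=[Node("b1", ["G13427A"]), Node("b2", ["T18453C", "G13427A"])])
other = Node("other_clade", ["G13427A"])
root = Node("root", children=[jn1, other])

print(mask_subtree(jn1, 13427), other.mutations)  # 2 ['G13427A']
```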

[Image: UShER tree for the general query]

Everything in this tree is JN.1* + Orf1a:E4388K (generally represented by G13427A on GISAID, but some sequences have G13427R).
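Why G13427A corresponds to Orf1a:E4388K can be checked with a little coordinate arithmetic. ORF1ab starts at genome position 266 in the Wuhan-Hu-1 reference (NC_045512.2), and upstream of the ribosomal frameshift ORF1a uses the same frame. So position 13427 is the first base of ORF1a codon 4388. A G-to-A change at a codon's first base turns the glutamate codons GAA/GAG into the lysine codons AAA/AAG. The sketch below uses a deliberately partial codon table containing only those four codons.

```python
ORF1A_START = 266  # first base of ORF1ab in the Wuhan-Hu-1 reference (NC_045512.2), 1-based


def orf1a_codon(nt_pos: int) -> tuple[int, int]:
    """Map a 1-based genome position inside ORF1a to (codon number, position within codon).

    Only valid upstream of the ORF1ab ribosomal frameshift, where ORF1a and ORF1ab share a frame.
    """
    offset = nt_pos - ORF1A_START
    return offset // 3 + 1, offset % 3 + 1


# Partial codon table: just the codons needed for this example.
CODON_AA = {"GAA": "E", "GAG": "E", "AAA": "K", "AAG": "K"}

codon, codon_pos = orf1a_codon(13427)
print(codon, codon_pos)  # 4388 1
for ref_codon in ("GAA", "GAG"):
    alt_codon = "A" + ref_codon[1:]  # G->A at codon position 1
    print(ref_codon, CODON_AA[ref_codon], "->", alt_codon, CODON_AA[alt_codon])
```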

ryhisner commented 7 months ago

The reason is Ginkgo Bozoworks, plain and simple. This company is a joke, yet they somehow got the federal contract to do some of the most important sequencing in the country. This is a multibillion-dollar company, yet they can't be bothered to properly carry out the job they were hired to do. Pathetic.

aviczhl2 commented 7 months ago

This artefact appears in multiple countries though.

corneliusroemer commented 7 months ago

I made a new repo to discuss artefacts: https://github.com/sequence-review/sars-cov-2/issues/new

This might be a topic for an issue in it :)

The artefact could go either way: absence of the mutation when it should be there, or presence when it shouldn't be. Maybe there are raw reads for some of the sequences with the mutation that we could look at?
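Raw reads would show the allele fraction at 13427, which a consensus caller reduces to a single symbol. Depending on thresholds, a mixed site can come out as G, A, or the IUPAC ambiguity code R (A or G), as seen in some G13427R records. This sketch of threshold-based consensus calling is illustrative only. The `call_base` helper and its `min_depth`/`min_frac` defaults are assumptions, not any lab's actual pipeline settings.

```python
# IUPAC ambiguity codes for two-base mixtures.
IUPAC_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def call_base(counts: dict[str, int], min_depth: int = 10, min_frac: float = 0.2) -> str:
    """Consensus base at one site from per-base read counts (A/C/G/T keys only).

    - no reads, or depth below `min_depth`    -> 'N'
    - exactly one base at >= `min_frac`       -> that base
    - exactly two bases at >= `min_frac`      -> IUPAC code, e.g. A+G -> 'R'
    - otherwise                               -> 'N'
    """
    depth = sum(counts.values())
    if depth == 0 or depth < min_depth:
        return "N"
    present = sorted(b for b, n in counts.items() if n / depth >= min_frac)
    if len(present) == 1:
        return present[0]
    if len(present) == 2:
        return IUPAC_PAIRS[frozenset(present)]
    return "N"


print(call_base({"G": 60, "A": 40}), call_base({"G": 95, "A": 5}), call_base({"G": 3}))  # R G N
```

Raising `min_frac` pushes mixed sites toward a single base, while lowering it produces more ambiguity codes. Either choice can hide or create an apparent mutation, which is why the reads themselves are more informative than the consensus.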

AngieHinrichs commented 6 months ago

BA.2.86 has only four branches with G13427A that have more than 10 sequences. The largest branch is JN.1 > G13427A; it has 100 sequences and does seem to be almost entirely GBW (Ginkgo Bioworks), though it pulls in a few non-GBW sequences. It also has a sub-branch with JN.1.4's C774T, which probably belongs in JN.1.4.

JN.1.4 > T18453C > G13427A has 21 sequences, about half GBW but also some from Canada, Denmark, Japan and Australia.

JN.1.1 > C11747T > G13427A has 84 sequences; GBW and Brazil together make up the majority, but many other countries are represented. I see a tell-tale reversion of 11747 too.
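A reversion such as 11747 under a G13427A branch is "tell-tale" because one plausible reading is that the branch has pulled in sequences that never had C11747T, and the placement needs a back-mutation to fit them. Here is a minimal sketch of spotting such reversions along a root-to-tip path. The path format (a list of per-branch mutation lists) and the toy path itself are assumptions for illustration.

```python
def reverse(mutation: str) -> str:
    """Back-mutation of a substitution, e.g. 'C11747T' -> 'T11747C'."""
    return mutation[-1] + mutation[1:-1] + mutation[0]


def find_reversions(path: list[list[str]]) -> list[str]:
    """Mutations on a root-to-tip path that a later branch undoes.

    `path` lists the mutations on each branch, ordered from the root toward the tip.
    """
    active = set()
    reverted = []
    for branch in path:
        for mutation in branch:
            back = reverse(mutation)
            if back in active:
                reverted.append(back)  # the earlier change was undone
                active.discard(back)
            else:
                active.add(mutation)
    return reverted


# Toy path shaped like "JN.1.1 > C11747T > G13427A" plus a reversion below it (made up).
path = [["C11747T"], ["G13427A"], ["T11747C"]]
print(find_reversions(path))  # ['C11747T']
```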

There is also a little branch of 16 Brazilian sequences (and a nearby branch with 5 Brazilian sequences) following 3 other mutations on JN.1.

Although this isn't causing enormous damage as far as I can tell, one or more of those branches could, over time, pull in more sequences that should be placed elsewhere. So I will mask it; thanks for pointing it out, @aviczhl2. It will take effect in tomorrow's build (2024-02-13).