robert-koch-institut / SARS-CoV-2-Sequenzdaten_aus_Deutschland

A central component of successful pathogen surveillance is understanding how a pathogen spreads and what its pathogenic properties are. Knowledge of the pathogen genome is an important source of information here. For example, detecting mutations in a pathogen's genome makes it possible to reconstruct relatedness...
https://robert-koch-institut.github.io/SARS-CoV-2-Sequenzdaten_aus_Deutschland/
Creative Commons Attribution 4.0 International

Diverse BA.2 and BA.5 sequences with spurious T11524C and A11537G -- primer trimming? #27

Closed AngieHinrichs closed 1 year ago

AngieHinrichs commented 2 years ago

I have noticed a pattern, almost exclusively in RKI sequences (but also in a few sequences from the Netherlands and France), of T11524C and A11537G occurring together in many different branches and sub-lineages of BA.2 and BA.5. It seems unlikely that those two mutations would genuinely co-occur dozens of times in sequences from many different lineages of BA.2 and BA.5. I collected some examples of what this looks like in a phylogenetic tree in a comment in cov-lineages/pango-designation#841. Then @JosetteSchoenma replied that this problem had also been seen in the Netherlands, and that @JordyCoolen fixed it in https://github.com/JordyCoolen/easyseq_covid19/commit/e2412313ddaaf39a0b8014e4209d01c32a4d3245 by adjusting a primer trimming region to start at 11520 instead of 11525.

I am hopeful that a similar fix would remove the spurious T11524C + A11537G calls in so many RKI sequences. I don't have a list of affected sequences, but could make one if that would help. A few examples:

BA.5.1:
Germany/NW-RKI-I-869542/2022 EPI_ISL_13382739
Germany/NI-RKI-I-869485/2022 EPI_ISL_13382683
Germany/NW-RKI-I-845428/2022 EPI_ISL_13239109

BA.5.2.1:
Germany/NW-RKI-I-835345/2022 EPI_ISL_13033644
Germany/NW-RKI-I-848455/2022 EPI_ISL_13241301
Germany/NW-RKI-I-799266/2022 EPI_ISL_12739153

BA.5.3.2:
Germany/SH-RKI-I-784836/2022 EPI_ISL_12674661
Germany/SH-RKI-I-823299/2022 EPI_ISL_13001607
Germany/NW-RKI-I-739643/2022 EPI_ISL_12346538

BA.2.36:
Germany/BW-RKI-I-835322/2022 EPI_ISL_13033622
Germany/SN-RKI-I-723436/2022 EPI_ISL_12192631
Germany/NW-RKI-I-690722/2022 EPI_ISL_11818175

BA.2:
Germany/BW-RKI-I-542774/2022 EPI_ISL_10032437
Germany/BW-RKI-I-542785/2022 EPI_ISL_10032456
Germany/NW-RKI-I-845418/2022 EPI_ISL_13239099

BA.2.9:
Germany/NW-RKI-I-614789/2022 EPI_ISL_11455600
Germany/NW-RKI-I-637027/2022 EPI_ISL_11480485
Germany/NW-RKI-I-644150/2022 EPI_ISL_11591637
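A fuller list of affected sequences could be built by flagging every sequence that carries both substitutions. The following is only a minimal sketch: the input format (a mapping from sequence name to its list of nucleotide substitutions), the function name `flag_suspect_sequences`, and the sample names are all hypothetical and not taken from any RKI or UShER tooling.

```python
from typing import Dict, Iterable, List

# The two substitutions reported as co-occurring in this issue.
SUSPECT_PAIR = frozenset({"T11524C", "A11537G"})


def flag_suspect_sequences(
    mutations_by_sequence: Dict[str, Iterable[str]],
) -> List[str]:
    """Return the sorted names of sequences that carry BOTH suspect substitutions."""
    return sorted(
        name
        for name, mutations in mutations_by_sequence.items()
        if SUSPECT_PAIR <= set(mutations)  # subset test: both must be present
    )


# Hypothetical input; sample names and mutation lists are made up for illustration.
example = {
    "sample_A": ["T11524C", "A11537G", "C23604A"],
    "sample_B": ["A11537G"],
    "sample_C": ["C241T"],
}

print(flag_suspect_sequences(example))  # ['sample_A']
```

Requiring both substitutions keeps the check specific for BA.2 and BA.5; it would not be a useful filter in lineages where one of the two is expected anyway.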

A11537G belongs in BA.1 -- but T11524C seems to occur independently many times in the BA.1 branches of the UCSC/UShER tree as well. So this probably affects all Omicron sequences.

Thanks!

MarieLataretu commented 2 years ago

Hi @AngieHinrichs,

thanks for reporting this! We are in contact with the labs. I was able to reproduce T11524C & A11537G with raw BA.5 data and the unmodified primer file.

https://github.com/JordyCoolen/easyseq_covid19/commit/e2412313ddaaf39a0b8014e4209d01c32a4d3245 suggests extending the primer start of amplicons 62 and 125. With this modified primer file, T11524C and A11537G are no longer called. In addition, C23604A is called "homozygous" instead of "heterozygous".

So far, we have seen this only with the EasySeq SC2 kit and BAMClipper (which trims based on genomic coordinates).
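Because the trimming is coordinate-based, the effect of moving a primer region's start from 11525 to 11520 can be written down as simple interval arithmetic. The sketch below is an illustration only: the helper `newly_trimmed_positions` is hypothetical, it is not part of BAMClipper or the EasySeq pipeline, and it assumes the start coordinates are 1-based and inclusive, in the same numbering as the mutation names.

```python
def newly_trimmed_positions(old_start: int, new_start: int) -> range:
    """Positions covered by the trimming region only after its start is moved upstream.

    Assumes 1-based, inclusive start coordinates (an assumption, not confirmed
    by the primer file format used in the pipeline).
    """
    if new_start > old_start:
        raise ValueError("new_start must not lie downstream of old_start")
    return range(new_start, old_start)


# Start coordinates as quoted in this issue: 11525 before the fix, 11520 after.
extra = newly_trimmed_positions(old_start=11525, new_start=11520)

print(list(extra))     # [11520, 11521, 11522, 11523, 11524]
print(11524 in extra)  # True
print(11537 in extra)  # False
```

Under that assumption, position 11524 falls inside the extended region while 11537 does not, so this arithmetic alone does not explain why A11537G also disappears with the modified primer file.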