robert-koch-institut / SARS-CoV-2-Sequenzdaten_aus_Deutschland

A central component of successful pathogen surveillance is understanding how a pathogen spreads and what its pathogenic properties are. Knowledge of the pathogen genome is an important source of information here. For example, detecting mutations in a pathogen's genome makes it possible to reconstruct relationships...
https://robert-koch-institut.github.io/SARS-CoV-2-Sequenzdaten_aus_Deutschland/
Creative Commons Attribution 4.0 International

implausible Omicron classification #14

Closed rgerhards closed 1 year ago

rgerhards commented 2 years ago

I see a number of sequences from 2020 and early 2021 classified as Omicron. This does not look plausible to me. Could this be related to https://github.com/robert-koch-institut/SARS-CoV-2-Sequenzdaten_aus_Deutschland/issues/9 and be the result of a variant PCR?

Data in question:

| date_draw | IMS_ID | lineage | scorpio_call | sequencing_lab_pc | sending_lab_pc | seq_type |
| -- | -- | -- | -- | -- | -- | -- |
| 2021-03-17 | IMS-10013-CVDP-BD31B70A-5B53-4588-B22A-E40EE489E32... | BA.1.1 |   | 4779 | 73035 | ILLUMINA |
| 2021-03-10 | IMS-10013-CVDP-7B4A9C47-3B1B-4F3A-85BC-94B4DA5FEB2... | BA.1.1 |   | 4779 | 95448 | ILLUMINA |
| 2021-01-04 | IMS-10013-CVDP-B219DF48-6F4F-4A17-B19D-98DC45AF974... | BA.1.1 | Omicron (BA.1-like) | 4779 | 4779 | ILLUMINA |
| 2021-04-02 | IMS-10013-CVDP-4D759CD1-2209-41DE-980C-E98F38D54BA... | BA.1.1 |   | 4779 | 28357 | ILLUMINA |
| 2021-03-11 | IMS-10013-CVDP-333514E8-FCA9-49A4-BCE2-6F58419756B... | BA.1.1 |   | 4779 | 95448 | ILLUMINA |
| 2021-03-25 | IMS-10013-CVDP-EB74C98E-815A-445E-AB76-8C659BE07B3... | BA.1.1 |   | 4779 | 28357 | ILLUMINA |
| 2021-03-03 | IMS-10013-CVDP-035CC7B7-2FCA-4831-8089-3937D681718... | BA.1.1 |   | 4779 | 66386 | ILLUMINA |
| 2020-12-26 | IMS-10013-CVDP-D91BBF83-C15E-4280-825F-26E4B301A2F... | BA.1 | Omicron (BA.1-like) | 4779 | 28357 | ILLUMINA |
| 2021-04-16 | IMS-10013-CVDP-4EFD6A2A-4346-433F-8D2C-A2AEC01E1E0... | BA.1.1 |   | 4779 | 86154 | ILLUMINA |
| 2021-03-10 | IMS-10013-CVDP-6D3F28AF-44EA-4BA8-B0D6-308DAD2E4CC... | BA.1.1 |   | 4779 | 95448 | ILLUMINA |
| 2021-03-24 | IMS-10013-CVDP-5E19AFA5-5AC8-4812-AD7D-C1B6F7ABD87... | BA.1.1 |   | 4779 | 86154 | ILLUMINA |
| 2021-03-05 | IMS-10013-CVDP-5FD87A5E-FAEA-43C1-B41E-D5F0BF2E0F8... | BA.1.1 |   | 4779 | 81737 | ILLUMINA |
| 2020-12-22 | IMS-10013-CVDP-652AEF69-8797-4473-9730-40C8422356E... | BA.1 | Omicron (BA.1-like) | 4779 | 28357 | ILLUMINA |
| 2021-05-03 | IMS-10013-CVDP-D1C0DC48-97F7-483D-9248-05CCD4DCB36... | BA.1.1 |   | 4779 | 4779 | ILLUMINA |
| 2021-03-04 | IMS-10013-CVDP-4528BCA3-144F-47DB-BF9E-CCB3D373C74... | BA.1.1 |   | 4779 | 1665 | ILLUMINA |
| 2021-02-19 | IMS-10013-CVDP-F6D01735-2811-4A43-9656-F8AF4506AD0... | BA.1.1 |   | 4779 | 81737 | ILLUMINA |
| 2021-03-10 | IMS-10013-CVDP-9BE8CEF8-0042-48FE-B796-3475C6AA707... | BA.1.1 |   | 4779 | 95448 | ILLUMINA |
| 2021-06-08 | IMS-10013-CVDP-536E691D-7DA2-4D70-BE14-0C512D8DBB0... | BA.1.1 |   | 4779 | 4779 | ILLUMINA |
| 2021-03-17 | IMS-10013-CVDP-8D6464EE-0C90-48B8-8976-13714B74F79... | BA.1.1 |   | 4779 | 86154 | ILLUMINA |
| 2021-03-11 | IMS-10013-CVDP-7EA8A3B8-F7CF-4422-B96A-91B02ECA4CE... | BA.1.1 |   | 4779 | 4779 | ILLUMINA |
| 2021-03-23 | IMS-10013-CVDP-DABBDE37-F49C-4F25-B604-7DF183E3661... | BA.1.1 |   | 4779 | 4779 | ILLUMINA |
| 2021-03-10 | IMS-10013-CVDP-4FA094F7-E334-41F7-87F5-38A42C2F478... | BA.1.1 |   | 4779 | 1665 | ILLUMINA |
| 2021-09-02 | IMS-10013-CVDP-3445725E-9F15-4E2D-A4E4-F23949A8FEB... | BA.1.1 |   | 4779 | 4779 | ILLUMINA |
| 2021-04-03 | IMS-10004-CVDP-33332ED0-2EB6-42F6-9FDD-166D0C19CAD... | BA.1.1 |   | 21502 | 21502 | ILLUMINA |
| 2021-01-01 | IMS-10061-CVDP-D28E7308-BDB2-47C6-ABD9-A26778807F4... | BA.1.1 | Probable Omicron (BA.1-like) | 30159 | 30159 | ILLUMINA |

lenaschimmel commented 2 years ago

It's striking that almost all affected samples have been sequenced by the lab with ID 10013 / postal code 04779 (I was confused for a moment by the four-digit postal code in the table).

Can you also add the processing date? I checked it manually for the bottom three entries of the table:

| date_draw | PROCESSING_DATE | IMS_ID | lineage | scorpio_call | sequencing_lab_pc | sending_lab_pc | seq_type |
| -- | -- | -- | -- | -- | -- | -- | -- |
| 2021-09-02 | 2021-09-20 | IMS-10013-CVDP-3445725E-9F15-4E2D-A4E4-F23949A8FEB... | BA.1.1 |   | 4779 | 4779 | ILLUMINA |
| 2021-04-03 | 2021-04-14 | IMS-10004-CVDP-33332ED0-2EB6-42F6-9FDD-166D0C19CAD... | BA.1.1 |   | 21502 | 21502 | ILLUMINA |
| 2021-01-01 | 2022-01-15 | IMS-10061-CVDP-D28E7308-BDB2-47C6-ABD9-A26778807F4... | BA.1.1 | Probable Omicron (BA.1-like) | 30159 | 30159 | ILLUMINA |

For the last one, it may just be a typo in the year of the date_draw. For the other ones, date_draw and PROCESSING_DATE seem plausible in relation to each other.
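
To make "plausible in relation to each other" a bit more concrete, here is a minimal Python sketch that computes the lag between date_draw and PROCESSING_DATE for the three rows checked above. The 60-day threshold (`MAX_LAG_DAYS`) and the helper name `lag_days` are assumptions picked for illustration, not anything defined by the data set.

```python
from datetime import date

# (date_draw, PROCESSING_DATE) pairs copied from the three rows checked above.
rows = [
    ("2021-09-02", "2021-09-20"),
    ("2021-04-03", "2021-04-14"),
    ("2021-01-01", "2022-01-15"),
]

# Assumption for illustration: a lag of more than 60 days deserves a second look.
MAX_LAG_DAYS = 60


def lag_days(date_draw: str, processing_date: str) -> int:
    """Days from sampling to processing (negative if processed before the draw)."""
    return (date.fromisoformat(processing_date) - date.fromisoformat(date_draw)).days


lags = [lag_days(draw, processed) for draw, processed in rows]

# Keep rows whose lag is negative or exceeds the assumed threshold.
suspicious = [row for row, lag in zip(rows, lags) if lag < 0 or lag > MAX_LAG_DAYS]

for (draw, processed), lag in zip(rows, lags):
    print(draw, processed, lag)
```

Only the last row exceeds the assumed threshold, which fits the idea that its date_draw year may be off by one.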

rgerhards commented 2 years ago

Indeed, processing date is interesting - is somebody analyzing old samples?

I have removed sending_lab to keep the table from becoming too wide. If useful, I can export the data set. And sorry for the postcode confusion: I store the postcode in an integer column in the database to save space and gain speed, so leading zeros are lost.

| date_draw | processing_date | IMS_ID | lineage | seq_type | sequencing_lab_pc |
| -- | -- | -- | -- | -- | -- |
| 2020-12-22 | 2022-01-13 | IMS-10013-CVDP-652AEF69-8797-4473-9730-40C8422356E... | BA.1 | ILLUMINA | 4779 |
| 2020-12-26 | 2022-01-13 | IMS-10013-CVDP-D91BBF83-C15E-4280-825F-26E4B301A2F... | BA.1 | ILLUMINA | 4779 |
| 2021-01-04 | 2022-01-24 | IMS-10013-CVDP-B219DF48-6F4F-4A17-B19D-98DC45AF974... | BA.1.1 | ILLUMINA | 4779 |
| 2021-02-19 | 2021-03-08 | IMS-10013-CVDP-F6D01735-2811-4A43-9656-F8AF4506AD0... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-03 | 2021-03-22 | IMS-10013-CVDP-035CC7B7-2FCA-4831-8089-3937D681718... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-04 | 2021-03-22 | IMS-10013-CVDP-4528BCA3-144F-47DB-BF9E-CCB3D373C74... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-05 | 2021-03-22 | IMS-10013-CVDP-5FD87A5E-FAEA-43C1-B41E-D5F0BF2E0F8... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-10 | 2021-03-22 | IMS-10013-CVDP-7B4A9C47-3B1B-4F3A-85BC-94B4DA5FEB2... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-10 | 2021-03-22 | IMS-10013-CVDP-9BE8CEF8-0042-48FE-B796-3475C6AA707... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-10 | 2021-03-22 | IMS-10013-CVDP-6D3F28AF-44EA-4BA8-B0D6-308DAD2E4CC... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-10 | 2021-03-22 | IMS-10013-CVDP-4FA094F7-E334-41F7-87F5-38A42C2F478... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-11 | 2021-03-22 | IMS-10013-CVDP-333514E8-FCA9-49A4-BCE2-6F58419756B... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-11 | 2021-03-22 | IMS-10013-CVDP-7EA8A3B8-F7CF-4422-B96A-91B02ECA4CE... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-17 | 2021-03-29 | IMS-10013-CVDP-BD31B70A-5B53-4588-B22A-E40EE489E32... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-17 | 2021-03-25 | IMS-10013-CVDP-8D6464EE-0C90-48B8-8976-13714B74F79... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-23 | 2021-04-16 | IMS-10013-CVDP-DABBDE37-F49C-4F25-B604-7DF183E3661... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-24 | 2021-04-06 | IMS-10013-CVDP-5E19AFA5-5AC8-4812-AD7D-C1B6F7ABD87... | BA.1.1 | ILLUMINA | 4779 |
| 2021-03-25 | 2021-04-06 | IMS-10013-CVDP-EB74C98E-815A-445E-AB76-8C659BE07B3... | BA.1.1 | ILLUMINA | 4779 |
| 2021-04-02 | 2021-04-19 | IMS-10013-CVDP-4D759CD1-2209-41DE-980C-E98F38D54BA... | BA.1.1 | ILLUMINA | 4779 |
| 2021-04-16 | 2021-04-29 | IMS-10013-CVDP-4EFD6A2A-4346-433F-8D2C-A2AEC01E1E0... | BA.1.1 | ILLUMINA | 4779 |
| 2021-05-03 | 2021-05-17 | IMS-10013-CVDP-D1C0DC48-97F7-483D-9248-05CCD4DCB36... | BA.1.1 | ILLUMINA | 4779 |
| 2021-06-08 | 2021-06-21 | IMS-10013-CVDP-536E691D-7DA2-4D70-BE14-0C512D8DBB0... | BA.1.1 | ILLUMINA | 4779 |
| 2021-09-02 | 2021-09-20 | IMS-10013-CVDP-3445725E-9F15-4E2D-A4E4-F23949A8FEB... | BA.1.1 | ILLUMINA | 4779 |
| 2021-04-03 | 2021-04-14 | IMS-10004-CVDP-33332ED0-2EB6-42F6-9FDD-166D0C19CAD... | BA.1.1 | ILLUMINA | 21502 |
| 2021-01-01 | 2022-01-15 | IMS-10061-CVDP-D28E7308-BDB2-47C6-ABD9-A26778807F4... | BA.1.1 | ILLUMINA | 30159 |

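
Regarding the four-digit postal codes: German postal codes have five digits, so a value stored as an integer loses its leading zero. A small sketch of how the display value could be restored on export; the function name `format_postal_code` is made up for this example and is not part of any tooling mentioned in this thread.

```python
def format_postal_code(value: int) -> str:
    """Render an integer-stored German postal code as a five-digit string."""
    if not 0 <= value <= 99999:
        raise ValueError(f"not a valid five-digit postal code: {value}")
    # Zero-pad on the left to five digits, e.g. 4779 becomes "04779".
    return f"{value:05d}"


# Postal codes as they appear in the tables above.
codes = [4779, 1665, 21502, 30159]
formatted = [format_postal_code(code) for code in codes]
print(formatted)
```

This keeps the compact integer column in the database and only pads when the value is shown.
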
rgerhards commented 2 years ago

Side note: this is the SQL I use. Both CSVs are imported unchanged into separate tables.

    -- Omicron-lineage sequences with a draw date up to 2021-11-01,
    -- ordered by sequencing lab postal code and draw date
    SELECT rki_sequenzen_meta.date_draw, processing_date, rki_sequenzen.IMS_ID,
           lineage, seq_type, sequencing_lab_pc
    FROM rki_sequenzen
    INNER JOIN rki_sequenzen_meta
        ON rki_sequenzen_meta.IMS_ID = rki_sequenzen.ims_id
       AND rki_sequenzen_meta.date_draw <= '2021-11-01'
    WHERE (lineage = 'B.1.1.529' OR lineage LIKE 'BA.%')
      AND rki_sequenzen_meta.SEQ_REASON LIKE 'N%'
    ORDER BY rki_sequenzen_meta.sequencing_lab_pc ASC, rki_sequenzen_meta.date_draw
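
For anyone who wants to try the query without a database server, here is a rough Python sketch that runs an equivalent query against an in-memory SQLite database. Several things are assumptions: SQLite stands in for whatever database is actually used, the table layout is only inferred from the column names in the query, and the four rows (including the `EXAMPLE-n` IDs and the SEQ_REASON values) are invented purely to exercise each filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed layout: only the columns the query touches, split across the two tables.
conn.executescript("""
    CREATE TABLE rki_sequenzen (IMS_ID TEXT, lineage TEXT);
    CREATE TABLE rki_sequenzen_meta (
        IMS_ID TEXT, date_draw TEXT, processing_date TEXT,
        seq_type TEXT, sequencing_lab_pc INTEGER, SEQ_REASON TEXT
    );
""")

# Invented rows, one per filter condition (not real data).
conn.executemany("INSERT INTO rki_sequenzen VALUES (?, ?)", [
    ("EXAMPLE-1", "BA.1.1"),     # should be returned
    ("EXAMPLE-2", "B.1.1.7"),    # dropped: not an Omicron lineage
    ("EXAMPLE-3", "BA.1"),       # dropped: drawn after the cut-off
    ("EXAMPLE-4", "B.1.1.529"),  # dropped: SEQ_REASON does not start with N
])
conn.executemany("INSERT INTO rki_sequenzen_meta VALUES (?, ?, ?, ?, ?, ?)", [
    ("EXAMPLE-1", "2021-03-10", "2021-03-22", "ILLUMINA", 4779, "N"),
    ("EXAMPLE-2", "2021-03-10", "2021-03-22", "ILLUMINA", 4779, "N"),
    ("EXAMPLE-3", "2022-01-10", "2022-01-20", "ILLUMINA", 4779, "N"),
    ("EXAMPLE-4", "2021-02-01", "2021-02-15", "ILLUMINA", 21502, "X"),
])

# Same logic as the query above; the cut-off date is passed as a parameter.
# ISO-formatted dates stored as text compare correctly as strings.
QUERY = """
    SELECT m.date_draw, m.processing_date, s.IMS_ID,
           s.lineage, m.seq_type, m.sequencing_lab_pc
    FROM rki_sequenzen AS s
    INNER JOIN rki_sequenzen_meta AS m
        ON m.IMS_ID = s.IMS_ID AND m.date_draw <= ?
    WHERE (s.lineage = 'B.1.1.529' OR s.lineage LIKE 'BA.%')
      AND m.SEQ_REASON LIKE 'N%'
    ORDER BY m.sequencing_lab_pc ASC, m.date_draw
"""

rows = conn.execute(QUERY, ("2021-11-01",)).fetchall()
for row in rows:
    print(row)
```

With these invented rows only `EXAMPLE-1` passes all three filters, which makes it easy to see what each condition contributes.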