robert-koch-institut / SARS-CoV-2-Sequenzdaten_aus_Deutschland

A central component of successful pathogen surveillance is understanding how a pathogen spreads and what its pathogenic properties are. Knowledge of the pathogen's genome is an important source of information here. For example, detecting mutations in a pathogen's genome makes it possible to reconstruct relationships...
https://robert-koch-institut.github.io/SARS-CoV-2-Sequenzdaten_aus_Deutschland/
Creative Commons Attribution 4.0 International

Submission date lost for many sequences: SEQUENCE.PUSHED_TO_DWH empty for 1.099m out of 1.228m rows #50

Closed (corneliusroemer closed this issue 1 year ago)

corneliusroemer commented 1 year ago

It appears that when moving from the old to the new metadata format, you lost the sequence submission date for most sequences in SARS-CoV-2-Sequenzdaten_Deutschland.tsv.xz.

Would it be possible to backfill those that have an empty string with the entry from either "RECEIVE_DATE" or "PROCESSING_DATE" of the previous metadata file?
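To make the request concrete, here is a minimal Python sketch of such a backfill. It is not how the RKI pipeline actually works. The join column `IMS_ID` is a hypothetical name for whatever identifier both metadata files share, and the function name is made up for illustration. Only "SEQUENCE.PUSHED_TO_DWH", "RECEIVE_DATE" and "PROCESSING_DATE" come from this issue.

```python
def backfill_pushed_to_dwh(
    new_rows,
    old_rows,
    key="IMS_ID",  # hypothetical join column, assumed present in both files
    target="SEQUENCE.PUSHED_TO_DWH",
    fallbacks=("RECEIVE_DATE", "PROCESSING_DATE"),
):
    """Fill empty `target` values from the first non-empty fallback column
    of the matching row in the previous metadata file."""
    # Map each identifier to its first non-empty fallback date.
    old_dates = {}
    for row in old_rows:
        for column in fallbacks:
            value = (row.get(column) or "").strip()
            if value:
                old_dates[row[key]] = value
                break

    filled = []
    for row in new_rows:
        row = dict(row)  # copy so the caller's rows are not modified
        if not (row.get(target) or "").strip():
            # Stays empty if the old file has no usable date either.
            row[target] = old_dates.get(row[key], "")
        filled.append(row)
    return filled
```

Rows could come from `csv.DictReader(..., delimiter="\t")` on both files. The old and new files may use different date formats, so the backfilled values would probably need normalizing before they are mixed with existing timestamps.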

This is the breakdown of the column at the moment:

$ xzcat ~/Downloads/SARS-CoV-2-Sequenzdaten_Deutschland.tsv.xz | tsv-summarize -H -g 'SEQUENCE.PUSHED_TO_DWH' --count
SEQUENCE.PUSHED_TO_DWH  count
                                1099003
2022-10-24 09:37:39 +0200       1
2022-10-24 18:48:03 +0200       1491
2022-10-24 18:45:15 +0200       1
2022-10-26 18:55:12 +0200       1666
2022-10-26 18:52:19 +0200       1
2022-10-26 18:55:14 +0200       574
2022-10-26 18:55:13 +0200       71
< --- snip --- >
2023-07-03 18:14:07 +0200       1
2023-06-30 11:40:32 +0200       21
2023-06-30 11:37:43 +0200       1
2023-07-06 14:33:27 +0200       17
2023-07-06 14:30:20 +0200       1
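
For anyone without tsv-summarize installed, the same per-value count can be approximated with the Python standard library. This is only a sketch. The function names are made up for illustration, and it assumes the file is UTF-8 encoded, tab-separated, and has a header row.

```python
import csv
import lzma
from collections import Counter


def count_column_values(lines, column="SEQUENCE.PUSHED_TO_DWH"):
    """Count how often each value of `column` occurs in tab-separated lines."""
    # QUOTE_NONE: treat quote characters as plain data, as TSV tools usually do.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row[column] for row in reader)


def count_column_values_in_xz(path, column="SEQUENCE.PUSHED_TO_DWH"):
    """Same count, reading directly from an xz-compressed TSV file."""
    # Assumption: the file is UTF-8 encoded.
    with lzma.open(path, "rt", encoding="utf-8", newline="") as handle:
        return count_column_values(handle, column)
```

With the counter returned by `count_column_values_in_xz`, `counts[""]` gives the number of rows with an empty submission date, which is the first row of the breakdown above.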
RKIOpenData commented 1 year ago

Hello @corneliusroemer,

thank you for your feedback.

We implemented changes to provide the 'SEQUENCE.PUSHED_TO_DWH' property for all sequences.

These changes will be available with the next data update on Monday.

Best regards,

Felix Hartkopf

corneliusroemer commented 1 year ago

Hi Felix @RKIOpenData, thanks so much for your work! That's great news.

corneliusroemer commented 1 year ago

This seems to work now, so I'll close the issue. Thanks!