Closed: joverlee521 closed this issue 1 year ago
I think the solution for directly pulling RKI sequences will be similar to my ideas for the COG-UK data.
The step that differs here is how to remove the RKI sequences from the GenBank data. I have not found an accession linkage file for the RKI data. However, we can do a blanket removal of all sequences linked to the RKI BioProject.
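A rough sketch of what the blanket removal could look like, assuming each GenBank metadata record carries a BioProject field. The field name `bioproject_accession`, the helper `drop_bioproject`, and the accession value below are placeholders for illustration, not the real RKI BioProject accession or the pipeline's actual schema.

```python
# Hypothetical placeholder; the real RKI BioProject accession is not given here.
RKI_BIOPROJECT = "PRJ_PLACEHOLDER"


def drop_bioproject(records, bioproject):
    """Return only the records that are NOT linked to the given BioProject.

    `records` is a list of dicts; `bioproject_accession` is an assumed
    field name and may differ in the real metadata.
    """
    return [r for r in records if r.get("bioproject_accession") != bioproject]


# Toy GenBank metadata for illustration only.
genbank = [
    {"strain": "A", "bioproject_accession": RKI_BIOPROJECT},
    {"strain": "B", "bioproject_accession": "PRJ_OTHER"},
    {"strain": "C", "bioproject_accession": ""},
]

kept = drop_bioproject(genbank, RKI_BIOPROJECT)
```

Records without any BioProject link are kept, so this only works if the RKI uploads are consistently tagged.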
As a quick fix, we could simply remove all German sequences uploaded to GenBank from March 2022 onwards and spike in sequences from Germany's repo only from that date onwards. This would be an 80/20 solution; more effort may not be worth it.
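To make the cutoff idea concrete, here is a minimal sketch. It assumes both sources expose `country` and `date_submitted` (ISO `YYYY-MM-DD`) fields and that "March 2022" means 2022-03-01; the function name `merge_with_cutoff` and the field names are assumptions, not the pipeline's actual code.

```python
from datetime import date

# Assumed cutoff: first day of March 2022.
CUTOFF = date(2022, 3, 1)


def merge_with_cutoff(genbank, rki, cutoff=CUTOFF):
    """Drop German GenBank records submitted on/after the cutoff and
    replace them with RKI repo records submitted on/after the cutoff."""

    def submitted(record):
        return date.fromisoformat(record["date_submitted"])

    # Keep every GenBank record except late German submissions.
    kept = [
        r for r in genbank
        if not (r["country"] == "Germany" and submitted(r) >= cutoff)
    ]
    # Spike in RKI records only from the cutoff onwards to avoid duplicates.
    spiked = [r for r in rki if submitted(r) >= cutoff]
    return kept + spiked


# Toy records for illustration only.
genbank = [
    {"strain": "DE-old", "country": "Germany", "date_submitted": "2022-02-15"},
    {"strain": "DE-new", "country": "Germany", "date_submitted": "2022-04-01"},
    {"strain": "FR-new", "country": "France", "date_submitted": "2022-04-01"},
]
rki = [
    {"strain": "RKI-old", "country": "Germany", "date_submitted": "2022-02-15"},
    {"strain": "RKI-new", "country": "Germany", "date_submitted": "2022-04-01"},
]

merged = merge_with_cutoff(genbank, rki)
```

The trade-off is that German sequences deposited to GenBank after the cutoff by submitters other than RKI would also be dropped, which is the 20% this approach gives up.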
Resolved by #365
Context
Similar to #329, there has been a significant drop-off in sequences from Germany in the NCBI data since ~April 2022 (this issue was originally raised by @corneliusroemer in Slack).
Description
We can update the open pipeline to pull metadata and sequences directly from the "SARS-CoV-2 Sequence Data from Germany" GitHub repo.
Possible solution
Similar solutions from #329 will apply here.