Closed joverlee521 closed 2 months ago
The error was caused by a new Zika record that had internal quotes in submitter.affiliation:
{"accession": "OR701943.1", "completeness": "PARTIAL", "host": {"lineage": [{"name": "cellular organisms", "taxId": 131567}, {"name": "Eukaryota", "taxId": 2759}, {"name": "Opisthokonta", "taxId": 33154}, {"name": "Metazoa", "taxId": 33208}, {"name": "Eumetazoa", "taxId": 6072}, {"name": "Bilateria", "taxId": 33213}, {"name": "Protostomia", "taxId": 33317}, {"name": "Ecdysozoa", "taxId": 1206794}, {"name": "Panarthropoda", "taxId": 88770}, {"name": "Arthropoda", "taxId": 6656}, {"name": "Mandibulata", "taxId": 197563}, {"name": "Pancrustacea", "taxId": 197562}, {"name": "Hexapoda", "taxId": 6960}, {"name": "Insecta", "taxId": 50557}, {"name": "Dicondylia", "taxId": 85512}, {"name": "Pterygota", "taxId": 7496}, {"name": "Neoptera", "taxId": 33340}, {"name": "Endopterygota", "taxId": 33392}, {"name": "Diptera", "taxId": 7147}, {"name": "Nematocera", "taxId": 7148}, {"name": "Culicomorpha", "taxId": 43786}, {"name": "Culicoidea", "taxId": 41827}, {"name": "Culicidae", "taxId": 7157}, {"name": "Culicinae", "taxId": 43817}, {"name": "Aedini", "taxId": 1056966}, {"name": "Aedes", "taxId": 7158}, {"name": "Stegomyia", "taxId": 53541}, {"name": "Aedes aegypti", "taxId": 7159}], "organismName": "Aedes aegypti", "taxId": 7159}, "isAnnotated": true, "isolate": {"collectionDate": "2021-11-11", "name": "6PYUC2022"}, "length": 217, "location": {"geographicLocation": "Mexico: Yucatan, Merida", "geographicRegion": "North America"}, "nucleotide": {"sequenceHash": "6FD6033C"}, "proteinCount": 1, "releaseDate": "2024-05-01T00:00:00Z", "sourceDatabase": "GenBank", "submitter.affiliation": "Centro de Investigaciones Regionales \"Dr. Hideyo Noguchi\", Laboratorio de Arbovirologia", "submitter.country": "Mexico", "submitter.names": ["Argaez-Sierra,D.G.", "Baak-Baak,C.M.", "Cigarroa-Toledo,N.", "Garcia-Rejon,J.E.", "Tzuc-Dzul,J.C.", "Acosta-Viana,K.Y.", "Nunez-Corea,D.A."], "updateDate": "2024-05-01T00:00:00Z", "virus": {"lineage": [{"name": "Viruses", "taxId": 10239}, {"name": "Riboviria", "taxId": 2559587}, {"name": "Orthornavirae", "taxId": 2732396}, {"name": "Kitrinoviricota", "taxId": 2732406}, {"name": "Flasuviricetes", "taxId": 2732462}, {"name": "Amarillovirales", "taxId": 2732545}, {"name": "Flaviviridae", "taxId": 11050}, {"name": "Orthoflavivirus", "taxId": 3044782}, {"name": "Orthoflavivirus zikaense", "taxId": 3048459}, {"name": "Zika virus", "taxId": 64320}], "organismName": "Zika virus", "taxId": 64320}}
I confirmed locally that the output of format_ncbi_dataset_report has the correct quoting in submitter-affiliation.
accession accession-rev sourcedb sra-accs isolate-lineage geo-region geo-location isolate-collection-date release-date update-date length host-name isolate-lineage-source biosample-acc submitter-names submitter-affiliation submitter-country
OR701943 OR701943.1 GenBank 6PYUC2022 North America Mexico: Yucatan, Merida 2021-11-11 2024-05-01T00:00:00Z 2024-05-01T00:00:00Z 217 Aedes aegypti Argaez-Sierra,D.G.,Baak-Baak,C.M.,Cigarroa-Toledo,N.,Garcia-Rejon,J.E.,Tzuc-Dzul,J.C.,Acosta-Viana,K.Y.,Nunez-Corea,D.A. Centro de Investigaciones Regionales "Dr. Hideyo Noguchi", Laboratorio de Arbovirologia Mexico
The final metadata.tsv has doubled quotes in the institution column, but this is due to an augur curate passthru bug.
genbank_accession genbank_accession_rev strain date region country division location length host release_date update_date sra_accessions authors institution
OR701943 OR701943.1 6PYUC2022 2021-11-11 North America Mexico Yucatan Merida 217 Aedes aegypti 2024-05-01 2024-05-01 Argaez-Sierra et al "Centro de Investigaciones Regionales ""Dr. Hideyo Noguchi"", Laboratorio de Arbovirologia"
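For context on why the institution value looks the way it does: the doubled quotes match what standard CSV-style quoting produces for a field that contains a double quote. The sketch below uses Python's built-in csv module purely as an illustration; it is not augur's code, and the helper names `write_tsv_row` and `read_tsv_row` are hypothetical.

```python
import csv
import io

# Affiliation value as it appears after the JSON escapes are decoded.
affiliation = (
    'Centro de Investigaciones Regionales "Dr. Hideyo Noguchi", '
    "Laboratorio de Arbovirologia"
)


def write_tsv_row(fields):
    """Serialize one row as TSV with minimal CSV-style quoting (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer, delimiter="\t", quoting=csv.QUOTE_MINIMAL, lineterminator="\n"
    )
    writer.writerow(fields)
    return buffer.getvalue().rstrip("\n")


def read_tsv_row(line):
    """Parse one TSV line with a quote-aware reader (hypothetical helper)."""
    return next(csv.reader([line], delimiter="\t"))


line = write_tsv_row(["OR701943", affiliation])
print(line)

# A quote-aware reader undoes the wrapping and doubling.
restored = read_tsv_row(line)
print(restored[1] == affiliation)
```

A field containing a double quote is wrapped in quotes and each internal quote is doubled, so tools that parse the file with the same convention see the original value, while tools that treat the file as raw tab-separated text see the extra quotes.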
Merging to get our ingest going again, but I'll loop back to the augur curate issue.
Manually triggered ingest-to-phylogenetic
The automated ingest workflow failed with a csvtk quoting error.¹ Following https://github.com/nextstrain/docker-base/pull/209, we can now use csvtk fix-quotes and csvtk del-quotes to work around the quoting issue.

¹ https://github.com/nextstrain/zika/actions/runs/8926866948/job/24518932039#step:8:139
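As a rough illustration of the workaround's idea (quote the offending fields before strict parsing, then strip the added quotes afterwards), here is a minimal Python analogue. It is not csvtk's implementation: the function names `fix_quotes` and `del_quotes` are hypothetical, and the sketch assumes single-line records in which no field is already quoted.

```python
def fix_quotes(line: str, delimiter: str = "\t") -> str:
    """Wrap any field containing a double quote in quotes and double the
    internal quotes, so a strict CSV/TSV parser accepts the line."""
    fixed = []
    for field in line.split(delimiter):
        if '"' in field:
            field = '"' + field.replace('"', '""') + '"'
        fixed.append(field)
    return delimiter.join(fixed)


def del_quotes(line: str, delimiter: str = "\t") -> str:
    """Undo fix_quotes: strip the outer quotes and collapse doubled quotes."""
    restored = []
    for field in line.split(delimiter):
        if len(field) >= 2 and field.startswith('"') and field.endswith('"'):
            field = field[1:-1].replace('""', '"')
        restored.append(field)
    return delimiter.join(restored)


raw = (
    "OR701943\t"
    'Centro de Investigaciones Regionales "Dr. Hideyo Noguchi", '
    "Laboratorio de Arbovirologia\t"
    "Mexico"
)

fixed = fix_quotes(raw)
print(fixed)

# Round trip: removing the added quotes gives back the original line.
print(del_quotes(fixed) == raw)
```

In a pipeline, the strict parsing steps would run between the two calls, so that the intermediate tools only ever see well-formed quoting.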
Checklist