joverlee521 closed 3 months ago
Used this Snakefile to upload 2024 human sera data to test_tdb with changes up to https://github.com/nextstrain/fauna/pull/160/commits/f34c82686f0c5111738cf1813fd4cf9315fea81a
pyenv activate fauna
nextstrain build --cpus 2 --ambient --envdir ../env.d/seasonal-flu/ . --snakefile data/Snakefile --config year='2024' preview=False
This ran through 73 Excel workbooks without raising any errors 🎉
I will dig more into the uploaded data tomorrow to make sure nothing looks too out of place...
Of the 73 processed workbooks, 1 did not have any human sera references.
From the other 72 workbooks, the upload workflow added 5722 measurements to test_tdb/flu.
This downloaded 5694 measurements that were all appropriately selected for the human host files. There were 28 measurements excluded because the virus_passage_category was egg while the serum_passage_category was cell. The seasonal-flu workflow explicitly excludes egg passaged test viruses in cell passaged titer data.
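For anyone reading along, the exclusion described above can be sketched roughly as follows. This is only an illustration, not the seasonal-flu workflow's actual code: the function names `keep_measurement` and `filter_measurements` and the dict-based record layout are assumptions, and the records are toy values. Only the field names `virus_passage_category` and `serum_passage_category` and the values `egg` and `cell` come from the comment above, and the sketch models that one rule and nothing else.

```python
def keep_measurement(measurement):
    """Return False when an egg-passaged test virus is paired with cell-passaged serum.

    Hypothetical helper: models only the single exclusion rule described
    in this comment, using plain dicts as measurement records.
    """
    return not (
        measurement.get("virus_passage_category") == "egg"
        and measurement.get("serum_passage_category") == "cell"
    )


def filter_measurements(measurements):
    """Keep only the measurements that pass the exclusion rule."""
    return [m for m in measurements if keep_measurement(m)]


# Toy records for illustration only (not real titer data).
example = [
    {"virus_passage_category": "cell", "serum_passage_category": "cell"},
    {"virus_passage_category": "egg", "serum_passage_category": "cell"},
    {"virus_passage_category": "egg", "serum_passage_category": "egg"},
]
kept = filter_measurements(example)
excluded_count = len(example) - len(kept)
```

Applied to the numbers above, the same bookkeeping is 5722 uploaded minus 28 excluded, which gives the 5694 downloaded measurements.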
I manually spot checked 3 workbooks per subtype to verify all of the human sera reference measurements were included. At this point, I'm pretty confident that this at least works for the 2024 files.
I'll plan to merge and upload the 2024 data as part of tomorrow's ingest.
Description of proposed changes
First pass for ingesting VIDRL human sera references, focused on 2024 data. I'm hoping that ingesting older data should be as simple as updating the VACCINE_MAPPING, but we'll see...
Related issue(s)
Related to #158
Checklist