nextstrain / fauna

RethinkDB database to support real-time virus analysis
GNU Affero General Public License v3.0

Generate list of new titers references on each new ingest #163

Open huddlej opened 2 weeks ago

huddlej commented 2 weeks ago

Description

Incoming titer measurements often include new reference strains that we need to include in our reports. These are easy to miss when inspecting the data manually each week. It would be more effective to generate a list of new references as they appear and report that list as part of the ingest.

joverlee521 commented 1 week ago

I'm not sure how to easily integrate this into fauna, because tdb/upload does an upsert keyed on a generated index rather than on the reference strains.

Maybe this can be a new rule in the seasonal flu upload workflow. We could create a list of reference strains and their inclusion dates, something like:

csvtk sort -t -k inclusion_date data/titers.tsv \
    | csvtk cut -t -f serum_strain,inclusion_date \
    | csvtk uniq -t -f serum_strain > reference_strains.tsv
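
To report only the references that are new in a given ingest, the list produced by the command above could be compared against the list saved from the previous ingest. The following is a minimal Python sketch of that comparison, not an existing fauna function. It assumes both lists are TSV files with the `serum_strain` and `inclusion_date` columns used above, and that dates are ISO-formatted (YYYY-MM-DD) so that string comparison matches chronological order. The function names and the sample rows are hypothetical.

```python
import csv
import io


def read_reference_strains(handle):
    """Return {serum_strain: earliest inclusion_date} from a TSV file handle.

    Assumes ISO dates (YYYY-MM-DD), so plain string comparison orders them.
    """
    earliest = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        strain, date = row["serum_strain"], row["inclusion_date"]
        if strain not in earliest or date < earliest[strain]:
            earliest[strain] = date
    return earliest


def find_new_references(previous, current):
    """Return sorted (strain, date) pairs in `current` that are absent from `previous`."""
    return sorted(
        (strain, date)
        for strain, date in current.items()
        if strain not in previous
    )


if __name__ == "__main__":
    # Made-up sample rows standing in for two successive reference_strains.tsv files.
    previous_tsv = "serum_strain\tinclusion_date\nA/Example/1/2020\t2023-01-05\n"
    current_tsv = (
        "serum_strain\tinclusion_date\n"
        "A/Example/1/2020\t2023-01-05\n"
        "A/Example/2/2021\t2024-02-12\n"
    )
    previous = read_reference_strains(io.StringIO(previous_tsv))
    current = read_reference_strains(io.StringIO(current_tsv))
    for strain, date in find_new_references(previous, current):
        print(f"{strain}\t{date}")
```

In a workflow rule, the two `io.StringIO` objects would be replaced by open file handles for the previous and current lists, and the printed rows could be written to a file or posted as a notification.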