Open bsweger opened 1 week ago
@rogersbw @nickreich @elray1 Feel free to update this if I got the details wrong!
Also, please weigh in on how to format this information for ease of use by the scoring process.
I made a few updates to clarify that this concerns location/date combinations that have any reported data by the time the round closes.
Copying in clarification from a recent Slack thread:
Q: What did we decide about the evaluation dates? Are we going to include in the evaluation ANY location/dates with no counts as of Wednesday night, or just the set of location/dates with no data until one is observed?
A: Any location/dates without observations.
No time travel required!
Background
To address the "scoring fairness" modeling challenge identified in the variant-nowcast-hub's README, this hub's evaluation logic will exclude any state/date combination that falls within the bounds of a round's target dates and has reported sequence data as of that round's reference date.
In other words, any combination of state and target date (where the target date is within 31 days before the Wednesday reference date of the round) that, as of the 8 PM Wednesday submission deadline, has one or more SARS-CoV-2 genome sequences with a collection date equal to that target date will not be scored.
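The rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the hub's actual evaluation code: the function names, the `(state, collection_date)` dictionary shape, and the choice to treat the 31-day window as inclusive of the reference date are all hypothetical.

```python
from datetime import date, timedelta

# From the issue text: target dates reach back 31 days from the reference date.
WINDOW_DAYS = 31


def target_dates(reference_date: date) -> list[date]:
    """Target dates for a round, oldest first.

    Assumption: the window includes both the reference date and the
    date 31 days before it.
    """
    start = reference_date - timedelta(days=WINDOW_DAYS)
    return [start + timedelta(days=i) for i in range(WINDOW_DAYS + 1)]


def unscored_location_dates(
    sequence_counts: dict[tuple[str, date], int],
    reference_date: date,
) -> set[tuple[str, date]]:
    """Return the (state, target_date) pairs that will not be scored.

    sequence_counts (hypothetical shape) maps (state, collection_date) to the
    number of sequences reported as of the submission deadline.
    """
    window = set(target_dates(reference_date))
    return {
        (state, day)
        for (state, day), n in sequence_counts.items()
        if day in window and n >= 1
    }


# Example with made-up counts for a Wednesday reference date.
counts = {
    ("MA", date(2024, 10, 1)): 3,  # in window, has data: not scored
    ("MA", date(2024, 10, 2)): 0,  # in window, no data yet: scored
    ("CA", date(2024, 8, 1)): 5,   # outside the window: not a target date
}
excluded = unscored_location_dates(counts, date(2024, 10, 9))
```

Because the counts are taken as of the deadline, the excluded set can be computed once when the round closes and stored alongside the round.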
Definition of done
- `virus-clade-utils` can generate a data structure of states and the number of sequences for each collection date between `x` and `y`, where `x` will be 31 days prior to 0 horizon and `y` will be the day of 0 horizon/nowcast date/round_id (both in `YYYY-MM-DD` format).
- `variant-nowcast-hub` has a new script that gets the above data and writes it to `auxiliary-data/unscore-location-dates`
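As a starting point for the format discussion, here is a minimal sketch of what the two items above might look like together. Everything in it is an assumption: `count_sequences` and `write_counts` are hypothetical names (not existing `virus-clade-utils` functions), the nested `{state: {date: count}}` JSON layout is just one candidate format, and naming the file after the round_id is a guess.

```python
import json
from collections import Counter
from datetime import date, timedelta
from pathlib import Path


def count_sequences(records, states, start: date, end: date) -> dict:
    """Count sequences per state per collection date between start and end.

    records: iterable of (state, collection_date) tuples, one per sequence.
    Returns {state: {"YYYY-MM-DD": count}}, with zeros for dates that have
    no sequences so every state/date combination is present.
    """
    tally = Counter(records)  # missing keys count as 0
    n_days = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(n_days)]
    return {s: {d.isoformat(): tally[(s, d)] for d in days} for s in states}


def write_counts(counts: dict, out_dir, round_id: str) -> Path:
    """Write the counts as JSON to <out_dir>/<round_id>.json (hypothetical naming)."""
    out_path = Path(out_dir) / f"{round_id}.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(counts, indent=2, sort_keys=True))
    return out_path
```

A script in the hub could then call something like `write_counts(counts, "auxiliary-data/unscore-location-dates", round_id)` with `x = round_id - 31 days` and `y = round_id` as the date bounds. Zero-filling keeps the file self-describing, at the cost of a larger file than one listing only the combinations with data.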