Open annakrystalli opened 2 months ago
I would be in favor of not adding support for non-date round-ids for now, and only supporting round-ids that are in the format of dates. Are there clear use cases where supporting non-date round-ids would be useful?
@LucieContamin wrote in https://github.com/orgs/Infectious-Disease-Modeling-Hubs/discussions/7#discussioncomment-9236827
I am not sure I totally understand the issue here, sorry. But, for SMH, we mainly use `origin_date` as `round_id`. However, we have some rounds where the `round_id` is not the `origin_date`, and is only used in the filename, to be able to tag which file corresponds to which round. In this case, the format of `round_id` does not matter a lot. We still use a YYYY-MM-DD format to follow the same "style" as the other rounds. Does that answer your question? Or help?
Could you share an example of what such `round_id`s look like? It does matter what they contain, because we need to be able to consistently parse `round_id` from `model_id` in filenames, and how we do that can be made easier or harder by whether we follow certain conventions in how we specify `round_id`s (if they are not dates).
Additionally, in the rounds where `round_id`s are not the origin date, what value does origin date contain in the files?
Would be super curious to see an example of both the `tasks.json` and some files (including filenames) of what you describe!
The `round_id` we are using is still in the ISO Date format: "YYYY-MM-DD", for example:
    "round_id": "2024-05-15",
    "round_id_from_variable": false,
    "model_tasks": [
      {
        "task_ids": {
          "origin_date": {
            "required": ["2020-11-15"],
            "optional": null
          }, ....
So, the filename follows the "usual" format, for example: `model-output/team2-modelb/2024-05-15-team2-modelb.gz.parquet`.
I am happy to provide more information and examples, if necessary. I can also give you the link to the repository for these rounds: https://github.com/midas-network/covid19-smh-research
Thank you @LucieContamin !
OK so it still is a date so still not an example of a non date round_id! 😜
Out of curiosity, what made you configure some rounds one way and some the other?
Ah yes, still a date, but as I use it only for tracking files, it could have been anything I guess. It's not used for anything else.
We decided to configure it like this because we have two rounds with the same `origin_date`, so we needed to use something else for `round_id`.
Very useful context, thanks. I guess if we were to support non-date round ids, then so long as the round ids contain only alphanumerics and `_`, I believe our current systems would work (see deep dive here).
And you still have `origin_date` in your files, so you have dates to match to target data and plot. It's when that date information is not included that issues can arise.
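To make the alphanumerics-and-underscore convention mentioned above concrete, here is a minimal Python sketch of splitting a file name into its parts. It is not the `hubValidations` implementation: the function, the regular expression, and the list of extensions are hypothetical, and it assumes that team and model abbreviations never contain hyphens.

```python
import re

# Hypothetical pattern (an assumption, not the hubValidations rule):
# round_id is either an ISO date or a run of alphanumerics/underscores;
# team and model abbreviations are assumed to contain no hyphens.
FILE_NAME_RE = re.compile(
    r"^(?P<round_id>\d{4}-\d{2}-\d{2}|[A-Za-z0-9_]+)"
    r"-(?P<team_abbr>[A-Za-z0-9_]+)"
    r"-(?P<model_abbr>[A-Za-z0-9_]+)$"
)

# Assumed set of extensions; the longest must come first so that
# ".gz.parquet" is stripped whole rather than just ".parquet".
KNOWN_EXTENSIONS = (".gz.parquet", ".parquet", ".csv")


def parse_file_name(file_name):
    """Split a model output file name into round_id, team_abbr, model_abbr."""
    base = file_name.rsplit("/", 1)[-1]  # drop any directory part
    for ext in KNOWN_EXTENSIONS:
        if base.endswith(ext):
            base = base[: -len(ext)]
            break
    else:
        raise ValueError(f"Unrecognised file extension: {file_name}")
    match = FILE_NAME_RE.match(base)
    if match is None:
        raise ValueError(f"Cannot parse round_id from: {file_name}")
    return match.groupdict()


date_parts = parse_file_name(
    "model-output/team2-modelb/2024-05-15-team2-modelb.gz.parquet"
)
non_date_parts = parse_file_name("2022_EW40-team2-modelb.csv")
```

Under these assumptions a date round id and an underscore-separated one both parse unambiguously, whereas a hyphenated non-date id such as `2022-EW40` would be rejected, because the parser cannot tell where the round id ends and the team abbreviation begins.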
Background
Since the beginning of the project we have discussed, and in general planned for, supporting `round_id`s other than dates. So far this has not been necessary and our validations have focused on the assumption that `round_id`s will be dates, largely because all known hubs do indeed use dates as round IDs. Some discussion around this topic during development of the validation framework can be found here: https://github.com/Infectious-Disease-Modeling-Hubs/hubValidations/discussions/13

However, the push to convert historical hubs to hubverse style hubs has resurfaced the question of supporting non-date round ids and the need to assess implementation implications and weigh them against the benefits of supporting this feature.
Implications of using non-dates as round ids
Use of a non-date round id has some important implications, most importantly on how submission windows are configured:
If a character string is used instead as a round ID, the specification of a window relative to a date contained within the file (or filename) is no longer possible. This means that each round would need to be configured individually, with an explicitly set submission window `start` and `end` date for each round id. For example, if the following simplified flusight `tasks.json` config is changed to use epiweek (in the format `YYYY-EPIWEEK`) instead of ISO date, for just specifying 3 rounds, the size of the config triples, containing primarily repeated information (only the submission window specification changes):

date round id

epiweek code round id
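As a rough illustration of the point above, the Python sketch below contrasts the two situations. With date round ids, one relative rule yields the submission window for every round; with non-date ids, each round needs its own explicit `start` and `end`. The offsets, round ids, and window dates are all hypothetical values chosen for the example, not taken from any real hub config.

```python
from datetime import datetime, timedelta


def window_from_date_round_id(round_id, start_offset, end_offset):
    """Derive a submission window from offsets (in days) relative to an ISO-date round id."""
    origin = datetime.strptime(round_id, "%Y-%m-%d").date()
    return (
        origin + timedelta(days=start_offset),
        origin + timedelta(days=end_offset),
    )


# Date round ids: a single shared rule (hypothetical offsets) covers all rounds.
relative_rule = {"start_offset": -6, "end_offset": 1}
date_rounds = ["2022-10-10", "2022-10-17", "2022-10-24"]
date_windows = {
    r: window_from_date_round_id(r, **relative_rule) for r in date_rounds
}

# Non-date round ids: nothing to compute an offset from, so every round
# carries its own explicit window (hypothetical ids and dates).
explicit_windows = {
    "2022-EW41": {"start": "2022-10-04", "end": "2022-10-11"},
    "2022-EW42": {"start": "2022-10-11", "end": "2022-10-18"},
    "2022-EW43": {"start": "2022-10-18", "end": "2022-10-25"},
}

# Trying to apply the relative rule to a non-date id fails outright.
try:
    window_from_date_round_id("2022-EW41", **relative_rule)
    relative_rule_applies = True
except ValueError:
    relative_rule_applies = False
```

The explicit mapping grows by one entry per round, which is the config growth described above, while the relative rule stays the same size however many date rounds are added.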
Work Required
If we do choose to go ahead and support non-date round IDs, the main work would be in modifying `hubValidations::parse_file_name()` to recognise and match non-date round IDs.

If we decide we will not support non-date round IDs, we need to update `hubDocs` to reflect that.