elray1 closed 1 year ago
Some suggestions:
Inputs:
- `hub_connection` object
- `model_ids`: optional list of models for which to load metadata. If not provided, load metadata for all models.
Returns:
- `tibble` with model metadata: one row for each model, one column for each top-level field in the metadata file. For metadata files with nested structures, this tibble may contain list-columns whose entries are lists containing the nested metadata values.
Logic:
- List the files in the `model-metadata` folder with `.yml` or `.yaml` file extensions.
- Read them in, e.g. with `sapply`?
- To test, add some example model metadata files to one of the test hubs in `inst/testhubs`. It would be good to get some complicated examples:
  - May be able to pull some from here
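The inputs, return value, and logic suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the eventual implementation: `load_model_metadata_sketch()` and `read_metadata_file()` are hypothetical names, the sketch takes a local hub path rather than a `hub_connection` object, and it assumes each metadata file is named after its model ID and that a given field has a compatible type across files.

```r
library(purrr)   # map(), list_rbind() (needs purrr >= 1.0.0)
library(tibble)  # as_tibble()
library(yaml)    # read_yaml()

# Hypothetical helper: read one metadata file into a one-row tibble.
read_metadata_file <- function(path) {
  meta <- yaml::read_yaml(path)
  row <- purrr::map(meta, function(value) {
    if (is.null(value)) {
      NA            # empty YAML field
    } else if (is.atomic(value) && length(value) == 1) {
      value         # scalar field becomes a plain column
    } else {
      list(value)   # nested or multi-valued field becomes a list-column entry
    }
  })
  tibble::as_tibble(row)
}

# Hypothetical loader: `hub_path` is assumed to be a local hub directory.
load_model_metadata_sketch <- function(hub_path, model_ids = NULL) {
  files <- list.files(
    file.path(hub_path, "model-metadata"),
    pattern = "\\.ya?ml$",   # matches both .yml and .yaml
    full.names = TRUE
  )
  # Assumption: each file is named <model_id>.yml or <model_id>.yaml
  file_ids <- tools::file_path_sans_ext(basename(files))
  if (!is.null(model_ids)) {
    files <- files[file_ids %in% model_ids]
  }
  # One row per model; columns missing from a file are filled with NA
  purrr::map(files, read_metadata_file) |>
    purrr::list_rbind()
}
```

Calling `load_model_metadata_sketch("path/to/hub")` would return one row per metadata file, and passing `model_ids` restricts the result to those models. A version that works with cloud hubs would need to list and read files through the hub connection instead of `list.files()`.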
A good function to base the functionality around would be `hubUtils::read_config()`, adapted to read YAML: https://github.com/Infectious-Disease-Modeling-Hubs/hubUtils/blob/main/R/read_config.R
It consists of two methods, one default and one that works with cloud file systems like S3 buckets.
Quick note on part of this suggested code too: https://github.com/reichlab/covidHubUtils/blob/7258bc1b146906b31e9d31d19fd13cf73259b5a0/R/get_model_metadata.R#L56-L65
`purrr::map_dfr()` is now superseded (as of purrr 1.0.0) in favour of `purrr::map() %>% purrr::list_rbind()`.
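As a small illustration of that replacement pattern, the snippet below binds per-model records into one table with `purrr::map()` followed by `purrr::list_rbind()`. The `records` list is made-up example data, not taken from any hub.

```r
library(purrr)   # map(), list_rbind() (needs purrr >= 1.0.0)
library(tibble)  # as_tibble()

# Made-up example records, one per model
records <- list(
  list(team_abbr = "teamA", model_abbr = "baseline"),
  list(team_abbr = "teamB", model_abbr = "ensemble")
)

# Older style: purrr::map_dfr(records, tibble::as_tibble)
# Current style: map to a list of one-row tibbles, then row-bind them
metadata <- records |>
  purrr::map(tibble::as_tibble) |>
  purrr::list_rbind()
```

The native pipe `|>` is used here (R 4.1 or later); `%>%` from magrittr would work the same way.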
One thing I was thinking of including was automatically merging the `team_abbr` and `model_abbr` fields or splitting the `model_id` field using the functions from `hubUtils`, but I wanted both of your input on some of the specifics. I could see this functionality being implemented in one of three ways:
1. Merging the `team_abbr` and `model_abbr` fields when applicable and only keeping the single `model_id` field in the resulting table of metadata.
2. Splitting the `model_id` field when applicable and only keeping the `team_abbr` and `model_abbr` fields.
3. Doing both, so that the resulting table always has the `model_id`, `team_abbr`, and `model_abbr` fields.
The third option might be a little redundant, but there is an argument for it in order to preserve all the fields in the original metadata files. Or we could not include this functionality. What are each of your thoughts?
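To make the merge and split behaviours concrete, here is a minimal base R sketch. `add_model_id()` and `split_model_id()` are hypothetical stand-ins written for illustration, not the `hubUtils` functions mentioned above, and the sketch assumes a `-` separator and that `team_abbr` never contains it.

```r
# Hypothetical helper: build model_id from team_abbr and model_abbr.
add_model_id <- function(metadata, sep = "-") {
  metadata$model_id <- paste(metadata$team_abbr, metadata$model_abbr, sep = sep)
  metadata
}

# Hypothetical helper: recover team_abbr and model_abbr from model_id.
# Splits on the first separator only; assumes every model_id contains `sep`.
split_model_id <- function(metadata, sep = "-") {
  pos <- regexpr(sep, metadata$model_id, fixed = TRUE)
  metadata$team_abbr <- substr(metadata$model_id, 1, pos - 1)
  metadata$model_abbr <- substr(
    metadata$model_id, pos + nchar(sep), nchar(metadata$model_id)
  )
  metadata
}

# Made-up example data
metadata <- data.frame(
  team_abbr = c("teamA", "teamB"),
  model_abbr = c("baseline", "ensemble")
)

# Keeping all three columns, as in the third option
with_all <- add_model_id(metadata)

# Round trip: start from model_id only and recover the two parts
from_id <- split_model_id(data.frame(model_id = with_all$model_id))
```

Dropping `team_abbr` and `model_abbr` after `add_model_id()`, or dropping `model_id` after `split_model_id()`, would give the first and second options respectively.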
I also see option
I vote for either option 3 or option 4. In favor of option 3, there's something to be said for just standardizing outputs across hubs, and there are situations where it is more helpful to be able to grab the `model_id` field and situations where it is more helpful to be able to grab the `team_abbr` field.
If we went with option 4, any functions that needed access to one of these could call whatever function we have to standardize outputs before trying to access it, but that seems like making extra work for ourselves down the line. So in the end I think I vote for 3.
It would be nice to have a function to load model metadata. There is some code here that could be borrowed/adapted for this.