Closed elray1 closed 1 week ago
I think that we have elsewhere indicated that <model_id> == <team_abbr>-<model_abbr>
and that teams can choose one representation to use, as indicated in their model metadata schema file.
I don't recall the specifics of that decision, but I support <round_id>-<model_id>.csv or .parquet as a file format.
Thanks for raising this!
My .02 on the first question, mostly from the perspective of how we'll move hub data to the cloud and open it up to a non-hubverse audience.
> This implies a specific structure for model ids as <team_abbr>-<model_abbr>, but it may be clearer to just indicate here that the folder names correspond to model_ids, and we can discuss conventions about composition of model_id elsewhere.
Removing the separate model-abbr and team-abbr columns from the "cloud transformed" model-output files in favor of a single model_id column simplifies the data conversion process. It does put the onus of parsing out team/model on data consumers, but I think it makes sense to favor the simple approach and revisit if we get feedback.
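For data consumers who do need team and model separately, here is a minimal sketch of that parsing. It assumes model_id is <team_abbr>-<model_abbr> and that team_abbr itself contains no hyphen; the function name split_model_id is hypothetical and not part of any hubverse package.

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Split a model_id into (team_abbr, model_abbr) at the first hyphen.

    Assumption: team_abbr never contains a hyphen, so the first hyphen
    is the separator between the two parts.
    """
    team_abbr, sep, model_abbr = model_id.partition("-")
    if not sep or not team_abbr or not model_abbr:
        raise ValueError(f"not of the form <team_abbr>-<model_abbr>: {model_id!r}")
    return team_abbr, model_abbr
```

If a hub allows hyphens inside either abbreviation, this split is ambiguous and the team/model information would have to come from the model metadata instead.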
> I don't recall the specifics of that decision, but I support <round_id>-<model_id>.csv or .parquet as a file format.
Agree with @nickreich's comment re: item 2 (especially if we agree to make YYYY-MM-DD the required format for round_id, since that creates a definitive way to parse out round and model from a model-output filename).
Again, this is from the perspective of a cloud-enabled hub. While model_id could be obtained via "directory" structure or from a column in the actual file, I can see how it would be handy to have that information encoded in the filename, especially if people lose the directory structure context when downloading data.
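To illustrate why a fixed YYYY-MM-DD round_id makes the filename unambiguous, here is a rough sketch of the parsing. It assumes round_id is always YYYY-MM-DD and that everything after it, minus the file extension, is the model_id. The names FILENAME_RE and parse_model_output_filename are made up for this example.

```python
import re
from pathlib import Path

# Assumption: round_id is always YYYY-MM-DD, followed by a hyphen and the model_id.
FILENAME_RE = re.compile(r"^(?P<round_id>\d{4}-\d{2}-\d{2})-(?P<model_id>.+)$")


def parse_model_output_filename(filename: str) -> tuple[str, str]:
    """Return (round_id, model_id) from a <round_id>-<model_id>.<ext> filename."""
    stem = Path(filename).stem  # drops directories and the final extension
    match = FILENAME_RE.match(stem)
    if match is None:
        raise ValueError(f"unexpected model-output filename: {filename!r}")
    return match["round_id"], match["model_id"]
```

Because this relies only on the filename, it still works when someone has downloaded a single file and lost the directory context.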
It's been a week since anyone has chimed in, so I'm going to assume that we'll proceed with @nickreich and @elray1's suggestions above:
- <round_id>-<model_id> format
- model_id that contains anything after round_id in the filename

#2 reflects hubverse-transform work to address the latter.
This page: now shows structure as
OK to close issue? @elray1 @nickreich
This is similar to an issue Anna raised in closed issue #116.
Agree that this can be closed.
Looking at this page: https://hubverse.io/en/latest/user-guide/model-output.html
Currently the folder and file structure are listed as follows:
team1-modela
    <round-id1>.csv (or parquet, etc)
    <round-id2>.csv (or parquet, etc)
team1-modelb
    <round-id1>.csv (or parquet, etc)
team2-modela
    <round-id1>.csv (or parquet, etc)

Two comments about this:
1. This implies a specific structure for model ids as <team_abbr>-<model_abbr>, but it may be clearer to just indicate here that the folder names correspond to model_ids, and we can discuss conventions about composition of model_id elsewhere.
2. <round_id>.csv, or <round_id>-<model_id>.csv? I think we've decided to include model_id as a check that submissions landed in the right folder, but I'm not sure.
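For what it's worth, the "landed in the right folder" check could look roughly like the sketch below. It assumes files are named <round_id>-<model_id>.<ext>, that round_id is YYYY-MM-DD, and that the parent folder is named after the model_id. The function name is hypothetical and this is not how any existing hubverse validation is implemented.

```python
import re
from pathlib import Path

# Assumption: filenames start with a YYYY-MM-DD round_id followed by a hyphen.
ROUND_ID_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}-")


def landed_in_right_folder(path: str) -> bool:
    """True if the model_id in the filename matches the parent folder name."""
    p = Path(path)
    match = ROUND_ID_PREFIX.match(p.stem)
    if match is None:
        return False  # filename does not start with a YYYY-MM-DD round_id
    model_id = p.stem[match.end():]  # everything after "<round_id>-"
    return model_id == p.parent.name
```

With only <round_id>.csv as the filename there would be nothing to compare against the folder name, which is the main argument for including model_id.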