add source data and code for producing example `model_outputs` and `target_data`

elray1 commented 9 months ago

I am realizing that the original data and code for this is currently just sitting on my laptop, which is bad.

I see two options here:

1. Add a folder to this repository, called something like data-raw, that contains the original data files and processing code.
   - advantage: code and data live in the same repo, which makes future maintenance easier and makes it easier to see where stuff came from
   - disadvantage: this is supposed to be an example hub that might be useful to people setting up hubs. Those hubverse users will not want to have that data-raw folder. This could be confusing.
2. Create a second repository, called something like example-simple-forecast-hub-source (open to other naming ideas), that contains this code. Maybe there's an expectation that the two repos would be cloned into the same folder on the developer's computer, so that relative paths can be used in example-simple-forecast-hub-source to put the data files in the right place in example-simple-forecast-hub.
   - advantages and disadvantages are the opposite of item 1
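
To make the "cloned into the same folder" expectation in the second option concrete, here is a minimal sketch of how a processing script in the source repo might locate the hub repo next to it. The function names are hypothetical, and Python is used purely for illustration; the actual processing code may well be in another language.

```python
from pathlib import Path


def sibling_hub_dir(
    source_repo: Path, hub_name: str = "example-simple-forecast-hub"
) -> Path:
    """Return the path of the hub repo, assumed to be cloned next to the source repo."""
    # Both repos are assumed to share a parent folder on the developer's machine.
    return source_repo.resolve().parent / hub_name


def require_hub_dir(source_repo: Path) -> Path:
    """Like sibling_hub_dir, but fail with a clear message if the hub clone is missing."""
    hub = sibling_hub_dir(source_repo)
    if not hub.is_dir():
        raise FileNotFoundError(
            f"Expected the hub repo at {hub}; clone it next to {source_repo.name}."
        )
    return hub
```

A script would call `require_hub_dir(Path(__file__).parent)` (or similar) before writing any files, so that a missing or misplaced clone fails early rather than silently writing data somewhere unexpected.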

I'm leaning toward option 2 but soliciting input.

nickreich commented 9 months ago

No super strong feelings on my part about either of these things. I think doing option 2 feels a bit more convoluted, but also, as long as it's well documented, perhaps the best option.

micokoch commented 9 months ago

Both sound like good solutions to me.

bsweger commented 9 months ago

Can you say more about who the users are for this?

1. Internal Hubverse developers who want to publish sample data and make it available for packaging in other hubverse libraries?
2. People standing up their own hubs who want to produce their own sample data?
3. Both?

elray1 commented 9 months ago

After in-person discussion, Becky votes for keeping stuff in the same repo, maybe in one high-level folder called _internal_data, with documentation in the README that this folder is not intended to be part of the hub structure.

bsweger commented 9 months ago

After chatting with @elray1, here's my understanding of the "audience" question below:

The sample data repo has two audiences:

People standing up a new hub who might find it useful to see examples of model output data
Hubverse devs (i.e., us) who want a single source of example data to bundle with our packages (so new users can get up and running quickly with tutorials and actual data)

[this doesn't really answer Evan's original question, just surfacing the convo in case anyone else out there is trying to solidify their understanding of the problem space]

Can you say more about who the users are for this?

1. Internal Hubverse developers who want to publish sample data and make it available for packaging in other hubverse libraries?

2. People standing up their own hubs who want to produce their own sample data?

3. Both?

hubverse-org / example-simple-forecast-hub

add source data and code for producing example `model_outputs` and `target_data` #8