update default value for model_output_directory

elray1 commented 1 year ago

I think it should be "model-output", not "model_output".

https://github.com/Infectious-Disease-Modeling-Hubs/schemas/blob/de580d56b8fc5c24dd36a32994182e37b8b0ac95/v2.0.0/admin-schema.json#L635

bsweger commented 6 months ago

Is there a known use case for allowing hubverse admins to customize this directory?

bsweger commented 6 months ago

@elray1 @LucieContamin pinging you both because you weighed in about this in today's Hubverse dev meeting.

Re: customizing the model-output directory, I'm trying to understand the balance between:

flexibility for hub admins
potential value of standardization to the Hubverse at large
long-term maintenance/dev work for Hubverse devs

In this specific case, the impact of allowing people to customize the model-output directory is, perhaps, minor, but it's worth being explicit about the above tradeoffs (for this and for every customization we decide to accommodate).

A few examples/questions related to items 2 and 3 above:

If there is ever a need to join model outputs across hubs (e.g., current and archived FluSight data, or seasonal hubs with multiple repos), having a standardized Hubverse directory name for the files makes this process easier and less error-prone
As we move to the cloud, we are providing a path for people to access data without Hubverse context. My .02 is that we'd be doing people a service if they know exactly where to find the model outputs in a Hubverse hub without having to reference a config file.
Having a configurable model-output directory adds a small but measurable task to our internal downstream processes (e.g., hubData, the cloud sync action, visualizations/dashboards). Parsing a hub's config file to find the model-output directory is one more thing that every tool built on top of these files will have to do.

Accepting internal complexity in the name of simplicity for Hubverse users is what we should be doing. In this case, however, it seems like there's potential to do the work to accommodate flexibility and also make things harder for a certain population of data consumers.

Aside from historical data/hubs, is there a compelling reason to let hubverse admins use a different name than model-output? Do we know what hubs are currently using a name other than model-output?

LucieContamin commented 6 months ago

Thanks for the detailed information and context.

About allowing customizing the model-output directory: as it's currently used and well integrated I will prefer not to go back on that decision and force it to be a defined "model-output" for example. The information is stored in a "standardized" .json and is easily accessible if necessary. It's also already integrated in the HubData, HubAdmin tools.

For the user, I don't think it's that big on an issue if you provide documentation and example code on how to load the data, especially if using the HubData package.

Also, it allows new hub to choose their folder, which might be useful especially if they don't want to do everything in English, for example.

Currently, two of the US SMH are using another folder name.

elray1 commented 6 months ago

Thanks for raising the issue, Becky, and for your thoughts Lucie!

Summing up thoughts so far:

some existing hubs use names other than model-output for this folder
this feature could be useful for hubs that want to organize in languages other than English
we don't see any reason for a new English-based hub that is starting up to use anything different than model-output.

Here are some thoughts/reactions to points 1 and 2:

My feeling is that in instances where we think the right way to go breaks with existing hubs, we shouldn't let that hold us back. The way we've handled this in the covid and flu forecast hubs is to restart the hubs in new repos following hubverse standards (this has included changing the name of the model-output folder for both of those hubs), with a planned migration of older data into archival repos that make the historical data available to hubverse tooling. So to me, the fact that the SMHs are using another folder name currently doesn't seem like a key reason to make a decision about hubverse infrastructure setup going forward, if that would allow for simpler tooling and lower maintenance in the future. I agree that the maintenance cost for this particular item is low, but if we can identify several things like this, it could add up to a simpler system overall.
However, I think the idea of offering support for other languages is worth considering carefully, and is tied to this as Lucie points out. There is a related issue here. I might vote for making a decision about whether we plan to follow through on that issue, and letting that inform what we do about this.

FWIW, I don't feel extremely strongly about this one way or another, these are just my thoughts on this since I was asked :)

nickreich commented 6 months ago

Thanks for the discussion here, and input from all the varied perspectives.

I envision localization as something that we might want to head towards in something like 9-18 months from now. Not on our short term list of "big" things to tackle, but on a list of fairly high priority "nice to haves". I don't know what our priority list will look like in 9+ months, but my inclination is to not engineer for something that is that far into the future, and instead to focus on the immediate use-cases in hand. I'd welcome input from others who have more software dev experience on this than I have! I do think that offering support for other languages is important, just not as important as locking down base hubverse functionality.

elray1 commented 6 months ago

thanks for weighing in, Nick. so if offering support for other languages is important, i'd vote that we not take this existing functionality out, and go ahead with updating the value of the default.

bsweger commented 6 months ago

Great point about the need to consider localization--hadn't considered that!

@elray1 can you clarify what you mean by "existing functionality"?

Are there any active, fully-Hubverse-compatible hubs (i.e., not SMH) that use this feature?

elray1 commented 6 months ago

What I mean by "existing functionality" is that hubData already supports handling of the model output directory as a configurable variable rather than a hard-coded thing, e.g. here: https://github.com/Infectious-Disease-Modeling-Hubs/hubData/blob/059aa13fd0f97b31ef52dca396b297385491149a/R/utils-connect_hub.R#L46-L50

bsweger commented 6 months ago

Got it--yep, I realize we can already do that.

As I understand the downsides of removing this customization:

historic hubs: we can fix things up in the data conversion process to hubverse format, if needed
localization: we're at least 9-18 months out and can revisit as a part of that work
Revising a past decision is annoying for sure

Are there other cons to removing this customization? The ones above don't outweigh the cons of non-standard folder names as outlined above, imo.

(Unless doing so would break an existing Hubverse hub, then agree we shouldn't do it)

bsweger commented 6 months ago

To clarify the above...what I'm getting at is that allowing customization of the model-output folder name adds actual work in the present to maybe enable a future feature that isn't solidly roadmapped.

It's always a tricky call, and favoring the future feels usually like the right thing. In my experience, however, it's hard to predict more than a quarter or two out, and I've often regretted a code base with complexity that felt necessary at the time but ended up unused as priorities evolved.

Again, in this case the complexity isn't terrible, but it's worth considering for a small team and because it potentially impacts users (i.e., not just the dev team).

elray1 commented 6 months ago

I don't feel like I have much more to add :)

I understand your points, and also continue to feel some reluctance about putting effort into removing stuff that's done and that we would certainly want as part of a localization feature which it sounds like is a fairly high priority medium-term thing, and I also continue to not really have strong feelings about this either way.

bsweger commented 6 months ago

I don't feel like I have much more to add :)

Same!

I can "disagree and commit" on this, though I won't prioritize the corresponding cloud sync issue to accommodate customized model-output directory names (because there is more pressing work, and someone using the action can easily change the directory name if they need to).

Returning to the original purpose of the issue, sounds like there's no good reason not to update the default value as you suggest?

hubverse-org / schemas

update default value for model_output_directory #63