hubverse-org / schemas

JSON schemas for modeling hubs
Creative Commons Zero v1.0 Universal
4 stars 2 forks source link

Add a property for S3 bucket information? #65

Closed annakrystalli closed 4 months ago

annakrystalli commented 5 months ago

Once we are implementing links to S3 buckets, it might be good to document details of buckets in hub configs too?

bsweger commented 5 months ago

Yes, agree!

What would we want to include? Cloud provider? (where the only available option right now is AWS)

We could ask for the name of an S3 bucket, but if Reich Lab is hosting, I'd prefer to adopt a standard naming convention instead (e.g., hubverse-name-of-hub). That reduces the chance of using a bucket name that's already "taken" and makes management easier for us.

bsweger commented 4 months ago

👋 @annakrystalli I'm pinging you to see if this shows up in #github-firehose but also because using the hub config to store cloud settings would also be handy for our "infrastructure as code" efforts (for example, a canonical place to store the names of S3 buckets).

annakrystalli commented 4 months ago

😄 sadly no personal ping on GitHub slack but yes to storing S3 bucket names. Anything else that could be useful? Is region info necessary? In any case feel free to sketch out what you think the JSON object containing the necessary infoe might look like and we can work back to a schema from that.

bsweger commented 4 months ago

@annakrystalli Thanks! Getting back to this, the info we need for cloud-enabled hubs is pretty sparse at the moment. If we make cloud its own section of the config, hopefully we can grow it as needed w/o disruption.

Something like this?

"cloud": {
    "enabled": true,
    "github": {
      "org": "Infectious-Disease-Modeling-Hubs",
      "repo": "exampe-hub"
    },
    "provider": {
      "name": "aws-only-right-now",
      "storage": "name-of-s3-bucket"
    }
}

It leaves some wiggle room for a possible future of non-AWS providers.

annakrystalli commented 4 months ago

This looks good!

As for location of the config, it would belong in th admin.json file.

I should note that that already contains entries for repository_host and repository_url https://github.com/Infectious-Disease-Modeling-Hubs/schemas/blob/de580d56b8fc5c24dd36a32994182e37b8b0ac95/v2.0.0/admin-schema.json#L37-L47 so we may not want to duplicate repository metadata in the cloud property?

bsweger commented 4 months ago

so we may not want to duplicate repository metadata in the cloud property?

@annakrystalli Makes sense--so a simplified version would look like (changed provider to host to match the terminology used for the repository config)

"cloud": {
    "enabled": true,
    "host": {
      "name": "aws-only-right-now",
      "storage": "name-of-s3-bucket"
    }
}

Do we have any hubs that use something other than github as a repo host? We're leaning pretty hard into GitHub actions for cloud syncing (which I assume is ok, since so many other hub functions do too).

@annakrystalli what would you recommend as the next step? Should I submit PRs for:

annakrystalli commented 4 months ago

We'll definitely need updates to the schema repo in a new version.

Not sure whether we want this feature as part of the standard hubTemplate or something folks can add. Happy to defer to @nickreich & @elray1 for that one.

As for hubUtils, that won't need any updating unless dynamic validation is required (i.e. validation performed in R), that won't be captured by standard json validation against the schema.

In any case, I'm in the middle of splitting hubUtils and the particular functionality you are referring to will live in hubAdmin when finished. So while it's good to think about and detail any dynamic validation that might be required, implementing will have to wait until I'm finished (or I could include it in the first version of hubAdmin I'm working on)

Feel free to describe any dynamic validation required in an issue in this repo: https://github.com/Infectious-Disease-Modeling-Hubs/hubAdmin

annakrystalli commented 4 months ago

We will also need documentation in hubDocs and the template Github Action to perform the synching should live in https://github.com/Infectious-Disease-Modeling-Hubs/hubverse-actions

annakrystalli commented 4 months ago

Re: other hosts, we have indeed leaned very heavy into GitHub because all known hubverse hubs also live on GitHub. In time it would be great to also support other hosts but I don't believe that's a current priority.

bsweger commented 4 months ago

@annakrystalli Thanks for all the info. Per your advice:

Will wait for @nickreich and @elray1 to weigh in on the "should this be in standard hubTemplate" question. My personal .02 is to keep this in the standard config with cloud.enabled set to a default of false (easier for users to see config options in their entirety), though of course we'll need to handle configs w/o this section for backwards compatibility).

annakrystalli commented 4 months ago

My personal .02 is to keep this in the standard config with cloud.enabled set to a default of false (easier for users to see config options in their entirety), though of course we'll need to handle configs w/o this section for backwards compatibility).

That sounds like a good approach! As for back-compatibility, will be fine as we'll not make cloud a required property in the schema.

nickreich commented 4 months ago

Yeah, I would lean towards including the bucket info in template with it set to false.

bsweger commented 4 months ago

See the attached PR with the proposed schema change for cloud-enabled hubs.

I'll solicit reviews and get feedback before working on the corresponding release and documentation changes.

bsweger commented 4 months ago

Created a release for v2.0.1: https://github.com/Infectious-Disease-Modeling-Hubs/schemas/releases/tag/v2.0.1