hubverse-org / hubverse-actions

GitHub Actions for common hubverse CI tasks
MIT License
0 stars 0 forks source link

Add a GitHub action to sync data to S3 #11

Closed bsweger closed 9 months ago

bsweger commented 9 months ago

Resolves #10

First pass at an "official" version of the cloud sync workflow.

This version uses information from the admin config to get the name of the S3 bucket and is safe to use on hubs that are not yet cloud enabled.

There will be follow-up PRs if we end up using GitHub actions to trigger post-sync data actions (e.g., converting model-output data to parquet).

bsweger commented 9 months ago

@annakrystalli this action grabs cloud values from admin.json using the names we were discussing here: https://github.com/Infectious-Disease-Modeling-Hubs/schemas/issues/65#issuecomment-1966884012

I'll work on the schema PR next, but it's trivial to update this action if we end up using different names.

annakrystalli commented 9 months ago

This looks like a great start. I have two related questions: 1) I'm not clear from the process how authentication is handled exactly? Is this only set up once and then any change that happens in the repo is just synched without further authentication? If so, is that safe? Asking completely out of curiousity as I can't say I know best practice for these sort of situation. 2) The hubverse AWS account is hard-coded in the workflow. I'm assuming that's safe. More importantly, how easy would it be for folks who don't want to host on the hubverse account to set up synching to their own AWS account? I think this is an important use case we should be able to support. Totally get that this workflow can be a good first step towards that but I think it's a good time to already think about generalisation as, perhaps this can afffect the name of the action? E.g. perhaps we want the name of this particular action to include hubverse and aws to reflect the specificity involved in it?

bsweger commented 9 months ago

@annakrystalli Great questions, thanks!

AWS Authentication We'll do a deeper dive into the authentication during the demo next week, but at a high-level:

AWS Account Number Yes, we're not considering an AWS account number to be a secret. Agree that we should have a path for people who want to provide their own cloud hosting. To do that using this proposed setup, a hub admin would need to:

We could choose to move the hard-coded Hubverse AWS account number into the admin.json schema as a default value. I chose to put it here instead because I didn't love the idea of tightly-coupling our AWS account number with the schema.

That said, if you think that providing the account # in the schema and reading it dynamically in the action will provide a smoother path, happy to make that change here and in the related schema PR!

annakrystalli commented 9 months ago

Thanks for the info!

I totally agree that hard wiring the account number into the schema is not ideal but I don't think we need to. The schema would just need to check that whatever value is given conforms to what is expected for an AWS account. It's something each hub would need to add to their admin.json but perhaps that could be part of onboarding. A bit manual which I don't love but at least is intentional.

Having said I think it's fine to keep it in this particular action but I think the name of the action should reflect that so users know this template is for syncing to a hubverse hosted hub in AWS.

bsweger commented 9 months ago

I totally agree that hard wiring the account number into the schema is not ideal but I don't think we need to. The schema would just need to check that whatever value is given conforms to what is expected for an AWS account. It's something each hub would need to add to their admin.json but perhaps that could be part of onboarding. A bit manual which I don't love but at least is intentional.

Having said I think it's fine to keep it in this particular action but I think the name of the action should reflect that so users know this template is for syncing to a hubverse hosted hub in AWS.

Got it--I see what you're saying about being intentional with the AWS account number. It's not clear how many people will want to use the cloud or will want to self-host (and where).

Until we get a better sense of that, great suggestion to rename the action to something that indicates the specific use case of sending data to a Hubverse-hosted AWS account (just pushed the commit).

annakrystalli commented 9 months ago

Awesome. This looks great. Happy for it to be merged.

bsweger commented 9 months ago

I think we're in good shape for the first iteration of this new AWS sync workflow--merging so people can actually give it a spin.