hubverse-org / hubverse-cloud

Test hub for S3 data submission and storage
MIT License

Create proof-of-concept for using S3 triggers for automated conversion of model-output files #52

Closed. bsweger closed this issue 7 months ago.

bsweger commented 7 months ago

There was some conversation here that resulted in the conclusion that we should not rely on GitHub CI actions to trigger the conversion of incoming model-output files to parquet format.

As a next step, I'd like to explore using S3 event notifications as a way to invoke actions when a model-output file is written to a hub's S3 bucket.

Specifically, these notifications:

At a high level, the idea is to invoke our prototype "transform model-output file to parquet" function automatically whenever a model-output file is uploaded to S3 (which happens via a GitHub action).

model submission PR merged -> model-output data syncs to S3 -> S3 "new object created" event triggers an AWS Lambda version of the "transform model-output file to parquet" function
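A minimal sketch of the Lambda end of that flow, assuming Python on Lambda. The event shape (`Records`, `eventName`, `s3.bucket.name`, `s3.object.key`) and the `lambda_handler(event, context)` entry point follow the standard S3 notification and Python Lambda conventions; the prefixes, the accepted suffixes, and the helper names `plan_conversions` and `parquet_key_for` are hypothetical, and the actual parquet conversion is left as a placeholder.

```python
import posixpath
from urllib.parse import unquote_plus

# Hypothetical bucket layout (not from this issue): raw submissions land under
# SOURCE_PREFIX and parquet copies go under a separate TARGET_PREFIX, so the
# function's own output never matches the trigger's prefix filter.
SOURCE_PREFIX = "model-output/"
TARGET_PREFIX = "model-output-parquet/"
ACCEPTED_SUFFIXES = (".csv", ".parquet")


def parquet_key_for(source_key: str) -> str:
    """Map a model-output key to the key its parquet copy would be written to."""
    relative = source_key[len(SOURCE_PREFIX):]
    stem, _ext = posixpath.splitext(relative)
    return TARGET_PREFIX + stem + ".parquet"


def plan_conversions(event: dict) -> list:
    """Return (bucket, source_key, target_key) for each newly created model-output file."""
    jobs = []
    for record in event.get("Records", []):
        # Only "new object created" events are in scope for this experiment.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded in S3 event payloads (e.g. spaces become "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(SOURCE_PREFIX) and key.endswith(ACCEPTED_SUFFIXES):
            jobs.append((bucket, key, parquet_key_for(key)))
    return jobs


def lambda_handler(event, context):
    """Lambda entry point; the conversion itself is a placeholder."""
    jobs = plan_conversions(event)
    for bucket, source_key, target_key in jobs:
        # Placeholder: a real version would read the source object, convert it
        # to parquet, and write the result to target_key.
        print(f"convert s3://{bucket}/{source_key} -> s3://{bucket}/{target_key}")
    return {"converted": len(jobs)}
```

Writing the parquet output under a prefix the trigger does not watch matters: if the function wrote back into `model-output/`, each output file would fire another "object created" event and re-invoke the function in a loop.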

Definition of done:

The AWS resources for this will be created manually (i.e., no need to incorporate them into our infrastructure-as-code process unless we decide this solution will work for us).
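Whether the trigger is set up in the console or through an API call, it boils down to a bucket notification configuration like the one below. This is a sketch that only builds and prints the configuration; the ARN, the rule `Id`, and the `model-output/` prefix are hypothetical placeholders.

```python
import json

# Hypothetical ARN: replace with the real conversion function's ARN.
EXAMPLE_LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:convert-model-output"


def notification_configuration(lambda_arn: str, prefix: str = "model-output/") -> dict:
    """Build an S3 bucket notification config that invokes a Lambda on object creation."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "Id": "convert-model-output-on-create",  # hypothetical rule name
                "LambdaFunctionArn": lambda_arn,
                # Fire for every kind of create (Put, Post, Copy, multipart upload).
                "Events": ["s3:ObjectCreated:*"],
                # Only react to keys under the model-output prefix.
                "Filter": {"Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}},
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(notification_configuration(EXAMPLE_LAMBDA_ARN), indent=2))
```

This dict has the shape that boto3's `put_bucket_notification_configuration(Bucket=..., NotificationConfiguration=...)` accepts. The Lambda function also needs a resource-based permission that allows `s3.amazonaws.com` to invoke it, which the console adds automatically and which can otherwise be granted with `aws lambda add-permission`.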

bsweger commented 7 months ago

Let's scope this work to the "new object created" event. If it seems like a good way to proceed, the next step would be to code the corresponding action for when a model-output file is deleted.
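If the deletion case is picked up later, one option is a single handler that routes on the event name. S3 create events have names like `ObjectCreated:Put` and removal events have names like `ObjectRemoved:Delete`. The `route_event` helper and the "delete the parquet copy" follow-up below are hypothetical; the bucket notification would also need `s3:ObjectRemoved:*` added to its events.

```python
from urllib.parse import unquote_plus


def route_event(event: dict) -> list:
    """Return (action, bucket, key) tuples: 'convert' for creates, 'delete' for removals."""
    actions = []
    for record in event.get("Records", []):
        name = record.get("eventName", "")
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded in S3 event payloads.
        key = unquote_plus(record["s3"]["object"]["key"])
        if name.startswith("ObjectCreated:"):
            actions.append(("convert", bucket, key))
        elif name.startswith("ObjectRemoved:"):
            # Hypothetical follow-up: delete the parquet copy derived from this key.
            actions.append(("delete", bucket, key))
        # Other event types (restores, replication, etc.) are ignored.
    return actions
```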

bsweger commented 7 months ago

Keeping some notes on this experiment here: https://reichlab.atlassian.net/wiki/spaces/RLD/pages/13631576/Automated+model-output+transforms

bsweger commented 7 months ago

This is done. I gave @annakrystalli a demo of how it works, and we agreed that we should proceed with AWS event notifications plus a Lambda function to handle conversion of the model-output files.

I tried (and failed) to record the demo, but can do it at the next dev meeting for anyone interested.