zooniverse / aggregation-for-caesar

Apache License 2.0

Batch Aggregation #785

Closed. zwolf closed this 5 months ago

zwolf commented 6 months ago

This PR contains the following:

This is ready for (re-)review. I have included some questions in the BatchAggregation spec file; answering them could improve the specs. This PR depends on https://github.com/zooniverse/panoptes/pull/4303, which provides the API to save run data on a Panoptes resource. I'm looking for feedback on the whole pipeline, which looks like this:

Request sent to Panoptes --> Panoptes sends run_aggregation request --> celery job starts --> exports downloaded & processed --> extraction, reduction --> create csv files, zip them --> upload data to storage --> send request back to Panoptes containing run UUID.
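To make the "create csv files, zip them" and "run UUID" steps concrete, here is a minimal standard-library sketch. It is not the code in this PR: `package_run`, the `reductions` mapping, and the file names are hypothetical, and the Celery task, the export download, and the storage upload are left out entirely.

```python
import csv
import tempfile
import uuid
import zipfile
from pathlib import Path


def package_run(reductions, out_dir):
    """Write one CSV per output table, zip them, and return (run_id, zip_path).

    `reductions` is a hypothetical mapping of file stem -> list of row dicts;
    the real pipeline's data structures may look quite different.
    """
    run_id = str(uuid.uuid4())  # identifier that would be reported back for the run
    run_dir = Path(out_dir) / run_id
    run_dir.mkdir(parents=True)

    csv_paths = []
    for stem, rows in reductions.items():
        if not rows:
            continue  # nothing to write for an empty table
        path = run_dir / f"{stem}.csv"
        with path.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
        csv_paths.append(path)

    # Bundle every CSV into a single archive named after the run.
    zip_path = run_dir / f"{run_id}.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in csv_paths:
            archive.write(path, arcname=path.name)
    return run_id, zip_path


if __name__ == "__main__":
    demo = {
        "extractions": [{"subject_id": 1, "value": "a"}],
        "reductions": [{"subject_id": 1, "count": 3}],
    }
    with tempfile.TemporaryDirectory() as tmp:
        run_id, zip_path = package_run(demo, tmp)
        print(run_id, zip_path.name)
```

Naming the working directory and the archive after the run UUID is only one possible layout; it keeps concurrent runs from overwriting each other's files.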

Merging https://github.com/zooniverse/aggregation-for-caesar/pull/783 was an accident, as I had intended to keep that branch-of-a-branch separate. I created a new batch-aggregation-staging branch to merge into; I can add a deployment template to it and deploy directly from it for testing. cc @lcjohnso on the new PR.

zwolf commented 6 months ago

@CKrawczyk Thanks for all the feedback!! I implemented everything I could and left comments; let me know if the changes and new additions are good to go!

I plan on merging this PR into the batch-aggregation-staging branch, then creating a PR off of that branch to add a new deploy template for a staging instance that deploys from that branch alone. That way I can test everything (adding Redis and volumes, Celery, the API, etc.) without affecting the existing app.