katjes733 / aws-sentiment-analysis

Sentiment Analysis with AWS services
MIT License

Handle multi-file upload correctly #12

Open katjes733 opened 7 months ago

katjes733 commented 7 months ago

As an end user, I would like to upload multiple CSV files at once and have the sentiment analysis triggered automatically only once, when all files are uploaded, so that I don't have to combine multiple CSVs into one manually before uploading.

Per #5, only one file may be uploaded at a time. Otherwise the state machine is triggered once per file (the EventBridge rule reacts to the individual event emitted for each file), which leads to inconsistent results.

We may need to queue the events first before triggering the state machine.

katjes733 commented 7 months ago

Unfortunately, there is no way to know when the last file of a batch has been uploaded. S3 emits one event for each Object Created. Queueing the events is not possible either, because there is no final signal, and periodic polling on the queue may be too risky.

Therefore, we will not support multi-file uploads directly with this ticket, but will support archives instead. That way, multiple files can be archived together, and upload performance improves because the input files are compressed. We continue to support individual CSVs and, in addition, archives (tar.gz, gz and zip).
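
A minimal sketch of how the upload handling could tell the supported types apart and unpack them, assuming Python and that the uploaded object fits in memory. `extract_csvs` is a hypothetical helper for illustration, not code from this repository, and skipping non-CSV archive members is an assumption:

```python
import gzip
import io
import tarfile
import zipfile


def extract_csvs(name: str, data: bytes) -> dict[str, bytes]:
    """Return {csv_name: csv_bytes} for a plain CSV or a supported archive.

    Hypothetical helper: `name` is the uploaded object key and `data` its
    raw bytes. Non-CSV members of an archive are skipped (an assumption).
    """
    lower = name.lower()

    # Check ".tar.gz" before ".gz", since every tar.gz name also ends in ".gz".
    if lower.endswith(".tar.gz"):
        files = {}
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
            for member in tar.getmembers():
                if member.isfile() and member.name.lower().endswith(".csv"):
                    files[member.name] = tar.extractfile(member).read()
        return files

    if lower.endswith(".zip"):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return {
                entry: zf.read(entry)
                for entry in zf.namelist()
                if entry.lower().endswith(".csv")
            }

    if lower.endswith(".gz"):
        # A bare .gz wraps a single file; strip the suffix to name it.
        return {name[:-3]: gzip.decompress(data)}

    if lower.endswith(".csv"):
        return {name: data}

    raise ValueError(f"Unsupported upload type: {name}")
```

Because an archive arrives as a single object, it produces a single Object Created event, so the state machine would run once per batch regardless of how many CSVs the archive contains.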