tnc-ca-geo / animl-ingest

Lambda function for processing camera trap images

Implement retry policy #35

Closed nathanielrindlaub closed 1 year ago

nathanielrindlaub commented 1 year ago

There have been a few occasions in which the image metadata was successfully extracted and posted to the GraphQL API (and stored in the DB), but the transfer of the image(s) to S3 failed. Why this is happening needs a bit more investigation (see the half-dozen images still stuck in limbo in the ingestion bucket), but I think regardless we should:

nathanielrindlaub commented 1 year ago

@ingalls and I discussed this and took a look at how ingestion errors are handled now that we're writing them to the ImageErrors collection (see https://github.com/tnc-ca-geo/animl-api/pull/102). We concluded that we don't want to increase maximumRetryAttempts. The first thing that should happen is that we store an image record and get an Image ID back; all subsequent errors (including errors opening the image, resizing it, and copying it to S3) will get caught, written to the ImageErrors collection, and exposed to the user. This is desirable behavior because we need the image record and Image ID so that we can reference them in the ImageErrors collection if need be. As a result, any retry would attempt to save the image to the DB again and would throw a duplicate image error.
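To illustrate the ordering described above, here is a minimal Python sketch. It is not the actual animl-ingest or animl-api code: `FakeApi`, `create_image`, `create_image_error`, `ingest`, the `file_hash` field, and the use of that hash as the Image ID are all hypothetical stand-ins chosen for the example. It only shows why a retry after a post-save failure would hit a duplicate image error.

```python
class DuplicateImageError(Exception):
    """Raised when a record for the same image already exists (hypothetical)."""


class FakeApi:
    """In-memory stand-in for the API and its Images / ImageErrors collections."""

    def __init__(self):
        self.images = {}        # Image ID -> metadata
        self.image_errors = []  # (Image ID, error message) pairs

    def create_image(self, metadata):
        # Assumption: the Image ID is derived from a hash of the file.
        image_id = metadata["file_hash"]
        if image_id in self.images:
            raise DuplicateImageError(image_id)
        self.images[image_id] = metadata
        return image_id

    def create_image_error(self, image_id, message):
        self.image_errors.append((image_id, message))


def ingest(api, metadata, process):
    # Step 1: store the image record first so we have an Image ID to reference.
    image_id = api.create_image(metadata)
    # Step 2: any later failure (open, resize, copy to S3) is caught and
    # recorded against that Image ID instead of being re-raised.
    try:
        process(image_id)
    except Exception as err:
        api.create_image_error(image_id, str(err))
        return False
    return True


def failing_copy(image_id):
    # Simulates the S3 transfer failing after the record was saved.
    raise IOError("copy to S3 failed")


api = FakeApi()
metadata = {"file_hash": "abc123", "camera": "A"}

first_attempt = ingest(api, metadata, failing_copy)

# A retry of the same image fails at step 1, before any processing happens.
try:
    ingest(api, metadata, failing_copy)
    retry_outcome = "saved again"
except DuplicateImageError:
    retry_outcome = "duplicate image error"
```

In this sketch the first attempt leaves one image record and one ImageErrors entry, and the retry never reaches the processing step at all.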

If an error is thrown in image-ingest, that image will still get moved to the dead-letter bucket. The cause is likely image corruption or a bug in our code, so retrying it probably wouldn't be that useful anyhow. Closing this out.