tnc-ca-geo / animl-ingest

Lambda function for processing camera trap images

Reduce concurrency when copying image files to S3, add 1 GB memory #83

Closed by nathanielrindlaub 3 months ago

nathanielrindlaub commented 3 months ago

Issue

The ingest-zip Batch job was running out of memory while copying images to S3 when the images were larger than usual (we first discovered it with a folder of 995 images at 2.4 MB each).

Solution

I added an extra GB of memory to the Batch Job Definition and reduced the asyncPool concurrency from 1000 to 100. Surprisingly, in my testing this actually seemed to make the image saving go faster, not slower.
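To illustrate why a lower concurrency limit reduces memory pressure, here is a minimal Python sketch of a bounded-concurrency pool. It is not the project's code: `copy_all`, `fake_copy`, and `peak_buffer_mb` are hypothetical names, the upload is simulated, and the memory estimate assumes each in-flight copy holds one whole image in memory, which may not match how the real job buffers data.

```python
import asyncio


async def copy_all(items, worker, concurrency):
    """Run worker(item) for every item with at most `concurrency` in flight.

    Returns the results (in input order) and the peak number of
    simultaneously running workers that was observed.
    """
    gate = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0

    async def run_one(item):
        nonlocal in_flight, peak
        async with gate:  # waits here once `concurrency` workers are active
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await worker(item)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(run_one(item) for item in items))
    return results, peak


async def fake_copy(size_mb):
    # Hypothetical stand-in for an S3 upload: yields control once, then returns.
    await asyncio.sleep(0)
    return size_mb


def peak_buffer_mb(n_images, image_mb, concurrency):
    # Rough upper bound, ASSUMING one full image is buffered per in-flight copy.
    return min(n_images, concurrency) * image_mb
```

Under that assumption, 995 images of 2.4 MB with a limit of 1000 could all be buffered at once (roughly 2.4 GB), whereas a limit of 100 caps the buffered data at roughly 240 MB. Calling `asyncio.run(copy_all([2.4] * 995, fake_copy, 100))` runs the simulated copies with no more than 100 in flight at a time.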

nathanielrindlaub commented 3 months ago

@ingalls, if you have a minute, do you mind giving me the green light to make this change? I vaguely remember us talking about why you chose 1000 for the asyncPool concurrency here and recall there being a reason, but I searched around and couldn't find any documentation or discussion of that decision.

ingalls commented 3 months ago

Hey! Yeah, I was concerned about raising it any further due to memory constraints. I should have profiled it with a smaller value to see if it was CPU bound, but this is a great catch. Merge away in my mind!

nathanielrindlaub commented 3 months ago

Thanks @ingalls! Hope all is well!