tnc-ca-geo / animl-ingest

Lambda function for processing camera trap images

Architecture: Why are the filenaming conventions in the different target buckets so different? #39

Closed. postfalk closed this issue 1 year ago.

postfalk commented 1 year ago

If they were unified, we could use the same copy/save function. Alternatively, we could move the filename generation out of the copy function and pass it in as an argument. A template argument would be another option. I'll see what I can do once I have stable tests.
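A minimal sketch of the second and third options, written in Python. The names `ImageRecord`, `copy_to_bucket` and `template_key` are hypothetical, not existing code in this repo, and the `copy` callable is a stand-in for whatever actually writes to S3 (e.g. a wrapper around a boto3 copy), so the naming logic can be tested without AWS. The copy step is shared; only the key-naming policy is passed in, either as a plain function or as one built from a `str.format` template.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ImageRecord:
    # Hypothetical metadata available at ingest time.
    serial_number: str
    original_filename: str  # without extension, for simplicity
    image_hash: str


# A key builder maps an image record to an object key in a target bucket.
KeyBuilder = Callable[[ImageRecord], str]
# Stand-in for the real storage call (bucket, key) -> None, e.g. an S3 copy wrapper.
CopyFn = Callable[[str, str], None]


def copy_to_bucket(record: ImageRecord, bucket: str,
                   build_key: KeyBuilder, copy: CopyFn) -> str:
    """Shared copy/save step: only the key-naming policy varies per bucket."""
    key = build_key(record)
    copy(bucket, key)
    return key


def template_key(template: str) -> KeyBuilder:
    """Template option: turn a str.format pattern into a KeyBuilder."""
    def build(r: ImageRecord) -> str:
        return template.format(serial=r.serial_number,
                               name=r.original_filename,
                               hash=r.image_hash)
    return build


# Usage with hypothetical bucket names; `copied` records what would be written.
copied: List[Tuple[str, str]] = []
rec = ImageRecord("X123", "IMG_0001", "abc123")
copy_to_bucket(rec, "archive-bucket",
               template_key("{serial}/{name}_{hash}.jpg"),
               lambda b, k: copied.append((b, k)))
# copied == [("archive-bucket", "X123/IMG_0001_abc123.jpg")]
```

Because `template_key` just returns a function, the "argument" and "template" options collapse into one interface: a template covers the simple cases, and a plain lambda works wherever a template is too rigid.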

nathanielrindlaub commented 1 year ago

I don't have strong feelings about it, but the reason they differ is that the buckets serve different purposes. I imagined the archive bucket would be used by human users, so it would be helpful to have references to the original filenames and camera serial numbers if you're trying to find specific images. But I don't trust the original filenames to be unique for each camera, so to prevent images with the same original filename from being overwritten, I also included the image's unique hash in the key. So the keys in the archive bucket end up being [camera_serial_number]/[original_filename]_[hash].jpg.
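A minimal sketch of the archive key format described above. The function name `archive_key` is hypothetical, and the comment doesn't say which hash is used; this sketch assumes the hex MD5 digest of the image bytes. It shows why the hash matters: two different images with the same serial number and original filename (e.g. if a camera's file counter restarts) still get distinct keys.

```python
import hashlib
import os


def archive_key(serial_number: str, original_filename: str,
                image_bytes: bytes) -> str:
    """Build a human-browsable archive key:
    [camera_serial_number]/[original_filename]_[hash].jpg

    Assumption: the "unique hash" is the hex MD5 digest of the image bytes.
    """
    image_hash = hashlib.md5(image_bytes).hexdigest()
    # Drop any directory prefix and the original extension so the key
    # ends in a single ".jpg".
    base, _ext = os.path.splitext(os.path.basename(original_filename))
    return f"{serial_number}/{base}_{image_hash}.jpg"


# Same camera, same original filename, different content -> different keys,
# so the second upload does not overwrite the first.
first = archive_key("X1", "IMG_0001.JPG", b"first image bytes")
second = archive_key("X1", "IMG_0001.JPG", b"second image bytes")
```

One side effect of this sketch: byte-identical files map to the same key, so re-ingesting the exact same image overwrites itself rather than creating a duplicate.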

The serving bucket, on the other hand, is not used by humans (its only clients are the frontend and the ML handlers), so references to the camera serial number and original filename aren't necessary. It does, however, require that we save the original images along with their downsized (medium and thumbnail) versions.
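A minimal sketch of hash-only keys for the serving bucket. The comment doesn't give the serving bucket's actual key layout, so the `{size}/{hash}-{size}.jpg` pattern, the size labels and the name `serving_keys` are all hypothetical. The point is that every variant of an image (original, medium, thumbnail) is derived from the hash alone, with no serial number or original filename.

```python
from typing import Dict

# Hypothetical variant labels: the original plus the two downsized versions.
SIZES = ("original", "medium", "thumbnail")


def serving_keys(image_hash: str, ext: str = "jpg") -> Dict[str, str]:
    """Map each size variant to its key (hypothetical layout).

    Keys depend only on the image hash, since the frontend and ML handlers
    don't need the camera serial number or original filename.
    """
    return {size: f"{size}/{image_hash}-{size}.{ext}" for size in SIZES}


keys = serving_keys("abc123")
# keys["medium"] == "medium/abc123-medium.jpg"
```

With a layout like this, a client that knows an image's hash can construct the key for any size variant without a lookup.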

I'm open to suggestions though!