tnc-ca-geo / animl-api

Backend for https://animl.camera

Colons in imageIds causing filepath issues when downloading #131

Closed: nathanielrindlaub closed this issue 1 year ago

nathanielrindlaub commented 1 year ago

@ingalls - as you'll recall, we started using <projectId>:<imageHash> as our template for image IDs, and I'm running into issues when I try to download images from Animl to my local machine using this script.

The images will download, but the colons in the destination filenames get replaced with forward slashes at this line.

So the file will end up getting downloaded to sci_biosecurity/864839042280038/island_packers_ventura/sci_biosecurity/4e801b493beb8dfaadffa77cc8d88a15.jpg instead of sci_biosecurity/864839042280038/island_packers_ventura/sci_biosecurity:4e801b493beb8dfaadffa77cc8d88a15.jpg.
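For reference, the <projectId>:<imageHash> template can be sketched as a pair of helpers. This is a minimal illustration, not Animl's actual implementation. The function names `make_image_id` and `split_image_id` are hypothetical, and the sketch assumes project IDs never contain a colon themselves.

```python
def make_image_id(project_id: str, image_hash: str) -> str:
    """Compose an image ID following the <projectId>:<imageHash> template."""
    return f"{project_id}:{image_hash}"


def split_image_id(image_id: str) -> tuple[str, str]:
    """Split an image ID back into (project_id, image_hash).

    Splits on the first colon only, so this assumes project IDs contain no colons.
    """
    project_id, sep, image_hash = image_id.partition(":")
    if not sep:
        raise ValueError(f"not a <projectId>:<imageHash> ID: {image_id!r}")
    return project_id, image_hash


image_id = make_image_id("sci_biosecurity", "4e801b493beb8dfaadffa77cc8d88a15")
print(image_id)                  # sci_biosecurity:4e801b493beb8dfaadffa77cc8d88a15
print(split_image_id(image_id))  # ('sci_biosecurity', '4e801b493beb8dfaadffa77cc8d88a15')
```

Because the colon is the only separator, any tool that treats it as a path delimiter will split the ID into two path components, which matches the extra directory level in the downloaded path above.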

I tried tracing it through the boto3 download_file function, but I couldn't identify where the filename might be getting modified.

But then I did some googling and discovered that colons are reserved characters in Windows file names (see the Wikipedia page). I'm not sure that explains the behavior I'm seeing, since I'm using a Mac, but it's concerning nonetheless. Thoughts?
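The Windows restriction can be checked mechanically. The sketch below flags the characters Windows reserves in file names (< > : " / \ | ? * plus ASCII control characters 0 to 31). `windows_reserved_chars` is a hypothetical helper; it checks a single path component only and does not cover reserved device names such as CON or NUL.

```python
# Printable characters that Windows does not allow in file or directory names.
WINDOWS_RESERVED = set('<>:"/\\|?*')


def windows_reserved_chars(name: str) -> list[str]:
    """Return the characters in a single path component that Windows rejects.

    Covers the printable reserved characters and ASCII control characters
    (code points 0-31). Reserved device names like CON or NUL are not checked.
    """
    return [ch for ch in name if ch in WINDOWS_RESERVED or ord(ch) < 32]


print(windows_reserved_chars("sci_biosecurity:4e801b493beb8dfaadffa77cc8d88a15.jpg"))
# [':']
print(windows_reserved_chars("4e801b493beb8dfaadffa77cc8d88a15.jpg"))
# []
```

Any filename built directly from a <projectId>:<imageHash> ID fails this check, so the colon would need to be replaced before writing such files on Windows.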

ingalls commented 1 year ago

@nathanielrindlaub Two parts

Ref: https://github.com/tnc-ca-geo/animl-analytics/pull/4

nathanielrindlaub commented 1 year ago

Gotcha, yeah. I'm a little worried about this being a bit of a band-aid, but I don't have a solid idea of which downstream workflows might suffer from having that inconsistency between image ID and file path pattern, so I guess let's run with it and hope it doesn't cause issues down the road.

I think the fix should be in animl-api's COCO for Cameratraps JSON export code, though, rather than in the animl-analytics script. I'll push that up now.
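A minimal sketch of that kind of export-side fix, assuming the export keeps the colon-delimited image ID in the COCO `id` field and derives a colon-free `file_name` for the on-disk path. The `to_coco_image` helper and the choice of `_` as the replacement character are illustrative assumptions, not the actual animl-api code, which isn't shown in this thread.

```python
import json


def to_coco_image(image_id: str, prefix: str) -> dict:
    """Build a minimal COCO Camera Traps-style image entry (hypothetical helper).

    `id` keeps the original <projectId>:<imageHash> value so annotations can
    still reference it; `file_name` swaps the colon for an underscore (an
    assumed replacement character) so the path is valid on every OS.
    """
    safe_name = image_id.replace(":", "_")
    return {
        "id": image_id,
        "file_name": f"{prefix}/{safe_name}.jpg",
    }


entry = to_coco_image(
    "sci_biosecurity:4e801b493beb8dfaadffa77cc8d88a15",
    "sci_biosecurity/864839042280038/island_packers_ventura",
)
print(json.dumps(entry, indent=2))
```

Leaving `id` untouched means only the file path diverges from the image ID, which is the inconsistency between image ID and file path pattern discussed in the previous comment.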