seung-lab / cloud-files

Threaded Python and CLI client library for AWS S3, Google Cloud Storage (GCS), in-memory, and the local filesystem.
BSD 3-Clause "New" or "Revised" License

MD5 Checks #18

Closed william-silversmith closed 4 years ago

william-silversmith commented 4 years ago

Sometimes, when fill_missing is enabled and a data server is under load, we get mysterious blank tiles. I suspect this is caused by responses that carry a 200 status but an empty body.

Comparing checksums (or even 'Content-Length') against the received data would catch this. However, the server may compute the Content-Length field on the fly from the data it's about to send, so a checksum is much more robust.

william-silversmith commented 4 years ago

MD5 is available in Python's standard library via the hashlib module. The MD5 checks should be performed only "if available", since many datasets and files will not have an MD5 computed, especially those stored on the filesystem.
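
For illustration, here is a minimal sketch of what an "if available" check could look like. It is not the code from this library: `verify_md5` and `MD5IntegrityError` are hypothetical names, and the sketch assumes the expected digest arrives base64-encoded (the Content-MD5 header convention) and is `None` when the backend has no MD5 to offer.

```python
import base64
import hashlib
from typing import Optional


class MD5IntegrityError(Exception):
    """Hypothetical error: downloaded bytes do not match the expected MD5."""


def verify_md5(content: bytes, expected_md5_b64: Optional[str]) -> bool:
    """Check content against a base64-encoded MD5 digest, if one is available.

    Returns True when a check was performed and passed, and False when no
    digest was available and the check was skipped. Raises MD5IntegrityError
    on a mismatch.
    """
    if not expected_md5_b64:
        # No digest recorded (common for local files): skip the check.
        return False

    # Hash the bytes we actually received and encode them the same way
    # the expected digest is assumed to be encoded.
    computed = base64.b64encode(hashlib.md5(content).digest()).decode("ascii")

    if computed != expected_md5_b64:
        raise MD5IntegrityError(
            f"MD5 mismatch: expected {expected_md5_b64}, computed {computed}"
        )
    return True
```

With a check like this, a 200 response with an empty body fails as soon as the object's recorded digest is for non-empty data, instead of silently becoming a blank tile. Returning False rather than raising when no digest exists keeps files without a stored MD5 readable.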

william-silversmith commented 4 years ago

Addressed in https://github.com/seung-lab/cloud-files/pull/16