seung-lab / cloud-files

Threaded Python and CLI client library for AWS S3, Google Cloud Storage (GCS), in-memory, and the local filesystem.
BSD 3-Clause "New" or "Revised" License

File hash check for local fs and https #76

Open madiganz opened 2 years ago

madiganz commented 2 years ago

Not sure if this is the place for asking questions, but I see that for network robustness, when transferring a file, the hash is checked to make sure the destination matches the source for GCS and S3. Is there a reason why this check isn't also done for local file system and https transfers?

william-silversmith commented 2 years ago

This is certainly a place for asking questions! The reasons these checks aren't done are:

a) For local files, there's no hash metadata to compare against. We could add a metadata file, but it would be an extra cost. This library was designed in a context where hundreds of millions of files could be generated, which was already wrecking filesystems, so adding another file per object would have doubled the load. We could add the hash information as an opt-in feature. Did I understand your question correctly?

b) For HTTPS transfers, the main reason is that I haven't really looked into it. Is there a standard most web servers follow?
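
For the local filesystem case in (a), one way to check a copy without storing any hash metadata is to hash both the source and the destination after the transfer and compare the digests. The following is a minimal sketch under that assumption, not how cloud-files works: `file_sha256` and `copy_with_check` are hypothetical helpers written for illustration, and the choice of SHA-256 is arbitrary.

```python
import hashlib
import shutil


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_check(src, dest):
    """Copy src to dest, then verify that both files hash to the same value.

    Hypothetical helper: it needs no sidecar metadata file, but it reads
    both files again after the copy, so it adds read I/O per transfer.
    """
    shutil.copyfile(src, dest)
    src_hash = file_sha256(src)
    dest_hash = file_sha256(dest)
    if src_hash != dest_hash:
        raise OSError(f"hash mismatch after copy: {src} -> {dest}")
    return dest_hash
```

This avoids creating an extra file per object, at the price of re-reading the data, so it would likely make sense only as an opt-in check.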

madiganz commented 2 years ago

Yes, that answers my question and makes sense for local file systems. I am not sure whether there is a standard that most web servers follow, so it sounds like not checking for now is the way to go.

william-silversmith commented 2 years ago

Yeah, I'm also happy to build in some special handling for important systems, but I think they need to be identified to me first.
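
As an illustration of what per-server special handling for HTTPS could look like, here is a minimal sketch that verifies a response body only when the server happens to send a `Content-MD5` header (the base64-encoded MD5 digest of the body, defined in RFC 1864 and later dropped from the HTTP specification in RFC 7231, though some servers still send it). `verify_content_md5` is a hypothetical function, not part of cloud-files, and which servers actually send this header is something that would need to be confirmed case by case.

```python
import base64
import hashlib


def verify_content_md5(headers, body):
    """Check body bytes against a Content-MD5 header, if one is present.

    Returns True or False when the header exists, and None when the server
    did not send it, in which case no check is possible.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    expected = lowered.get("content-md5")
    if expected is None:
        return None
    actual = base64.b64encode(hashlib.md5(body).digest()).decode("ascii")
    return actual == expected.strip()
```

Returning `None` when the header is absent keeps the "not checking for now" behavior for servers that provide no digest, while allowing a check for those that do.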