Koncopd opened 2 months ago
We first need to understand the difference between `md5Hash` and ETag. On AWS, for files below 50MB (in practice, files uploaded in a single part rather than via multipart upload), the ETag is the MD5 hash in hex representation.
On GCP, it seems to differ: Google uses a base64 representation.
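To make the hex vs. base64 difference concrete, here is a minimal sketch using only the Python standard library. It assumes the S3 object was uploaded in a single part, so its ETag (without the surrounding quotes) is a plain hex MD5; the helper names are hypothetical and not part of any library.

```python
import base64
import hashlib


def md5_hex(data: bytes) -> str:
    """MD5 as lowercase hex: the form a single-part S3 ETag takes."""
    return hashlib.md5(data).hexdigest()


def md5_b64(data: bytes) -> str:
    """MD5 as base64 of the raw 16-byte digest: the form GCS uses in md5Hash."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def hex_to_b64(hex_digest: str) -> str:
    """Convert a hex digest (e.g. an unquoted S3 ETag) to base64."""
    return base64.b64encode(bytes.fromhex(hex_digest)).decode("ascii")


def b64_to_hex(b64_digest: str) -> str:
    """Convert a base64 digest (e.g. a GCS md5Hash) to hex."""
    return base64.b64decode(b64_digest).hex()


data = b"hello"
print(md5_hex(data))  # 5d41402abc4b2a76b9719d911017c592
print(md5_b64(data))  # XUFAKrxLKna5cZ2REBfFkg==
```

Both strings encode the same 16 bytes, so either provider's value can be converted losslessly into the other's representation before comparing.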
And yes: we need to store the hash for file-like artifacts on GCP.
For GCP, this is described here: https://cloud.google.com/storage/docs/hashes-etags
Oh, that's very interesting. I fear, though, that AWS doesn't support CRC32C. It looks better than MD5 in every regard...
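For comparison, GCS reports CRC32C as base64 of the big-endian 4-byte checksum (per the GCP page linked above). A minimal pure-Python sketch of that value, using a table-driven implementation of the reflected Castagnoli polynomial, is below. It is illustrative only: the function names are hypothetical, and a compiled library would be far faster for real files.

```python
import base64
import struct

# Reflected form of the Castagnoli polynomial 0x1EDC6F41.
_POLY = 0x82F63B78


def _make_table() -> list:
    """Precompute the CRC for every possible byte value."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ _POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes, crc: int = 0) -> int:
    """CRC32C of `data`; pass a previous result as `crc` to continue it."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def crc32c_b64(data: bytes) -> str:
    """Base64 of the big-endian 4-byte CRC32C, as GCS presents it."""
    return base64.b64encode(struct.pack(">I", crc32c(data))).decode("ascii")


print(hex(crc32c(b"123456789")))  # 0xe3069283 (standard check value)
print(crc32c_b64(b"123456789"))  # 4waSgw==
```

Like MD5, the checksum can be computed chunk by chunk (`crc32c(b, crc32c(a)) == crc32c(a + b)`), so large files never need to be held in memory at once.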
Let's keep this issue open and see what we can do here in the future.
For the time being, we'd likely resort to md5.
`UPath.stat` for Google Cloud paths has both `"etag"` and `"md5Hash"`. We need to add this to https://github.com/laminlabs/lamindb-setup/blob/6bda3d8bc6c47c7707a79554149e8dc6a534e40f/lamindb_setup/core/upath.py#L745. There is some processing of these hashes there, so I am not sure how to do this correctly: I don't know why this processing is needed, as I haven't worked with hashes before. Currently, we ignore hashes for individual files on GCP but not for folders, which is strange.
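One plausible reason for the processing is normalizing both providers to a single representation. The sketch below is hypothetical and not the lamindb-setup implementation: it assumes an fsspec-style info dict where GCS puts a base64 MD5 under `"md5Hash"` and S3 puts a quoted hex MD5 under `"ETag"`, and it deliberately ignores the lowercase GCS `"etag"`, assuming (per the GCP page linked above) that it should not be relied on as an MD5.

```python
import base64
from typing import Optional


def _hex_to_b64(hex_digest: str) -> str:
    """Convert a hex MD5 digest to base64 of the raw bytes."""
    return base64.b64encode(bytes.fromhex(hex_digest)).decode("ascii")


def md5_from_stat_info(info: dict) -> Optional[str]:
    """Hypothetical helper: base64 MD5 for a single file, or None.

    Assumes `info` is a plain dict of object metadata (an assumption,
    not a documented UPath.stat return type).
    """
    md5_hash = info.get("md5Hash")
    if md5_hash:
        # GCS: already a base64 MD5, no conversion needed.
        return md5_hash
    etag = info.get("ETag")
    if etag:
        # S3 returns the ETag wrapped in double quotes.
        etag = etag.strip('"')
        if "-" in etag:
            # Multipart upload: the ETag is not the MD5 of the object.
            return None
        return _hex_to_b64(etag)
    # No usable hash (e.g. a GCS composite object has no md5Hash).
    return None


print(md5_from_stat_info({"ETag": '"5d41402abc4b2a76b9719d911017c592"'}))
```

Storing one representation (here base64) means a file hashed on GCP and the same file hashed on AWS would compare equal.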