Open jasper-tms opened 1 month ago
Oh wow, I can reproduce this error on the download side. I'll have to investigate. This is pretty weird.
I ran into this issue when writing jpg/png images with CloudVolume and had gzip compression explicitly turned on (overriding CloudVolume default for jpg/png).
Maybe same here? Jasper's images are jpg, and the remote objects have a custom header X-Goog-Stored-Content-Encoding: gzip
Yes, that's right, thanks Nico. I have a little library for format conversion, npimage, to which I just added support for saving as Neuroglancer precomputed via CloudVolume, and if the user asks for `npimage.save(array, 'gs://bucket/path', compress=True)` (or `compress='lossy'`) then I do give them both JPEG and gzip:
I thought this was the default behavior of CloudVolume, but perhaps it's not, and I've ended up in a rarely used corner case? Are JPEG-encoded precomputed volumes typically not gzipped?
Correct: for compressed image file formats such as JPG and PNG, gzip should not be necessary, because compression is already part of the format. (PNG uses the exact same compression algorithm as gzip, namely DEFLATE, and JPG uses Huffman coding, which is also a component of DEFLATE.) CloudVolume determines whether or not gzip compression is required here: https://github.com/seung-lab/cloud-volume/blob/master/cloudvolume/datasource/precomputed/common.py#L12-L19
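The decision described above can be sketched like this (a simplified illustration, not CloudVolume's actual code; the function name `should_gzip` and the encoding set are assumptions for the example):

```python
# Gzip only pays off for encodings that aren't already compressed.
# JPEG and PNG chunks are internally compressed, so gzipping them
# adds overhead without shrinking them meaningfully.
ALREADY_COMPRESSED = {"jpeg", "png"}

def should_gzip(encoding: str) -> bool:
    """Return True if a chunk in this encoding benefits from gzip."""
    return encoding.lower() not in ALREADY_COMPRESSED

print(should_gzip("raw"))   # uncompressed encoding: gzip helps
print(should_gzip("jpeg"))  # already compressed: skip gzip
```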
Still, somewhere CloudFiles seems to compare gzipped with ungzipped checksums for these "double compressed" files?
Great, I skipped gzipping when using JPEG encoding and that got rid of the checksum issues from CloudFiles. Thanks for the input, Nico.
If it really never makes sense to gzip with JPEG encoding, you might consider enforcing that on the CloudVolume side, Will. (There still remains the question of why CloudFiles is getting confused about the checksums for these double-compressed files, but if you update CloudVolume to refuse to make such files, the problem is probably 90% solved in practice.)
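A guard like the one suggested could look something like this (purely a sketch; `validate_compression` is a hypothetical helper, not an existing CloudVolume function):

```python
def validate_compression(encoding: str, compress) -> None:
    """Refuse gzip on top of already-compressed image encodings.

    Hypothetical guard: double compression is redundant and, per this
    thread, can trip up checksum validation on download.
    """
    if compress in (True, "gzip") and encoding.lower() in ("jpeg", "png"):
        raise ValueError(
            f"gzip on top of {encoding!r} is redundant and can confuse "
            "checksum validation; use compress=False for this encoding."
        )

validate_compression("raw", "gzip")   # fine: raw chunks benefit from gzip
validate_compression("jpeg", False)   # fine: no double compression
```

Raising early at upload time would prevent these "double compressed" objects from ever landing in the bucket.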
Feel free to close this issue or not depending on whether you think you'll try to dive in and fix the checksum bug.
I wonder if this is a bug in Google's library? I played around with this, and it seems like `blob.download_as_bytes(start=start, end=end, raw_download=True, checksum=None)` is not respecting `raw_download=True`? The bytes returned are not gzip-encoded, so they must have been decompressed already, and so the md5 match will fail.
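The mismatch itself is easy to demonstrate locally, without touching the google-cloud-storage internals. If the stored object is gzipped but the client hands back decompressed bytes, the md5 of what you receive can never match the md5 GCS computed for the stored object (the chunk bytes here are a placeholder, not real image data):

```python
import gzip
import hashlib

chunk = b"\x00" * 1024                 # stand-in for an all-black chunk's bytes
stored = gzip.compress(chunk)          # what actually sits in the bucket
stored_md5 = hashlib.md5(stored).hexdigest()    # checksum of the stored object

# If the client silently decompresses despite raw_download=True, the
# caller ends up hashing the decompressed payload instead:
received_md5 = hashlib.md5(chunk).hexdigest()

print(stored_md5 == received_md5)  # the checksums disagree
```

So any code path that compares the stored-object checksum against decompressed bytes will report corruption for perfectly healthy files, which matches the symptoms in this issue.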
Hi Will,
I used cloudvolume to upload a simple greyscale image volume in precomputed format to google cloud, as I've done a million times. The upload seemed to succeed without issue. But if I try to download the data from google cloud using cloudvolume, I get a scary error:
I've never seen this before. I tried re-uploading the dataset and got the same problem, so I don't think it was a failed upload or corrupted data. The dataset also loads into neuroglancer just fine, and I can download the files with a `gcloud storage cp` command without issue. So I suspect that the problem may not actually be with the files but with how `cloudfiles` is attempting to validate the checksum. Not sure if it's relevant, but the specific cube that triggers the error is in fact all black (pixel values all 0) and is the top-left-most block in the dataset.

Do you have any idea what could be going on here? Can you reproduce the issue if you try to load this exact volume into memory via CloudVolume?
Thanks a lot!