Thanks for raising this @scottyhq! I don't think anyone has actually tried to open a tiff file with virtualizarr before!
Running your example, the output of kerchunk.tiff.tiff_to_zarr(url) looks like:
{
'.zgroup': '{\n "zarr_format": 2\n}',
'.zattrs': '{"multiscales":[{"datasets":[{"path":"0"},{"path":"1"},{"path":"2"}],"metadata":{},"name":"","version":"0.1"}],"OVR_RESAMPLING_ALG":"NEAREST","LAYOUT":"IFDS_BEFORE_DATA","BLOCK_ORDER":"ROW_MAJOR","BLOCK_LEADER":"SIZE_AS_UINT4","BLOCK_TRAILER":"LAST_4_BYTES_REPEATED","KNOWN_INCOMPATIBLE_EDITION":"NO","KeyDirectoryVersion":1,"KeyRevision":1,"KeyRevisionMinor":0,"GTModelTypeGeoKey":1,"GTRasterTypeGeoKey":1,"GTCitationGeoKey":"Albers","GeographicTypeGeoKey":4326,"GeogCitationGeoKey":"WGS 84","GeogAngularUnitsGeoKey":9102,"GeogSemiMajorAxisGeoKey":6378140.0,"GeogInvFlatteningGeoKey":298.256999999996,"ProjectedCSTypeGeoKey":32767,"ProjectionGeoKey":32767,"ProjCoordTransGeoKey":11,"ProjLinearUnitsGeoKey":9001,"ProjStdParallel1GeoKey":29.5,"ProjStdParallel2GeoKey":45.5,"ProjNatOriginLongGeoKey":-96.0,"ProjNatOriginLatGeoKey":23.0,"ProjFalseEastingGeoKey":0.0,"ProjFalseNorthingGeoKey":0.0,"ModelPixelScale":[30.0,30.0,0.0],"ModelTiepoint":[0.0,0.0,0.0,-1801185.0,2700405.0,0.0]}',
'0/.zattrs': '{\n "_ARRAY_DIMENSIONS": [\n "Y",\n "X"\n ]\n}',
'0/.zarray': '{\n "chunks": [\n 512,\n 512\n ],\n "compressor": {\n "id": "zlib"\n },\n "dtype": "|u1",\n "fill_value": 0,\n "filters": null,\n "order": "C",\n "shape": [\n 2048,\n 2048\n ],\n "zarr_format": 2\n}',
...,
}
It looks like this is not the same structure that e.g. kerchunk.hdf.SingleHdf5ToZarr returns.
What virtualizarr is expecting (and what the kerchunk docs promise...) is that the keys of the outermost dictionary are 'refs' and 'version'. This kerchunk.tiff.tiff_to_zarr(url) function seems to have jumped straight to giving us the contents that would normally be underneath the 'refs' key.
We could either fix this upstream in kerchunk, or just work around it here by special-casing tiffs to add that top-level {'refs': ...} ourselves. I vote for the latter.
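A minimal sketch of what that workaround could look like, assuming the only difference is the missing top-level envelope. The helper name ensure_kerchunk_envelope is hypothetical (it is not part of virtualizarr or kerchunk), and the default version of 1 is taken from the kerchunk reference spec rather than from anything tiff_to_zarr returns:

```python
from typing import Any, Mapping


def ensure_kerchunk_envelope(
    refs: Mapping[str, Any], version: int = 1
) -> dict[str, Any]:
    """Return references in the {'version': ..., 'refs': {...}} layout.

    Hypothetical helper: if the input already has a top-level 'refs'
    mapping it is returned unchanged (as a shallow copy); otherwise the
    input is assumed to be the bare contents of 'refs' and is wrapped.
    """
    already_wrapped = "refs" in refs and isinstance(refs["refs"], Mapping)
    if already_wrapped:
        return dict(refs)
    # Assumption: a bare dict like the tiff_to_zarr output above only
    # holds zarr keys ('.zgroup', '0/.zarray', ...), so wrapping is safe.
    return {"version": version, "refs": dict(refs)}


# Bare output, shaped like the tiff example above (values shortened).
bare = {".zgroup": '{"zarr_format": 2}', "0/.zattrs": "{}"}
wrapped = ensure_kerchunk_envelope(bare)
print(sorted(wrapped))  # ['refs', 'version']
```

Checking for an existing 'refs' mapping, rather than special-casing by file type alone, means the same call would be harmless on readers that already return the full structure.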
Makes sense @TomNicholas. It seems like an upstream fix is the right approach if someone is motivated. I just came across this while working on https://github.com/zarr-developers/VirtualiZarr/pull/143, so I figured I'd document it. For what it's worth, it's currently the same situation with FITS:
import kerchunk.fits

url = 'https://fits.gsfc.nasa.gov/samples/WFPC2u5780205r_c0fx.fits'
kerchunk.fits.process_file(url)
Yeah thanks for documenting it!
We should raise an issue upstream to report it, but as long as there are no other differences in the structure then working around it here should be very simple.