weiji14 / cog3pio

Cloud-optimized GeoTIFF ... Parallel I/O 🦀
Apache License 2.0

Using cog3pio to determine byte ranges in COG files? #16

Open TomNicholas opened 5 months ago

TomNicholas commented 5 months ago

Basically same question as https://github.com/gauteh/hidefix/issues/38#issue-2236324824 but for this library 😁

I'm building VirtualiZarr, an evolution of kerchunk, that allows you to determine byte ranges of chunks in netCDF files, but then concatenate the virtual representation of those chunks using xarray's API.

This works by creating a ChunkManifest object in-memory (one per netCDF Variable per file initially), then defining ways to merge those manifests.
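As a rough illustration of that idea (not VirtualiZarr's actual implementation), a manifest can be pictured as a mapping from chunk keys to byte ranges. The `ChunkEntry` type, the `Manifest` alias, the `concat_manifests` helper and the file names below are all hypothetical, made up for this sketch:

```python
from typing import Dict, TypedDict


class ChunkEntry(TypedDict):
    path: str    # file that holds the chunk
    offset: int  # byte position of the chunk within that file
    length: int  # number of bytes in the chunk


# Hypothetical manifest type: chunk key such as "0.0" -> byte range.
Manifest = Dict[str, ChunkEntry]


def concat_manifests(
    first: Manifest, second: Manifest, axis: int, first_len: int
) -> Manifest:
    """Concatenate two manifests along `axis`.

    `first_len` is the number of chunks `first` has along `axis`; the chunk
    indices of `second` are shifted by that amount so keys do not collide.
    """
    merged: Manifest = dict(first)
    for key, entry in second.items():
        idx = [int(i) for i in key.split(".")]
        idx[axis] += first_len
        merged[".".join(str(i) for i in idx)] = entry
    return merged


# One chunk per (made-up) file, stacked along the first axis.
a: Manifest = {"0.0": {"path": "a.nc", "offset": 100, "length": 50}}
b: Manifest = {"0.0": {"path": "b.nc", "offset": 200, "length": 60}}
combined = concat_manifests(a, b, axis=0, first_len=1)
```

No chunk data is read here; merging only rewrites keys, which is why the byte ranges are the only thing needed from each file.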

What I'm wondering is if cog3pio's code could be useful to me as a way to generate the ChunkManifest for a netCDF file without using kerchunk/fsspec (see this issue). In other words I use cog3pio only to determine the byte ranges, not for actually reading the data. (I plan to actually read the bytes later as if it were zarr using the rust object-store crate, see https://github.com/zarr-developers/zarr-python/pull/1661).

Q's:

cc @norlandrhagen

weiji14 commented 5 months ago

Oh hi Tom, I was just chatting to @norlandrhagen at the Pangeo weekly meeting 😆. I will say that yes, it should be possible to work out the byte ranges from just the GeoTIFF's header, and we can expose that via an API function somehow. I just need to figure out how GDAL does this, and re-implement it here (easier said than done).
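To make the header idea concrete, here is a minimal Python sketch of reading tile byte ranges from a TIFF header. It is neither GDAL's approach nor anything cog3pio currently exposes. It assumes a classic (non-BigTIFF) tiled file, looks only at the first image file directory (IFD), and relies on the standard TIFF tags `TileOffsets` (324) and `TileByteCounts` (325). The function name `tile_byte_ranges` and the hand-built `demo` bytes are made up for illustration.

```python
import struct
from typing import List, Tuple

TILE_OFFSETS = 324      # TIFF tag: byte offset of each tile
TILE_BYTE_COUNTS = 325  # TIFF tag: stored size in bytes of each tile
TYPE_FORMATS = {3: "H", 4: "I"}  # SHORT and LONG; other field types are skipped


def tile_byte_ranges(buf: bytes) -> List[Tuple[int, int]]:
    """Return (offset, length) per tile from the first IFD of a classic TIFF."""
    endian = {b"II": "<", b"MM": ">"}[buf[:2]]
    magic, ifd_offset = struct.unpack_from(endian + "HI", buf, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF uses magic number 43)")

    (n_entries,) = struct.unpack_from(endian + "H", buf, ifd_offset)
    tags = {}
    for i in range(n_entries):
        entry_pos = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, typ, count = struct.unpack_from(endian + "HHI", buf, entry_pos)
        if tag not in (TILE_OFFSETS, TILE_BYTE_COUNTS) or typ not in TYPE_FORMATS:
            continue
        fmt = endian + str(count) + TYPE_FORMATS[typ]
        if struct.calcsize(fmt) <= 4:
            value_pos = entry_pos + 8  # small values are stored inline
        else:
            (value_pos,) = struct.unpack_from(endian + "I", buf, entry_pos + 8)
        tags[tag] = struct.unpack_from(fmt, buf, value_pos)
    return list(zip(tags[TILE_OFFSETS], tags[TILE_BYTE_COUNTS]))


# Hand-built 70-byte little-endian TIFF skeleton with two tiles (illustration only).
demo = (
    struct.pack("<2sHI", b"II", 42, 8)       # header, first IFD at byte 8
    + struct.pack("<H", 2)                   # two IFD entries
    + struct.pack("<HHII", 324, 4, 2, 38)    # TileOffsets array stored at byte 38
    + struct.pack("<HHII", 325, 4, 2, 46)    # TileByteCounts array stored at byte 46
    + struct.pack("<I", 0)                   # no further IFDs
    + struct.pack("<2I", 54, 64)             # tile offsets
    + struct.pack("<2I", 10, 6)              # tile byte counts
    + bytes(16)                              # placeholder tile data
)
ranges = tile_byte_ranges(demo)  # [(54, 10), (64, 6)]
```

A real COG would also need the overview IFDs, BigTIFF support and the tile layout tags to map each range onto a chunk index, which this sketch leaves out.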

Note that I'm already using object-store in cog3pio (https://github.com/weiji14/cog3pio/pull/5), and passing an HTTP URL to a GeoTIFF should already work. Reading from S3 (or Azure, GCP, etc.) will work too if I enable the feature flag here and recompile:

https://github.com/weiji14/cog3pio/blob/33285da1280f165d623457c73c1ae55c1674472e/Cargo.toml#L17

I'm aware that the Zarr v3 implementation is using object-store, and keeping an eye on progress at https://github.com/roeap/object-store-python. Would definitely be keen to standardize on object-store as the 'fsspec-for-Rust'.

TomNicholas commented 5 months ago

Awesome! Thanks @weiji14

> Would definitely be keen to standardize on object-store as the 'fsspec-for-Rust'.

Yeah this would be great, and I like the way you've described the aim there.

martindurant commented 4 months ago

I would add that there's no reason kerchunk needs to use fsspec; it's really a package of ways to make reference files. Therefore, if you come up with a way to get COG offsets, it can easily live there together with the other reference makers. Does TIFFFile do the required work too?