sertit / eoreader

Remote-sensing opensource python library reading optical and SAR sensors, loading and stacking bands, clouds, DEM and spectral indices in a sensor-agnostic way.
https://eoreader.readthedocs.io/en/latest/
Apache License 2.0
271 stars 22 forks

Access to image native dimensions #128

Open gpo-geo opened 5 months ago

gpo-geo commented 5 months ago

Is your feature request related to a problem? Please describe.

When loading a band, I can't find a way to get the native full-size dimensions before actually loading it.

Describe the solution you'd like

I would like an API to query the band dimensions, so that I can use a tiling strategy to load the data piece by piece (using the rasterio Window object for instance).
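As a sketch of that tiling strategy, the hypothetical helper below (not part of EOReader or rasterio) splits a band of known height and width into 256 x 256 tiles. It yields the (col_off, row_off, width, height) tuples that rasterio.windows.Window takes, and clips edge tiles to the image bounds.

```python
def iter_windows(height, width, tile=256):
    """Yield (col_off, row_off, win_width, win_height) tuples covering a
    height x width raster in tile x tile blocks, clipping edge blocks."""
    for row_off in range(0, height, tile):
        win_height = min(tile, height - row_off)
        for col_off in range(0, width, tile):
            win_width = min(tile, width - col_off)
            yield col_off, row_off, win_width, win_height


windows = list(iter_windows(10980, 10980))
print(len(windows))  # 1849 (43 x 43 tiles)
print(windows[-1])   # (10752, 10752, 228, 228)
```

With an open rasterio dataset ds, each tuple w can then be read with ds.read(1, window=Window(*w)), so only one tile is in memory at a time.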

Describe alternatives you've considered

Maybe a function like Product.get_band_shape(resolution: float = None), so the user can estimate the buffer size they will get depending on the chosen resolution. If resolution = None, use the native resolution.
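A minimal sketch of what such an estimate could compute, assuming the band keeps its extent and is only resampled to a new square pixel size (no reprojection or orthorectification). estimate_band_shape is a hypothetical helper, not an existing EOReader method; the numbers use a Sentinel-2 tile of 10980 x 10980 pixels at 10 m.

```python
import math


def estimate_band_shape(native_shape, native_pixel_size, pixel_size=None):
    """Estimate (height, width) after resampling to pixel_size (same unit
    as native_pixel_size). pixel_size=None keeps the native resolution."""
    if pixel_size is None:
        return tuple(native_shape)
    height, width = native_shape
    # Round first to absorb floating-point noise, then ceil so the
    # estimated grid still covers the whole extent.
    return (
        math.ceil(round(height * native_pixel_size / pixel_size, 6)),
        math.ceil(round(width * native_pixel_size / pixel_size, 6)),
    )


print(estimate_band_shape((10980, 10980), 10))      # (10980, 10980)
print(estimate_band_shape((10980, 10980), 10, 20))  # (5490, 5490)
h, w = estimate_band_shape((10980, 10980), 10, 20)
print(h * w * 4)  # 120560400 bytes for a float32 buffer
```

The ceiling is a design choice for this sketch; an actual resampling step may round the output grid differently.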

Additional context

remi-braun commented 5 months ago

Hello,

Your feature would be nice to have! The question I have is: what should be done with products that need orthorectification or reprojection? Do you want to orthorectify/reproject the band and then return its shape?

If so, you can query the band paths with prod.get_band_paths and then open the file with rasterio to get its shape.

Something like this should work:

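```python
import rasterio
from eoreader.reader import Reader
from eoreader.bands import RED, GREEN, BLUE

path = "S2A_MSIL1C_20220130T073141_N0400_R049_T36KYG_20220130T092334.SAFE"
prod = Reader().open(path)
band_paths = prod.get_band_paths([RED, GREEN, BLUE])  # dict keyed by band

# Opening the file only reads its metadata, not the pixels
with rasterio.open(band_paths[RED]) as ds:
    red_shape = ds.shape  # (height, width)

print(red_shape)  # (10980, 10980)
```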

⚠️ This only works at native resolution, for bands already in the correct CRS (rasterio will open the raw band). For reprojected/orthorectified bands, you can request any destination pixel size you want.

gpo-geo commented 5 months ago

In my case, I don't need to orthorectify or reproject the product. I want to split it into 256x256 chunks, let's say for a custom algorithm, to reduce memory usage or for parallelization. However, I feel that improved Dask support would do just that: Product.load(band, chunks=[256, 256]) returning a Dask array, with the data lazily loaded.
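Until chunked loading exists, the memory-saving idea can be approximated by hand: read one 256 x 256 window at a time and keep only a running aggregate. In this sketch, tiled_mean is a hypothetical example of a custom algorithm and read_tile is any callable returning the pixels of a (col_off, row_off, width, height) window. It is plain NumPy, not the Dask-based Product.load(band, chunks=...) suggested above, and it ignores nodata masking.

```python
import numpy as np


def tiled_mean(read_tile, shape, tile=256):
    """Mean of a raster computed tile by tile, so that only one
    tile x tile block is held in memory at a time."""
    height, width = shape
    total = 0.0
    count = 0
    for row_off in range(0, height, tile):
        win_height = min(tile, height - row_off)
        for col_off in range(0, width, tile):
            win_width = min(tile, width - col_off)
            block = read_tile(col_off, row_off, win_width, win_height)
            total += float(block.sum(dtype=np.float64))
            count += block.size
    return total / count


# In-memory stand-in for a band. With rasterio, read_tile could instead be:
#   lambda c, r, w, h: ds.read(1, window=Window(c, r, w, h))
band = np.arange(600 * 700, dtype=np.float32).reshape(600, 700)
reader = lambda c, r, w, h: band[r:r + h, c:c + w]
print(np.isclose(tiled_mean(reader, band.shape), band.mean(dtype=np.float64)))  # True
```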

Regarding your suggestion of using rasterio, I was not sure it would work in all cases (zipped archives, NetCDF, ...), but it seems that EOReader always provides a direct POSIX path to the uncompressed image file. Am I right?

remi-braun commented 5 months ago

Beware, Dask support is still at an early stage; I cannot guarantee that load will do everything lazily. 😅 And it will certainly reproject or orthorectify the images that need it (SAR, WGS84 stacks...).

The issue is that EOReader always wants to work with UTM bands, so everything has been coded to give access to these bands. If they already exist, there is no problem and everything will be seamless. But if they don't, they are computed no matter what, and this takes time. This will depend on your type of input data.
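Whether a band is already in UTM, and can therefore skip reprojection, can be checked from its EPSG code. The hypothetical helpers below only recognise WGS 84 / UTM codes (EPSG:32601 to 32660 for the north, 32701 to 32760 for the south). UTM zones on other datums are deliberately not handled, and this is not EOReader's own check. With rasterio, the code could come from ds.crs.to_epsg().

```python
def is_wgs84_utm(epsg):
    """True if epsg is a WGS 84 / UTM zone code (326xx north, 327xx south)."""
    if epsg is None:
        return False
    return 32601 <= epsg <= 32660 or 32701 <= epsg <= 32760


def utm_zone_label(epsg):
    """Return e.g. '36S' for EPSG:32736."""
    if not is_wgs84_utm(epsg):
        raise ValueError(f"EPSG:{epsg} is not a WGS 84 / UTM code")
    zone = epsg % 100
    hemisphere = "N" if epsg < 32700 else "S"
    return f"{zone}{hemisphere}"


print(is_wgs84_utm(32736), utm_zone_label(32736))  # True 36S (e.g. tile T36KYG)
print(is_wgs84_utm(4326))  # False: geographic WGS 84 would need reprojection
```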

The path returned by prod.get_band_paths is always readable by rasterio, so don't worry about that (whether it points inside a zip or tar archive, through S3, or to TIFF files). The only supported NetCDF-based product is Sentinel-3, and it is automatically geocoded before use, so the band you get will not be the raw NetCDF but a TIFF file.
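For context on why such paths open transparently: GDAL, which rasterio wraps, reads inside archives and from S3 through virtual file system prefixes such as /vsizip/, /vsitar/ and /vsis3/. to_gdal_vsi_path is a hypothetical helper illustrating those GDAL conventions, not how EOReader actually builds its paths; the file names are placeholders.

```python
# GDAL virtual file system prefix for each supported archive suffix
ARCHIVE_PREFIXES = {
    ".zip": "/vsizip/",
    ".tar": "/vsitar/",
    ".tar.gz": "/vsitar/",
    ".tgz": "/vsitar/",
}


def to_gdal_vsi_path(location, member=None):
    """Build a GDAL-readable path for an S3 object, an archive member,
    or a plain local file (returned unchanged)."""
    if location.startswith("s3://"):
        return "/vsis3/" + location[len("s3://"):]
    if member is None:
        return location  # plain file: no prefix needed
    for suffix, prefix in ARCHIVE_PREFIXES.items():
        if location.lower().endswith(suffix):
            return f"{prefix}{location}/{member}"
    raise ValueError(f"unsupported archive: {location}")


print(to_gdal_vsi_path("/data/product.zip", "IMG_DATA/B04.jp2"))
# /vsizip//data/product.zip/IMG_DATA/B04.jp2
print(to_gdal_vsi_path("s3://my-bucket/band.tif"))
# /vsis3/my-bucket/band.tif
```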

gpo-geo commented 5 months ago

Thanks for your explanations. So if I understand correctly:

gpo-geo commented 5 months ago

Sorry to bother you, the FAQ already answers these points.

remi-braun commented 5 months ago

No problem!