intake / intake-xarray

Intake plugin for xarray
https://intake-xarray.readthedocs.io/
BSD 2-Clause "Simplified" License

set automatically coerce_shape for xarray_image #104

Open acocac opened 3 years ago

acocac commented 3 years ago

I have a local directory of GeoTIFF files with different shapes. I've tried the coerce_shape parameter to set a target shape manually. I'm wondering whether there is a workaround to coerce all images to the largest shape in the directory instead of defining it by hand. The following lines show how I define the catalog:

[...]
sources:
  test1:
    driver: xarray_image
    args:
      urlpath: '{{ CATALOG_DIR }}/data/*.tif'
      coerce_shape: [400,400]
[...]

martindurant commented 3 years ago

Intake is not currently able to automatically inspect a set of data sources to derive a value for use in further data sources.

Two possible future routes that could implement the idea:

acocac commented 3 years ago

Thanks @martindurant for pointing out the possible future routes. Both would work for me.

When you say a new catalog able to introspect a set of data, do you have any specific example?

It would be great to implement a lazy operation to retrieve the image size, e.g. PIL's Image.open. However, I am not sure how well this would perform for a catalog with millions of images.
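
As a rough sketch of that idea done outside the catalog, the largest shape could be computed first and then passed in as coerce_shape. The helper names (image_size, largest_shape, open_coerced) are hypothetical, and two things are assumptions rather than confirmed behaviour: that coerce_shape expects (height, width) order, and that intake.open_xarray_image is available once intake-xarray is installed. Pillow's Image.open only identifies the file and does not decode pixel data until it is needed, but it still opens every file once.

```python
import glob


def image_size(path):
    """Return (width, height) from the file header without decoding pixels."""
    # Pillow is imported here so largest_shape() stays usable without it.
    from PIL import Image

    with Image.open(path) as im:  # lazy: identifies the file only
        return im.size


def largest_shape(sizes):
    """Given (width, height) pairs, return [max_height, max_width]."""
    sizes = list(sizes)
    if not sizes:
        raise ValueError("no image sizes given")
    widths, heights = zip(*sizes)
    # Assumption: coerce_shape is given in (height, width) order.
    return [max(heights), max(widths)]


def open_coerced(pattern):
    """Open all files matching `pattern`, coerced to the largest shape found."""
    # Assumption: intake-xarray is installed and registers xarray_image.
    import intake

    shape = largest_shape(image_size(p) for p in sorted(glob.glob(pattern)))
    return intake.open_xarray_image(pattern, coerce_shape=shape)
```

For example, open_coerced('data/*.tif') would scan the headers once and return a source with the derived shape. For millions of images the scan is still one file open per image, so caching the result would probably be needed.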

martindurant commented 3 years ago

When you say a new catalog able to introspect a set of data

Not really, this would be a new model. Catalogues have access to their child data sources of course, but it is not the normal pattern to try to access their internal metadata. As you say, this might be expensive. There are, however, lazy catalogues, where entries (the objects that make sources) are only created on request.
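
To illustrate what "only created on request" means, here is a minimal, generic sketch. It is not Intake's catalog implementation; LazyEntries and the factory callables are hypothetical names used only to show the pattern: listing the entries is cheap, and an entry is built the first time it is looked up.

```python
from collections.abc import Mapping


class LazyEntries(Mapping):
    """Mapping whose values are built on first access and then cached."""

    def __init__(self, factories):
        # factories: name -> zero-argument callable that builds the entry
        self._factories = dict(factories)
        self._cache = {}

    def __getitem__(self, name):
        if name not in self._cache:
            # Build the entry only now, on request.
            self._cache[name] = self._factories[name]()
        return self._cache[name]

    def __iter__(self):
        # Listing names does not build any entry.
        return iter(self._factories)

    def __len__(self):
        return len(self._factories)
```

With this pattern, anything expensive (such as reading image sizes) would only happen for the entries a user actually asks for.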