opendatacube / datacube-core

Open Data Cube analyses continental scale Earth Observation data through time
http://www.opendatacube.org
Apache License 2.0

Error when loading a virtual product when one of the sensors has no data for the query. #1408

Closed: vnewey closed this issue 1 year ago

vnewey commented 1 year ago

Expected behaviour

If you load a virtual product where one of the sensors has no data for the query, it should return the data from the other sensors.

Actual behaviour

An error occurs when you try to load a virtual product that includes one or more sensors with no scenes for the query's date range. For example, if the virtual product includes Landsat 5 and the query time range is 2018-2020 (a period with no Landsat 5 scenes), the load will throw an error.

Steps to reproduce the behaviour

I am using datacube version 1.8.11 on the new Sandbox (version 2.0.0), with this virtual products catalog: https://github.com/GeoscienceAustralia/dea-intertidal/blob/main/configs/dea_virtual_product_landsat_s2.yaml

Code to reproduce the error:

import datacube
from datacube.virtual import catalog_from_file
dc = datacube.Datacube(app='Virtual_products')
catalog = catalog_from_file('dea_virtual_product_landsat_s2.yaml')
query = {'x': (153.38, 153.47), 'y': (-28.83, -28.92), 'time': ('2018-01', '2018-02')}
product = catalog['ls_nbart_ndwi']
ls_ds = product.load(dc, **query)

Error message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[4], line 1
----> 1 ls_ds = product.load(dc, **query)
File /env/lib/python3.8/site-packages/datacube/virtual/impl.py:309, in VirtualProduct.load(self, dc, **query)
    307 """ Mimic `datacube.Datacube.load`. For illustrative purposes. May be removed in the future. """
    308 datasets = self.query(dc, **query)
--> 309 grouped = self.group(datasets, **query)
    310 return self.fetch(grouped, **query)
File /env/lib/python3.8/site-packages/datacube/virtual/impl.py:467, in Transform.group(self, datasets, **group_settings)
    466 def group(self, datasets: VirtualDatasetBag, **group_settings: Dict[str, Any]) -> VirtualDatasetBox:
--> 467     return self._input.group(datasets, **group_settings)
File /env/lib/python3.8/site-packages/datacube/virtual/impl.py:467, in Transform.group(self, datasets, **group_settings)
    466 def group(self, datasets: VirtualDatasetBag, **group_settings: Dict[str, Any]) -> VirtualDatasetBox:
--> 467     return self._input.group(datasets, **group_settings)
File /env/lib/python3.8/site-packages/datacube/virtual/impl.py:467, in Transform.group(self, datasets, **group_settings)
    466 def group(self, datasets: VirtualDatasetBag, **group_settings: Dict[str, Any]) -> VirtualDatasetBox:
--> 467     return self._input.group(datasets, **group_settings)
File /env/lib/python3.8/site-packages/datacube/virtual/impl.py:613, in Collate.group(self, datasets, **group_settings)
    608 groups = [build(source_index, product, dataset_bag)
    609           for source_index, (product, dataset_bag)
    610           in enumerate(zip(self._children, datasets.bag['collate']))]
    612 dim = self.get('dim', 'time')
--> 613 return VirtualDatasetBox(xarray.concat([grouped.box for grouped in groups], dim=dim).sortby(dim),
    614                          select_unique([grouped.geobox for grouped in groups]),
    615                          select_unique([grouped.load_natively for grouped in groups]),
    616                          merge_dicts([grouped.product_definitions for grouped in groups]),
    617                          geopolygon=select_unique([grouped.geopolygon for grouped in groups]))
File /env/lib/python3.8/site-packages/xarray/core/concat.py:236, in concat(objs, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
    231     raise ValueError(
    232         f"compat={compat!r} invalid: must be 'broadcast_equals', 'equals', 'identical', 'no_conflicts' or 'override'"
    233     )
    235 if isinstance(first_obj, DataArray):
--> 236     return _dataarray_concat(
    237         objs,
    238         dim=dim,
    239         data_vars=data_vars,
    240         coords=coords,
    241         compat=compat,
    242         positions=positions,
    243         fill_value=fill_value,
    244         join=join,
    245         combine_attrs=combine_attrs,
    246     )
    247 elif isinstance(first_obj, Dataset):
    248     return _dataset_concat(
    249         objs,
    250         dim=dim,
   (...)
    257         combine_attrs=combine_attrs,
    258     )
File /env/lib/python3.8/site-packages/xarray/core/concat.py:662, in _dataarray_concat(arrays, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
    659             arr = cast(T_DataArray, arr.rename(name))
    660     datasets.append(arr._to_temp_dataset())
--> 662 ds = _dataset_concat(
    663     datasets,
    664     dim,
    665     data_vars,
    666     coords,
    667     compat,
    668     positions,
    669     fill_value=fill_value,
    670     join=join,
    671     combine_attrs=combine_attrs,
    672 )
    674 merged_attrs = merge_attrs([da.attrs for da in arrays], combine_attrs)
    676 result = arrays[0]._from_temp_dataset(ds, name)
File /env/lib/python3.8/site-packages/xarray/core/concat.py:574, in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
    569 if len(indexes) < len(datasets):
    570     raise ValueError(
    571         f"{name!r} must have either an index or no index in all datasets, "
    572         f"found {len(indexes)}/{len(datasets)} datasets with an index."
    573     )
--> 574 combined_idx = indexes[0].concat(indexes, dim, positions)
    575 if name in datasets[0]._indexes:
    576     idx_vars = datasets[0].xindexes.get_all_coords(name)
File /env/lib/python3.8/site-packages/xarray/core/indexes.py:387, in PandasIndex.concat(cls, indexes, dim, positions)
    385     coord_dtype = None
    386 else:
--> 387     coord_dtype = np.result_type(*[idx.coord_dtype for idx in indexes])
    389 return cls(new_pd_index, dim=dim, coord_dtype=coord_dtype)
File <__array_function__ internals>:5, in result_type(*args, **kwargs)
TypeError: The DType <class 'numpy.dtype[float64]'> could not be promoted by <class 'numpy.dtype[datetime64]'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (<class 'numpy.dtype[float64]'>, <class 'numpy.dtype[datetime64]'>, <class 'numpy.dtype[datetime64]'>)
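
The dtype mismatch in that final TypeError hints at the likely mechanism: a sensor with no datasets appears to produce an empty group whose time coordinate gets numpy's default float64 dtype, which xarray then cannot concatenate with the datetime64 time coordinates of the non-empty groups. A minimal sketch that reproduces the same promotion failure outside datacube (this is a reading of the traceback, not a confirmed diagnosis):

import numpy as np
import xarray as xr

# An empty group's time coordinate defaults to float64, numpy's default
# dtype for an empty array...
empty = xr.DataArray([], dims=['time'], coords={'time': []})
print(empty.time.dtype)  # float64

# ...while a group with data carries a datetime64 time coordinate.
full = xr.DataArray([1.0], dims=['time'],
                    coords={'time': [np.datetime64('2018-01-15')]})

# With recent xarray versions this raises the same TypeError as above:
# float64 and datetime64 indexes have no common dtype to promote to.
xr.concat([empty, full], dim='time')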

Environment information

I am using datacube version 1.8.11 on the new Sandbox (version 2.0.0).

Notes

This problem does not seem to affect dc.load or the load_ard function.
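
Until this is fixed, one possible workaround is to check which of the collated sensors actually have datasets for the query before loading, and then trim the time range or the virtual product recipe accordingly. A rough sketch (the product names below are illustrative, not taken from the catalog above):

import datacube

dc = datacube.Datacube(app='Virtual_products')
query = {'x': (153.38, 153.47), 'y': (-28.83, -28.92),
         'time': ('2018-01', '2018-02')}

# Hypothetical list of the products collated inside the virtual product.
sensors = ['ga_ls5t_ard_3', 'ga_ls7e_ard_3', 'ga_ls8c_ard_3']

# find_datasets() returns an empty list for a sensor with no data, which
# identifies the collated inputs that would produce an empty group.
non_empty = [p for p in sensors if dc.find_datasets(product=p, **query)]
print('Sensors with data for this query:', non_empty)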


robbibt commented 1 year ago

This same issue has also been reported by an external user on Slack: https://opendatacube.slack.com/archives/CPJ1LV308/p1676957704786709

Hi All, I'm trying to load a virtual product (it last worked in Dec 2022) that is now having issues with the time input in the query. Here is the load:

# load woody_cover vp
product = catalog['woody_cover']
woody_cover_vp = product.load(dc, **query_pre)
and here is query_pre:
{'time': ('2000', '2000'),
 'resolution': (-30, 30),
 'geopolygon': Geometry(POLYGON ((145.79436260373194 -16.90696000168792, 145.79159203871052 -16.909913745382553, 145.78929642769026 -16.914912283096314, 145.78628838566425 -16.920895206901207, 145.78367613864327 -16.92733231772621, 145.78256791263254 -16.932709029831187, 145.78233043563068 -16.939070011978885, 145.78573427265894 -16.946263720709496, 145.7917503567109 -16.94891396508568, 145.7952333527407 -16.952321367247194, 145.7966582147522 -16.956183015024465, 145.79816223576518 -16.95550155353554, 145.79958709777662 -16.95376002961467, 145.80124943679266 -16.95277568286842, 145.8034658888114 -16.94891396508568, 145.80402000181414 -16.946339442495372, 145.807423838845 -16.94346199318062, 145.8075821568454 -16.94050877573494, 145.81019440386643 -16.937252610451495, 145.81098599387371 -16.931345934253656, 145.81138178887863 -16.925363342481972, 145.81106515287524 -16.923924462937265, 145.81161926588055 -16.923545808597467, 145.8114609478775 -16.922409841009625, 145.81193590188133 -16.91847176699173, 145.8120942198844 -16.915139486205263, 145.81098599387371 -16.91203435333327, 145.81161926588055 -16.90779311357052, 145.808927859858 -16.908247536681486, 145.8049699098243 -16.90779311357052, 145.80251598080116 -16.908853432459082, 145.79792475876326 -16.908777695592605, 145.79436260373194 -16.90696000168792)), epsg:4326)}

When I remove the time from the query dict, the virtual product loads (see the sketch after the traceback below), but with time included in query_pre I get the following error. I know there are some warnings about shapely 2.0, and this may be the cause, as I've tried it on another odc instance (datacube 1.8.8) with shapely 1.8.4 and I don't have any trouble. Any help would be appreciated!

/env/lib/python3.8/site-packages/datacube/utils/geometry/_base.py:626: ShapelyDeprecationWarning: The 'type' attribute is deprecated, and will be removed in the future. You can use the 'geom_type' attribute instead.
  if geom.type in ['Point', 'MultiPoint']:
/env/lib/python3.8/site-packages/datacube/utils/geometry/_base.py:629: ShapelyDeprecationWarning: The 'type' attribute is deprecated, and will be removed in the future. You can use the 'geom_type' attribute instead.
  if geom.type in ['GeometryCollection', 'MultiPolygon', 'MultiLineString']:
/env/lib/python3.8/site-packages/datacube/utils/geometry/_base.py:632: ShapelyDeprecationWarning: The 'type' attribute is deprecated, and will be removed in the future. You can use the 'geom_type' attribute instead.
  if geom.type in ['LineString', 'LinearRing']:
/env/lib/python3.8/site-packages/datacube/utils/geometry/_base.py:635: ShapelyDeprecationWarning: The 'type' attribute is deprecated, and will be removed in the future. You can use the 'geom_type' attribute instead.
  if geom.type == 'Polygon':
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[15], line 3
      1 # load woody_cover vp
      2 product = catalog['woody_cover']
----> 3 woody_cover_vp = product.load(dc, **query_pre)
      5 # # woody cover with threshold for saltmarsh #### guestimate thresholds at the moment #####
      6 # woody_cover = xr.where((woody_cover_vp.woody_cover > 0.05) & (woody_cover_vp.woody_cover < 0.4), 1, 0)#.astype('int8')
      7 # woody_cover.attrs['crs'] = 'EPSG:3577'

File /env/lib/python3.8/site-packages/datacube/virtual/impl.py:309, in VirtualProduct.load(self, dc, **query)
    307 """ Mimic `datacube.Datacube.load`. For illustrative purposes. May be removed in the future. """
    308 datasets = self.query(dc, **query)
--> 309 grouped = self.group(datasets, **query)
    310 return self.fetch(grouped, **query)

File /env/lib/python3.8/site-packages/datacube/virtual/impl.py:467, in Transform.group(self, datasets, **group_settings)
    466 def group(self, datasets: VirtualDatasetBag, **group_settings: Dict[str, Any]) -> VirtualDatasetBox:
--> 467     return self._input.group(datasets, **group_settings)

File /env/lib/python3.8/site-packages/datacube/virtual/impl.py:517, in Aggregate.group(self, datasets, **group_settings)
    516 def group(self, datasets: VirtualDatasetBag, **group_settings: Dict[str, Any]) -> VirtualDatasetBox:
--> 517     grouped = self._input.group(datasets, **group_settings)
    518     dim = self.get('dim', 'time')
    520     def to_box(value):

File /env/lib/python3.8/site-packages/datacube/virtual/impl.py:613, in Collate.group(self, datasets, **group_settings)
    608 groups = [build(source_index, product, dataset_bag)
    609           for source_index, (product, dataset_bag)
    610           in enumerate(zip(self._children, datasets.bag['collate']))]
    612 dim = self.get('dim', 'time')
--> 613 return VirtualDatasetBox(xarray.concat([grouped.box for grouped in groups], dim=dim).sortby(dim),
    614                          select_unique([grouped.geobox for grouped in groups]),
    615                          select_unique([grouped.load_natively for grouped in groups]),
    616                          merge_dicts([grouped.product_definitions for grouped in groups]),
    617                          geopolygon=select_unique([grouped.geopolygon for grouped in groups]))

File /env/lib/python3.8/site-packages/xarray/core/concat.py:236, in concat(objs, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
    231     raise ValueError(
    232         f"compat={compat!r} invalid: must be 'broadcast_equals', 'equals', 'identical', 'no_conflicts' or 'override'"
    233     )
    235 if isinstance(first_obj, DataArray):
--> 236     return _dataarray_concat(
    237         objs,
    238         dim=dim,
    239         data_vars=data_vars,
    240         coords=coords,
    241         compat=compat,
    242         positions=positions,
    243         fill_value=fill_value,
    244         join=join,
    245         combine_attrs=combine_attrs,
    246     )
    247 elif isinstance(first_obj, Dataset):
    248     return _dataset_concat(
    249         objs,
    250         dim=dim,
   (...)
    257         combine_attrs=combine_attrs,
    258     )

File /env/lib/python3.8/site-packages/xarray/core/concat.py:662, in _dataarray_concat(arrays, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
    659             arr = cast(T_DataArray, arr.rename(name))
    660     datasets.append(arr._to_temp_dataset())
--> 662 ds = _dataset_concat(
    663     datasets,
    664     dim,
    665     data_vars,
    666     coords,
    667     compat,
    668     positions,
    669     fill_value=fill_value,
    670     join=join,
    671     combine_attrs=combine_attrs,
    672 )
    674 merged_attrs = merge_attrs([da.attrs for da in arrays], combine_attrs)
    676 result = arrays[0]._from_temp_dataset(ds, name)

File /env/lib/python3.8/site-packages/xarray/core/concat.py:574, in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
    569 if len(indexes) < len(datasets):
    570     raise ValueError(
    571         f"{name!r} must have either an index or no index in all datasets, "
    572         f"found {len(indexes)}/{len(datasets)} datasets with an index."
    573     )
--> 574 combined_idx = indexes[0].concat(indexes, dim, positions)
    575 if name in datasets[0]._indexes:
    576     idx_vars = datasets[0].xindexes.get_all_coords(name)

File /env/lib/python3.8/site-packages/xarray/core/indexes.py:387, in PandasIndex.concat(cls, indexes, dim, positions)
    385     coord_dtype = None
    386 else:
--> 387     coord_dtype = np.result_type(*[idx.coord_dtype for idx in indexes])
    389 return cls(new_pd_index, dim=dim, coord_dtype=coord_dtype)

File <__array_function__ internals>:180, in result_type(*args, **kwargs)

TypeError: The DType <class 'numpy.dtype[datetime64]'> could not be promoted by <class 'numpy.dtype[float64]'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (<class 'numpy.dtype[float64]'>, <class 'numpy.dtype[float64]'>, <class 'numpy.dtype[datetime64]'>)
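
For reference, the workaround described above (dropping time from the query and subsetting afterwards) would look something like this sketch:

# Load without 'time' so no collated product comes back empty, then
# subset the result to the period of interest afterwards.
query_no_time = {k: v for k, v in query_pre.items() if k != 'time'}
woody_cover_vp = product.load(dc, **query_no_time)
woody_cover_2000 = woody_cover_vp.sel(time=slice('2000-01-01', '2000-12-31'))

Note that this loads the full time series first, so it can be far slower and more memory-hungry than a time-filtered query.
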
SpacemanPaul commented 1 year ago

A note in case someone other than me ends up looking at this: there are several scary-looking deprecation warnings coming out of the virtual product tests. Please try to fix those at the same time.
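
One low-effort way to track those down is to escalate the deprecation warnings to errors while running the virtual product tests, so each one fails with a traceback pointing at the offending call site. A sketch (adjust the warning categories to match what the test output actually shows):

import warnings
from shapely.errors import ShapelyDeprecationWarning

# Fail loudly on the shapely deprecation warnings seen in the logs above,
# and on any DeprecationWarning raised from within datacube itself.
warnings.filterwarnings('error', category=ShapelyDeprecationWarning)
warnings.filterwarnings('error', category=DeprecationWarning, module='datacube.*')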