Closed: vnewey closed this issue 1 year ago
This same issue has also been reported by an external user on Slack: https://opendatacube.slack.com/archives/CPJ1LV308/p1676957704786709
Hi All, I'm trying to load a virtual product (it last worked in Dec 2022) that is now failing on the time input in the query. Here is the load:
# load woody_cover vp
product = catalog['woody_cover']
woody_cover_vp = product.load(dc, **query_pre)
Here is query_pre:
{'time': ('2000', '2000'),
'resolution': (-30, 30),
'geopolygon': Geometry(POLYGON ((145.79436260373194 -16.90696000168792, 145.79159203871052 -16.909913745382553, 145.78929642769026 -16.914912283096314, 145.78628838566425 -16.920895206901207, 145.78367613864327 -16.92733231772621, 145.78256791263254 -16.932709029831187, 145.78233043563068 -16.939070011978885, 145.78573427265894 -16.946263720709496, 145.7917503567109 -16.94891396508568, 145.7952333527407 -16.952321367247194, 145.7966582147522 -16.956183015024465, 145.79816223576518 -16.95550155353554, 145.79958709777662 -16.95376002961467, 145.80124943679266 -16.95277568286842, 145.8034658888114 -16.94891396508568, 145.80402000181414 -16.946339442495372, 145.807423838845 -16.94346199318062, 145.8075821568454 -16.94050877573494, 145.81019440386643 -16.937252610451495, 145.81098599387371 -16.931345934253656, 145.81138178887863 -16.925363342481972, 145.81106515287524 -16.923924462937265, 145.81161926588055 -16.923545808597467, 145.8114609478775 -16.922409841009625, 145.81193590188133 -16.91847176699173, 145.8120942198844 -16.915139486205263, 145.81098599387371 -16.91203435333327, 145.81161926588055 -16.90779311357052, 145.808927859858 -16.908247536681486, 145.8049699098243 -16.90779311357052, 145.80251598080116 -16.908853432459082, 145.79792475876326 -16.908777695592605, 145.79436260373194 -16.90696000168792)), epsg:4326)}
When I remove the time from the query dict, the virtual product loads; with time included in query_pre, I get the following error. I know there are some warnings about shapely 2.0, and this may be the cause, as I've tried it on another ODC instance (datacube 1.8.8) with shapely 1.8.4 and I don't have any trouble there. Any help would be appreciated!
/env/lib/python3.8/site-packages/datacube/utils/geometry/_base.py:626: ShapelyDeprecationWarning: The 'type' attribute is deprecated, and will be removed in the future. You can use the 'geom_type' attribute instead.
if geom.type in ['Point', 'MultiPoint']:
/env/lib/python3.8/site-packages/datacube/utils/geometry/_base.py:629: ShapelyDeprecationWarning: The 'type' attribute is deprecated, and will be removed in the future. You can use the 'geom_type' attribute instead.
if geom.type in ['GeometryCollection', 'MultiPolygon', 'MultiLineString']:
/env/lib/python3.8/site-packages/datacube/utils/geometry/_base.py:632: ShapelyDeprecationWarning: The 'type' attribute is deprecated, and will be removed in the future. You can use the 'geom_type' attribute instead.
if geom.type in ['LineString', 'LinearRing']:
/env/lib/python3.8/site-packages/datacube/utils/geometry/_base.py:635: ShapelyDeprecationWarning: The 'type' attribute is deprecated, and will be removed in the future. You can use the 'geom_type' attribute instead.
if geom.type == 'Polygon':
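The warnings themselves are straightforward to address: shapely 2.0 deprecates the `type` attribute in favour of `geom_type`, which already exists in shapely 1.x, so the rename is backwards compatible. A minimal sketch of the pattern used in `_base.py` (simplified, not the actual datacube code):

```python
from shapely.geometry import Point, Polygon

geom = Point(0, 0)

# shapely 2.0 deprecates `geom.type`; `geom.geom_type` is the
# forward-compatible spelling and exists in shapely 1.x as well.
if geom.geom_type in ['Point', 'MultiPoint']:
    print("point-like geometry")

poly = Polygon([(0, 0), (1, 0), (1, 1)])
print(poly.geom_type)  # 'Polygon'
```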
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[15], line 3
1 # load woody_cover vp
2 product = catalog['woody_cover']
----> 3 woody_cover_vp = product.load(dc, **query_pre)
5 # # woody cover with threshold for saltmarsh #### guestimate thresholds at the moment #####
6 # woody_cover = xr.where((woody_cover_vp.woody_cover > 0.05) & (woody_cover_vp.woody_cover < 0.4), 1, 0)#.astype('int8')
7 # woody_cover.attrs['crs'] = 'EPSG:3577'
File /env/lib/python3.8/site-packages/datacube/virtual/impl.py:309, in VirtualProduct.load(self, dc, **query)
307 """ Mimic `datacube.Datacube.load`. For illustrative purposes. May be removed in the future. """
308 datasets = self.query(dc, **query)
--> 309 grouped = self.group(datasets, **query)
310 return self.fetch(grouped, **query)
File /env/lib/python3.8/site-packages/datacube/virtual/impl.py:467, in Transform.group(self, datasets, **group_settings)
466 def group(self, datasets: VirtualDatasetBag, **group_settings: Dict[str, Any]) -> VirtualDatasetBox:
--> 467 return self._input.group(datasets, **group_settings)
File /env/lib/python3.8/site-packages/datacube/virtual/impl.py:517, in Aggregate.group(self, datasets, **group_settings)
516 def group(self, datasets: VirtualDatasetBag, **group_settings: Dict[str, Any]) -> VirtualDatasetBox:
--> 517 grouped = self._input.group(datasets, **group_settings)
518 dim = self.get('dim', 'time')
520 def to_box(value):
File /env/lib/python3.8/site-packages/datacube/virtual/impl.py:613, in Collate.group(self, datasets, **group_settings)
608 groups = [build(source_index, product, dataset_bag)
609 for source_index, (product, dataset_bag)
610 in enumerate(zip(self._children, datasets.bag['collate']))]
612 dim = self.get('dim', 'time')
--> 613 return VirtualDatasetBox(xarray.concat([grouped.box for grouped in groups], dim=dim).sortby(dim),
614 select_unique([grouped.geobox for grouped in groups]),
615 select_unique([grouped.load_natively for grouped in groups]),
616 merge_dicts([grouped.product_definitions for grouped in groups]),
617 geopolygon=select_unique([grouped.geopolygon for grouped in groups]))
File /env/lib/python3.8/site-packages/xarray/core/concat.py:236, in concat(objs, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
231 raise ValueError(
232 f"compat={compat!r} invalid: must be 'broadcast_equals', 'equals', 'identical', 'no_conflicts' or 'override'"
233 )
235 if isinstance(first_obj, DataArray):
--> 236 return _dataarray_concat(
237 objs,
238 dim=dim,
239 data_vars=data_vars,
240 coords=coords,
241 compat=compat,
242 positions=positions,
243 fill_value=fill_value,
244 join=join,
245 combine_attrs=combine_attrs,
246 )
247 elif isinstance(first_obj, Dataset):
248 return _dataset_concat(
249 objs,
250 dim=dim,
(...)
257 combine_attrs=combine_attrs,
258 )
File /env/lib/python3.8/site-packages/xarray/core/concat.py:662, in _dataarray_concat(arrays, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
659 arr = cast(T_DataArray, arr.rename(name))
660 datasets.append(arr._to_temp_dataset())
--> 662 ds = _dataset_concat(
663 datasets,
664 dim,
665 data_vars,
666 coords,
667 compat,
668 positions,
669 fill_value=fill_value,
670 join=join,
671 combine_attrs=combine_attrs,
672 )
674 merged_attrs = merge_attrs([da.attrs for da in arrays], combine_attrs)
676 result = arrays[0]._from_temp_dataset(ds, name)
File /env/lib/python3.8/site-packages/xarray/core/concat.py:574, in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
569 if len(indexes) < len(datasets):
570 raise ValueError(
571 f"{name!r} must have either an index or no index in all datasets, "
572 f"found {len(indexes)}/{len(datasets)} datasets with an index."
573 )
--> 574 combined_idx = indexes[0].concat(indexes, dim, positions)
575 if name in datasets[0]._indexes:
576 idx_vars = datasets[0].xindexes.get_all_coords(name)
File /env/lib/python3.8/site-packages/xarray/core/indexes.py:387, in PandasIndex.concat(cls, indexes, dim, positions)
385 coord_dtype = None
386 else:
--> 387 coord_dtype = np.result_type(*[idx.coord_dtype for idx in indexes])
389 return cls(new_pd_index, dim=dim, coord_dtype=coord_dtype)
File <__array_function__ internals>:180, in result_type(*args, **kwargs)
TypeError: The DType <class 'numpy.dtype[datetime64]'> could not be promoted by <class 'numpy.dtype[float64]'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (<class 'numpy.dtype[float64]'>, <class 'numpy.dtype[float64]'>, <class 'numpy.dtype[datetime64]'>)
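The final TypeError can be reproduced with numpy alone: `np.result_type`, which xarray's `PandasIndex.concat` calls to find a common coordinate dtype, has no promotion rule between `datetime64` and `float64`. A minimal demonstration:

```python
import numpy as np

# xarray's PandasIndex.concat asks numpy for a common dtype across the
# "time" coordinates of all groups being concatenated.  A group with no
# datasets ends up with a float64 time index, while a populated group has
# datetime64.  There is no promotion rule between the two, so numpy
# raises the TypeError seen at the bottom of the traceback.
try:
    np.result_type(np.dtype("float64"), np.dtype("datetime64[ns]"))
except TypeError as exc:
    print(f"TypeError: {exc}")
```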
Note, in case someone other than me ends up looking at this: there are several scary-looking deprecation warnings coming out of the virtual product tests. Please try to fix these at the same time.
Expected behaviour
If you load a virtual product where one of the sensors has no data for the query, it should return the data from the other sensors.
Actual behaviour
An error occurs when you try to load a virtual product that includes one or more sensors with no scenes for the date range you are using. For example: if you load a virtual product with a query time range of 2018-2020 and Landsat 5 is included in your virtual product, it will throw an error.
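A likely mechanism can be sketched with plain xarray, under the assumption (hypothetical, not datacube's actual code) that a sensor with no scenes produces an empty group whose "time" coordinate defaults to float64, while populated groups carry datetime64 times. The mismatched index dtypes cannot be promoted during `xarray.concat`; filtering out empty groups before concatenating avoids the failure:

```python
import numpy as np
import xarray as xr

# Empty group: np.array([]) defaults to float64, so its "time"
# coordinate is float64.
empty = xr.DataArray([], dims=["time"], coords={"time": np.array([])})

# Populated group: a proper datetime64 time coordinate.
full = xr.DataArray(
    [1.0, 2.0],
    dims=["time"],
    coords={"time": np.array(["2018-01-01", "2019-01-01"],
                             dtype="datetime64[ns]")},
)

print(empty.time.dtype)  # float64
print(full.time.dtype)   # datetime64[ns]

# Workaround sketch: drop groups with no data before concatenating,
# so only datetime64 time indexes ever meet in the concat.
groups = [g for g in (empty, full) if g.sizes["time"] > 0]
combined = xr.concat(groups, dim="time") if len(groups) > 1 else groups[0]
print(combined.time.dtype)  # datetime64[ns]
```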
Steps to reproduce the behaviour
I am using datacube version 1.8.11 on the new Sandbox version 2.0.0, with this Virtual Products catalog: https://github.com/GeoscienceAustralia/dea-intertidal/blob/main/configs/dea_virtual_product_landsat_s2.yaml
Code to get the error:
Error message:
Environment information
I am using datacube version 1.8.11 on the new Sandbox version 2.0.0
Notes
This problem does not seem to impact dc.load or the load_ard function from