opendatacube / odc-stats

Statistician is a framework of tools for generating statistical summaries of large collections of EO data managed in an ODC instance.
Apache License 2.0

ValueError: conflicting sizes for dimension 'spec': length 1 on the data but length 2 on coordinate 'time' #137

Open alexgleith opened 4 months ago

alexgleith commented 4 months ago

I've been getting the error below, and I'm finding it hard to reproduce in other environments.

If I run with group_by = None, I can get stats to finish successfully.

But when I group by solar day, it fails for some regions.

Has anyone seen a similar error and know how to fix it?

Key software versions include:

[2024-06-19 00:58:43,966] {proc.py:217} INFO - Starting processing of x038/y009/2023--P1Y
  xx = xx.groupby(groupby).map(fuser)
Traceback (most recent call last):
  File "/usr/local/bin/odc-stats", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/odc/stats/_cli_run.py", line 233, in run
    for result in result_stream:
  File "/usr/local/lib/python3.10/dist-packages/odc/stats/proc.py", line 237, in _run
    proc.input_data(
  File "/usr/local/lib/python3.10/dist-packages/odc/stats/plugins/_base.py", line 54, in input_data
    xx = load_with_native_transform(
  File "/usr/local/lib/python3.10/dist-packages/odc/algo/io.py", line 222, in load_with_native_transform
    _load_with_native_transform_1(
  File "/usr/local/lib/python3.10/dist-packages/odc/algo/io.py", line 137, in _load_with_native_transform_1
    xx = xx.groupby(groupby).map(fuser)
  File "/usr/local/lib/python3.10/dist-packages/xarray/core/groupby.py", line 1563, in map
    return self._combine(applied)
  File "/usr/local/lib/python3.10/dist-packages/xarray/core/groupby.py", line 1583, in _combine
    applied_example, applied = peek_at(applied)
  File "/usr/local/lib/python3.10/dist-packages/xarray/core/utils.py", line 193, in peek_at
    peek = next(gen)
  File "/usr/local/lib/python3.10/dist-packages/xarray/core/groupby.py", line 1562, in <genexpr>
    applied = (func(ds, *args, **kwargs) for ds in self._iter_grouped())
  File "/usr/local/lib/python3.10/dist-packages/s1_geomad/plugin.py", line 57, in fuser
    return _xr_fuse(xx, partial(_first_valid_np, nodata=np_nan), "")
  File "/usr/local/lib/python3.10/dist-packages/odc/algo/_masking.py", line 707, in _xr_fuse
    return xx.map(partial(_xr_fuse, op=op, name=name))
  File "/usr/local/lib/python3.10/dist-packages/xarray/core/dataset.py", line 6931, in map
    variables = {
  File "/usr/local/lib/python3.10/dist-packages/xarray/core/dataset.py", line 6932, in <dictcomp>
    k: maybe_wrap_array(v, func(v, *args, **kwargs))
  File "/usr/local/lib/python3.10/dist-packages/odc/algo/_masking.py", line 711, in _xr_fuse
    return _fuse_with_custom_op(xx, op, name=name)
  File "/usr/local/lib/python3.10/dist-packages/odc/algo/_masking.py", line 702, in _fuse_with_custom_op
    return xr.DataArray(data, attrs=x.attrs, dims=x.dims, coords=coords, name=x.name)
  File "/usr/local/lib/python3.10/dist-packages/xarray/core/dataarray.py", line 450, in __init__
    coords, dims = _infer_coords_and_dims(data.shape, coords, dims)
  File "/usr/local/lib/python3.10/dist-packages/xarray/core/dataarray.py", line 197, in _infer_coords_and_dims
    _check_coords_dims(shape, new_coords, dims)
  File "/usr/local/lib/python3.10/dist-packages/xarray/core/dataarray.py", line 135, in _check_coords_dims
    raise ValueError(
ValueError: conflicting sizes for dimension 'spec': length 1 on the data but length 2 on coordinate 'time'
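The final error can be reproduced in isolation with plain xarray. This minimal sketch (all values are hypothetical) passes one fused entry along `spec` together with a `time` coordinate that still has two entries, which trips the same size check in the `DataArray` constructor shown at the bottom of the traceback. It does not reproduce the odc-stats run itself; it only illustrates what the message suggests: after `_fuse_with_custom_op` collapses a solar-day group of two datasets into one, a coordinate attached to `spec` still has its pre-fusion length.

```python
import numpy as np
import xarray as xr

# Hypothetical: data fused down to one entry along "spec"...
fused_data = np.zeros(1)
# ...but a "time" coordinate along "spec" that still describes both source datasets.
stale_time = np.array(["2023-03-01T00:10", "2023-03-01T00:12"], dtype="datetime64[ns]")

message = ""
try:
    xr.DataArray(fused_data, dims=("spec",), coords={"time": ("spec", stale_time)})
except ValueError as err:
    message = str(err)

# prints a "conflicting sizes for dimension 'spec'" error naming coordinate 'time'
print(message)
```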
alexgleith commented 4 months ago

Possibly found a fix: pinning an older version of xarray, xarray==2023.1.0.

Kirill888 commented 4 months ago

I reckon the issue is in odc.algo's use of a MultiIndex for the spec dim/coord. I think this was the wrong solution that happened to work for a while, and then xarray changed something.

I don't think there is any need for a multi-index: one can represent all of that with a single spec dimension (one entry per dataset) and then separate coords along the spec dimension for time, uuid and grid. I think it all stemmed from a misunderstanding: the dim <-> coord relationship is not limited to 1:1, a single dimension can carry any number of coordinates.
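A minimal sketch of that layout, assuming made-up values: three hypothetical datasets, one entry each along `spec`, with `time`, `uuid`, `grid` and `solar_day` attached as ordinary 1-D coordinates rather than MultiIndex levels. Grouping on the non-index `solar_day` coordinate and keeping the first dataset of each group then shrinks the coordinates together with the data. The `solar_day` values here just truncate UTC timestamps (a real solar day also depends on longitude), and "keep the first entry" stands in for a real fuser.

```python
import numpy as np
import xarray as xr

# Three hypothetical datasets; the first two were captured on the same day.
time = np.array(
    ["2023-03-01T00:10", "2023-03-01T00:12", "2023-03-13T00:10"], dtype="datetime64[ns]"
)
# Simplification: truncate UTC time to a date instead of computing a true solar day.
solar_day = time.astype("datetime64[D]").astype("datetime64[ns]")

srcs = xr.DataArray(
    data=np.arange(3),  # placeholder payload, one entry per dataset
    dims=("spec",),
    coords={
        "time": ("spec", time),
        "uuid": ("spec", ["a", "b", "c"]),
        "grid": ("spec", [0, 0, 0]),
        "solar_day": ("spec", solar_day),
    },
)

# Group on a plain (non-index) coordinate, keep the first dataset of each group
# (stand-in for a fuser), then stitch the groups back together along "spec".
fused = xr.concat(
    [grp.isel(spec=slice(0, 1)) for _, grp in srcs.groupby("solar_day")], dim="spec"
)
print(fused.uuid.values)  # ['a' 'c']
```

In this layout every coordinate is sliced along with `spec`, so there is no MultiIndex for xarray to rebuild after fusing.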

https://github.com/opendatacube/odc-algo/blob/f67879b1df951f4e1a3e3d52c13b244d1cb516a7/odc/algo/_grouper.py#L84-L94

    coords = [np.asarray(time, dtype="datetime64[ms]"), idx, uuids, grid]
    names = ["time", "idx", "uuid", "grid"]
    if solar_day is not None:
        coords.append(solar_day)
        names.append("solar_day")

    coord = pd.MultiIndex.from_arrays(coords, names=names)

    return xr.DataArray(
        data=data, coords=dict(spec=coord), attrs={"grid2crs": grid2crs}, dims=("spec",)
    )
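Applied to the lines quoted above, the suggestion might look like the sketch below. `mk_spec_array` is a hypothetical helper, not an odc-algo function: it takes the same inputs the quoted code uses (`data`, `time`, `idx`, `uuids`, `grid`, `grid2crs`, optional `solar_day`) and attaches each one as its own coordinate along `spec` instead of a MultiIndex level. It has not been checked against the rest of odc-algo, which may still expect the MultiIndex.

```python
import numpy as np
import xarray as xr


def mk_spec_array(data, time, idx, uuids, grid, grid2crs, solar_day=None):
    """Hypothetical flat-coordinate variant of the quoted snippet.

    Each per-dataset attribute becomes a separate 1-D coordinate along "spec".
    """
    coords = {
        # xarray has historically stored datetimes at nanosecond precision,
        # so use that here instead of the snippet's milliseconds.
        "time": ("spec", np.asarray(time, dtype="datetime64[ns]")),
        "idx": ("spec", np.asarray(idx)),
        "uuid": ("spec", np.asarray(uuids)),
        "grid": ("spec", np.asarray(grid)),
    }
    if solar_day is not None:
        coords["solar_day"] = ("spec", np.asarray(solar_day))

    return xr.DataArray(
        data=data, coords=coords, attrs={"grid2crs": grid2crs}, dims=("spec",)
    )
```

With no index on `spec`, selecting or fusing along that dimension only has to slice plain arrays.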