pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0
3.51k stars 1.05k forks source link

xr.merge always sorts indexes ascending #2947

Open Huite opened 5 years ago

Huite commented 5 years ago

Code Sample, a copy-pastable example if possible

import xarray as xr
import numpy as np

nrow, ncol = (4, 5)
dx, dy = (1.0, -1.0)
xmins = (0.0, 3.0, 3.0, 0.0)
xmaxs = (5.0, 8.0, 8.0, 5.0)
ymins = (0.0, 2.0, 0.0, 2.0)
ymaxs = (4.0, 6.0, 4.0, 6.0)
data = np.ones((nrow, ncol), dtype=np.float64)

das = []
for xmin, xmax, ymin, ymax in zip(xmins, xmaxs, ymins, ymaxs):
    kwargs = dict(
            name="example",
            dims=("y", "x"),
            coords={"y": np.arange(ymax, ymin, dy),
            "x": np.arange(xmin, xmax, dx)},
    )
    das.append(xr.DataArray(data, **kwargs))

xr.merge(das)

# This won't flip the coordinate:
xr.merge([das[0]))

Problem description

Let's say I have a number of geospatial grids that I'd like to merge (for example, loaded with xr.open_rasterio). To quote https://www.perrygeo.com/python-affine-transforms.html

The typical geospatial coordinate reference system is defined on a cartesian plane with the 0,0 origin in the bottom left and X and Y increasing as you go up and to the right. But raster data, coming from its image processing origins, uses a different referencing system to access pixels. We refer to rows and columns with the 0,0 origin in the upper left and rows increase and you move down while the columns increase as you go right. Still a cartesian plane but not the same one.

xr.merge will alway return the result with ascending coordinates, which creates some issues / confusion later on if you try to write it back to a GDAL format, for example (I've been scratching my head for some time looking at upside-down .tifs).

Expected Output

I think the expected output for these geospatial grids is that; if you provide only DataArrays with positive dx, negative dy; that the merged result comes out with a positive dx and a negative dy as well.

When the DataArrays to merge are mixed in coordinate direction (some with ascending, some with descending coordinate values), defaulting to an ascending sort seems sensible.

A suggestion

I saw that the sort is occurring here, in pandas; and that there's a is_monotonic_decreasing property in pandas.core.indexes.base.Index

I think this could work (it solves my issue at least), in xarray.core.alignment

                index = joiner(matching_indexes)
                if all(
                    (matching_index.is_monotonic_decreasing
                        for matching_index in matching_indexes)
                ):
                    index = index[::-1]
                joined_indexes[dim] = index

But I lack the knowledge to say whether this plays nice in all cases. And does index[::-1] return a view or a copy? (And does it matter?)

For reference this is what it looks like now:

            if (any(not matching_indexes[0].equals(other)
                    for other in matching_indexes[1:]) or
                    dim in unlabeled_dim_sizes):
                if join == 'exact':
                    raise ValueError(
                        'indexes along dimension {!r} are not equal'
                        .format(dim))
                index = joiner(matching_indexes)
                joined_indexes[dim] = index
            else:
                index = matching_indexes[0]

It's also worth highlighting that the else branch causes, arguably, some inconsistency. If the indexes are equal, no reversion occurs.

shoyer commented 5 years ago

I agree that this is a little awkward.

Would it make sense to try to switch to sort=False? That might also help with the inconsistency in the else branch. Though I'm sure the auto-sorting behavior is convenient for some use-cases...

Huite commented 5 years ago

I had the same thought, but I think we do need sorting in general; just descending rather than ascending for this kind of grid. Otherwise you might end up with non-monotonic coordinates, which is rarely desirable.