usnistgov / PyHyperScattering

Tools for hyperspectral x-ray and neutron scattering data loading, reduction, slicing, and visualization.

Reindexing causes loads to fail on xarray > 2022.6 #43

pbeaucage closed this issue 1 year ago

pbeaucage commented 2 years ago

A number of unit tests began failing in ~June.

The failure appears to occur in an innocuous assign_coords call inside FileLoader.

I've traced this back to the rewrite of the xarray indexing system introduced in xarray 2022.6.0.

This appears to have been documented as xarray #6881 and fixed in xarray #6889. In the meantime, however, I'm excluding xarray 2022.6.0 from our dependencies. That should resolve this issue.

I still have no idea why the failure is Py3.8-specific, though. That's a major head-scratcher.

pbeaucage commented 2 years ago

This issue is still present in xarray 2022.9, so the earlier diagnosis (dx) was evidently wrong. Needs attention.

pbeaucage commented 1 year ago

This was a deep problem.

Inside the generic file loader machinery (FileLoader.py), I was assigning a multi-index during an xr.concat operation, e.g. xr.concat(data_rows, dim=system), where system is a pandas.MultiIndex. Beginning in xarray 2022.6, passing a pandas.MultiIndex this way no longer works; only a plain pandas.Index does. More precisely, xarray casts the index created this way into a pandas.Index that wraps the pandas.MultiIndex, which leaves the indexes inconsistent for all subsequent indexing operations (say, an innocent assign_coords call to number the pixels).
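One way to detect this failure mode is to check whether the concatenated dimension is still backed by a genuine pandas.MultiIndex rather than a plain pandas.Index. The sketch below is a hypothetical diagnostic, not code from PyHyperScattering; the helper name dim_has_multiindex and the energy/polarization coordinates are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import xarray as xr


def dim_has_multiindex(obj, dim):
    """Hypothetical helper: True if `dim` is backed by a real pandas.MultiIndex.

    A plain pandas.Index (e.g. one wrapping the intended MultiIndex) returns
    False, as does a dimension that has no index at all.
    """
    idx = obj.indexes.get(dim)
    return isinstance(idx, pd.MultiIndex)


# Illustrative data: a "system" dimension indexed by (energy, polarization).
good = xr.DataArray(
    np.zeros(4),
    dims="system",
    coords={
        "energy": ("system", [270.0, 270.0, 285.0, 285.0]),
        "polarization": ("system", [0, 90, 0, 90]),
    },
).set_index(system=["energy", "polarization"])

# Same dimension, but with an ordinary integer index.
plain = xr.DataArray(np.zeros(4), dims="system", coords={"system": [0, 1, 2, 3]})

print(dim_has_multiindex(good, "system"))   # True
print(dim_has_multiindex(plain, "system"))  # False
```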

Fixed by concatenating along an unindexed 'system' dimension, then assigning the index to that dimension.
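A minimal sketch of the concatenate-then-index pattern, assuming hypothetical frames tagged by energy and polarization (the names and values are illustrative, not taken from FileLoader.py). Here the MultiIndex is built with set_index from per-frame level coordinates, which is one version-tolerant way to attach it to the existing dimension; the actual fix may assign a prebuilt pandas.MultiIndex instead.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical per-frame metadata and images; a real loader reads these from files.
energies = [270.0, 270.0, 285.0, 285.0]
pols = [0, 90, 0, 90]
data_rows = [
    xr.DataArray(np.full((2, 3), i, dtype=float), dims=("pix_y", "pix_x"))
    for i in range(len(energies))
]

# Step 1: concatenate along a plain dimension name, so no index is created yet.
stacked = xr.concat(data_rows, dim="system")

# Step 2: attach the metadata as non-index coordinates on "system" ...
stacked = stacked.assign_coords(
    energy=("system", energies),
    polarization=("system", pols),
)
# ... then promote them to a MultiIndex on that dimension.
stacked = stacked.set_index(system=["energy", "polarization"])

# A later assign_coords call (numbering the pixels) now sees consistent indexes.
stacked = stacked.assign_coords(pix_x=np.arange(stacked.sizes["pix_x"]))

print(stacked.sel(energy=285.0, polarization=90).values)
```

Because xr.concat never sees a MultiIndex in this pattern, the index is only created once the dimension already exists, which sidesteps the casting behaviour described above.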

pbeaucage commented 1 year ago

Verified this works on xarray 2022.3, 2022.6, and 2022.12. Removing the version pin.

pbeaucage commented 1 year ago

Closing as complete. The behaviour is known upstream in xarray: pydata/xarray#7148