pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

⚠️ Nightly upstream-dev CI failed ⚠️ #8844

Closed github-actions[bot] closed 3 months ago

github-actions[bot] commented 6 months ago

Workflow Run URL

Python 3.12 Test Summary

```
xarray/tests/test_duck_array_ops.py::TestOps::test_where_type_promotion: AssertionError: assert dtype('float64') == + where dtype('float64') = array([ 1., nan]).dtype + and = np.float32
xarray/tests/test_duck_array_ops.py::TestDaskOps::test_where_type_promotion: AssertionError: assert dtype('float64') == + where dtype('float64') = array([ 1., nan]).dtype + and = np.float32
xarray/tests/test_rolling.py::TestDataArrayRolling::test_rolling_dask_dtype[float32]: AssertionError: assert dtype('float64') == dtype('float32') + where dtype('float64') = Size: 24B\ndask.array\nCoordinates:\n * x (x) int64 24B 1 2 3.dtype + and dtype('float32') = Size: 12B\narray([1. , 1.5, 2. ], dtype=float32)\nCoordinates:\n * x (x) int64 24B 1 2 3.dtype
```
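The where-type-promotion failures above appear to hinge on how a Python float scalar promotes with float32 data. A minimal, numpy-only illustration of the NEP 50 rule (this is not xarray's actual code path, just a sketch of the promotion behaviour involved):

```python
import numpy as np

# Under NEP 50 (numpy >= 2) Python scalars are "weak" and adapt to the
# array dtype instead of forcing an upcast to float64; value-based casting
# in numpy 1.x happens to give the same answer for this small example.
data = np.array([1.0, np.nan], dtype="float32")
filled = np.where(np.isnan(data), 0.5, data)
print(filled.dtype)               # float32 under both promotion modes
print(np.result_type(data, 0.5))  # float32: the Python float stays weak
```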
dcherian commented 6 months ago

To avoid the great scientific python apocalypse of April 2024 (numpy 2 & pandas 3!), we'll need to fix the following. I'm cc'ing people hoping that many of these are easy and obvious fixes :)

  1. a number of array API failures: (cc @keewis, @TomNicholas)
    xarray/tests/test_array_api.py::test_arithmetic: AttributeError: '_DType' object has no attribute 'type'
    xarray/tests/test_namedarray.py::TestNamedArray::test_permute_dims[dims0-expected_sizes0]: ModuleNotFoundError: No module named 'numpy.array_api'
  2. ~Some casting errors in the coding pipeline~: (cc @kmuehlbauer )
    xarray/tests/test_backends.py::TestScipyFileObject::test_roundtrip_mask_and_scale[dtype1-create_masked_and_scaled_data-create_encoded_masked_and_scaled_data]: ValueError: Unable to avoid copy while creating an array from given array.
  3. ~Some copying errors in the coding pipeline~ (cc @kmuehlbauer)
    xarray/tests/test_backends.py::TestScipyFileObject::test_roundtrip_test_data: ValueError: Failed to decode variable 'time': Unable to avoid copy while creating an array from given array.
  4. ~I bet a return value from pandas has changed to scalar leading to a lot of interpolation failures~ (#8861)
    xarray/tests/test_interp.py::test_interpolate_chunk_1d[1-1-0-True-linear]: ValueError: dimensions () must have the same length as the number of data dimensions, ndim=1
  5. Some datetime / timedelta casting errors: (cc @spencerkclark )
    xarray/tests/test_backends.py::test_use_cftime_false_standard_calendar_in_range[gregorian]: pandas._libs.tslibs.np_datetime.OutOfBoundsTimedelta: Cannot cast 0 from D to 'ns' without overflow.
    xarray/tests/test_backends.py::test_use_cftime_false_standard_calendar_in_range[proleptic_gregorian]: pandas._libs.tslibs.np_datetime.OutOfBoundsTimedelta: Cannot cast 0 from D to 'ns' without overflow.
    xarray/tests/test_backends.py::test_use_cftime_false_standard_calendar_in_range[standard]: pandas._libs.tslibs.np_datetime.OutOfBoundsTimedelta: Cannot cast 0 from D to 'ns' without overflow.
  6. ~Some errors from pandas.MultiIndex.names now returning a tuple and not a list (#8847)~
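The "Unable to avoid copy" ValueErrors in items 2 and 3 are most likely fallout from NumPy 2 changing the semantics of `copy=False` in `np.array` from "copy only if needed" to "never copy, raise if a copy is required". A short sketch of the distinction (illustrative, not xarray's coding-pipeline code):

```python
import numpy as np

x = np.arange(3)

# On NumPy 2, np.array(x, dtype="float64", copy=False) raises
# "Unable to avoid copy ..." because the dtype change forces a copy.
# The old "copy only if needed" behaviour is spelled np.asarray(...)
# (or copy=None on NumPy 2):
y = np.asarray(x, dtype="float64")  # copies here, since the dtype changes
print(y.dtype)
```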
kmuehlbauer commented 6 months ago

3. Some copying errors in the coding pipeline (cc @kmuehlbauer)

#8851

kmuehlbauer commented 6 months ago

2. Some casting errors in the coding pipeline: (cc @kmuehlbauer )

#8852

spencerkclark commented 6 months ago

5. Some datetime / timedelta casting errors: (cc @spencerkclark )

This is still https://github.com/pydata/xarray/issues/8623#issuecomment-1902757696 — I'll try and look into https://github.com/pandas-dev/pandas/issues/56996 some more this weekend (and at the very least will ping it again).

keewis commented 5 months ago

~In addition to the string dtype failures (the old ones, U and S)~: numpy/numpy#26270

xarray/tests/test_accessor_str.py::test_case_str: AssertionError: assert dtype('<U26') == dtype('<U30')

~we've also got a couple of failures related to TimedeltaIndex~ (#8938)

xarray/tests/test_missing.py::test_scipy_methods_function[barycentric]: TypeError: TimedeltaIndex.__new__() got an unexpected keyword argument 'unit'

As far as I can tell, that parameter has been renamed to freq?

keewis commented 5 months ago

we also have a failing strategy test (hidden behind the numpy.array_api change):

FAILED xarray/tests/test_strategies.py::TestVariablesStrategy::test_make_strategies_namespace - AssertionError: np.float32(-1.1754944e-38) of type <class 'numpy.float32'>

not sure if that's us or upstream in hypothesis (cc @Zac-HD). For context, this is using numpy>=2.0 from the scientific-python nightly wheels repository (see SPEC 4 for more info on that). With that version of numpy, scalar objects appear to no longer be considered float values: `isinstance(np.float32(-1.1754944e-38), float) == False`. Edit: or at least, all but float64 on my system... I assume that depends on the OS?

Zac-HD commented 5 months ago

Yeah, that looks like Hypothesis needs some updates for compatibility - issue opened, we'll get to it... sometime, because volunteers 😅. FWIW I don't think it'll be OS-dependent, CPython float is 64-bit on all platforms.

seberg commented 5 months ago

Just randomly coming here. The way scalars are considered a float/not a float should not have changed. However, promotion would have changed, so previously:

float32(3) + 0.0

for example would have returned a float64 (which is a float subclass).
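That claim can be checked directly: `np.float64` subclasses Python `float` on every NumPy version, while the narrower float scalar types never did, so what NEP 50 changes is the result dtype of mixed scalar/array expressions, not the `isinstance` relationship:

```python
import numpy as np

print(issubclass(np.float64, float))  # True, on all NumPy versions
print(issubclass(np.float32, float))  # False, on all NumPy versions
# The promoted dtype is what changed: float64 under legacy (value-based)
# promotion, float32 under NEP 50 "weak" promotion.
print((np.float32(3) + 0.0).dtype)
```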

If that doesn't make it easy to find, and you can narrow down a bit where it happens, you could try wrapping the suspect code with np._set_promotion_state('weak_and_warn'), then np._set_promotion_state('weak') again to undo it. That will hopefully give you a warning from the place where the promotion changed; unfortunately, there will likely be a lot of unhelpful warnings/noise, so it would be good to apply it in a very targeted way, I think.
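A minimal sketch of that wrapping as a context manager, assuming the private `np._set_promotion_state` / `np._get_promotion_state` knobs are available (they are version-dependent private APIs and absent from some NumPy releases, hence the guard):

```python
import contextlib
import numpy as np

@contextlib.contextmanager
def warn_on_changed_promotion():
    """Temporarily switch NumPy to 'weak_and_warn' promotion so operations
    whose result dtype differs between legacy and NEP 50 rules emit a
    warning, then restore the previous state."""
    # Private, version-dependent API: skip gracefully when unavailable.
    if not hasattr(np, "_set_promotion_state"):
        yield
        return
    old = np._get_promotion_state()
    np._set_promotion_state("weak_and_warn")
    try:
        yield
    finally:
        np._set_promotion_state(old)

with warn_on_changed_promotion():
    result = np.float32(3) + 0.0  # may warn: dtype differs across modes
print(result)
```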


Please ping me if you get stuck tracking things down, I hope that comment may be helpful, but can try to spend some time looking at it.

kmuehlbauer commented 5 months ago

In addition to the string dtype failures (the old ones, U and S):

xarray/tests/test_accessor_str.py::test_case_str: AssertionError: assert dtype('<U26') == dtype('<U30')

@keewis If you find the time, please have a look into #8932. I think I've identified the problem, but have no idea why this happens only for numpy 2 (I did not have a thorough look there).

keewis commented 5 months ago

@dcherian, did we decide what to do with the dtype casting / promotion issues?

dcherian commented 5 months ago

I haven't looked at them yet and probably won't have time for a day at least

spencerkclark commented 4 months ago
  1. Some datetime / timedelta casting errors: (cc @spencerkclark)

Things are a bit stuck at the moment on https://github.com/pandas-dev/pandas/issues/56996 / https://github.com/pandas-dev/pandas/pull/57984, so I may just xfail this for the time being (it is an upstream issue anyway).

keewis commented 4 months ago

We have another set of datetime failures. From what I can tell, pandas changed behavior for this:

pd.date_range("2001", "2001", freq="-1ME", inclusive="both")

where on pandas=2.2 this would return DatetimeIndex containing just 2001-01-31, but on pandas=3.0 this will return an empty DatetimeIndex (in general, one entry less than what we're expecting).

As far as I can tell, this is intentional (see https://github.com/pandas-dev/pandas/pull/56832 for the PR that changed it). Should we adapt cftime_range, or is pandas' new behavior too restrictive and we should raise an issue?
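The behaviour change described above can be reproduced in isolation (the fallback is only an assumption to keep the snippet running on older pandas, which predates the "ME" alias):

```python
import pandas as pd

# pandas=2.2 returns a single date (2001-01-31) for this call; on
# pandas>=3.0 the same call yields an empty DatetimeIndex
# (see pandas-dev/pandas#56832).
try:
    idx = pd.date_range("2001", "2001", freq="-1ME", inclusive="both")
except ValueError:  # older pandas only knows the "M" month-end alias
    idx = pd.date_range("2001", "2001", freq="-1M", inclusive="both")
print(len(idx), list(idx))
```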

spencerkclark commented 4 months ago

Ah thanks for noting that @keewis—yes, I think we should port that bug fix over from pandas. I can try to do that later today along with the xfail.

spencerkclark commented 4 months ago

#8996 takes care of the xfail. Porting the pandas negative frequency bug fix to cftime_range will take a little more care in terms of how we handle testing with older pandas versions, so I'll try and take a closer look at it over the weekend.

keewis commented 4 months ago

an update here: the release date of numpy=2.0 has been set to June 16th, which gives us 3-4 weeks to fix the remaining issues. To be safe I'd suggest we try to release a compatible version within the next two weeks (this is mostly aimed at myself, I guess).

jakirkham commented 4 months ago

Thanks Justus! 🙏

What are the remaining issues?

keewis commented 4 months ago

The only remaining issue is #8946. As a summary, we're trying to support the Array API while simultaneously supporting python scalars in where. This is currently not supported by the Array API, but dropping support would be a breaking change for us – see data-apis/array-api#807 and data-apis/array-api#805 for discussion on whether or not the Array API could be changed to help us with that.

Either way we'll need to work around this, and so we need to be able to find a reasonable common dtype for strongly and weakly dtyped data.

In numpy<2.0 (i.e. before NEP 50) we would cast scalars to 0-d arrays, and since the dtype of those was mostly ignored, this used to have the desired behaviour. I guess this is numpy-specific behaviour and would not have worked properly for Array API libraries.
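For the numpy case specifically, a hypothetical helper for "a reasonable common dtype for strongly and weakly dtyped data" can lean on `np.result_type`, which accepts Python scalars directly and applies NEP 50 semantics on numpy>=2 (this is only a sketch, not xarray's implementation, and it does not solve the Array API side of the problem):

```python
import numpy as np

def common_dtype(*args):
    """Hypothetical sketch: common dtype for a mix of arrays ("strong"
    dtypes) and Python scalars ("weak" dtypes). Under NEP 50 the scalars
    adapt to the array dtypes instead of forcing an upcast."""
    return np.result_type(*args)

strong = np.arange(3, dtype="float32")
print(common_dtype(strong, 1.0))  # float32: the Python float stays weak
```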

jakirkham commented 4 months ago

^ @seberg would be interested in hearing your take on this question 🙂

keewis commented 3 months ago

we have two new unique issues:

  1. numpy.datetime64 scalars appear to have lost their component attributes (or rather, we're dispatching to numpy.datetime64 instead of cftime):
    xarray/tests/test_coding_times.py::test_infer_datetime_units_with_NaT[dates0-days since 1900-01-01 00:00:00]: AttributeError: 'numpy.datetime64' object has no attribute 'year'
  2. warnings about the conversion to datetime64[ns]:
    xarray/tests/test_variable.py::test_datetime_conversion_warning[[datetime.datetime(2000, 1, 1, 0, 0)]-False]: UserWarning: Converting non-nanosecond precision datetime values to nanosecond precision. This behavior can eventually be relaxed in xarray, as it is an artifact from pandas which is now beginning to support non-nanosecond precision values. This warning is caused by passing non-nanosecond np.datetime64 or np.timedelta64 values to the DataArray or Variable constructor; it can be silenced by converting the values to nanosecond precision ahead of time.

I don't get this with the release candidate, so I assume this is new in one of the upstream-dev versions, probably numpy or cftime (I can't tell for sure, though).
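The first failure can be reproduced in isolation; `np.datetime64` scalars have never exposed calendar components, so the error appears when such a scalar is dispatched to code expecting a cftime object. Component access needs a conversion first (illustrative only, not the xarray fix):

```python
import numpy as np

d = np.datetime64("2000-06-15")
# np.datetime64 has no .year/.month/... attributes -- hence the
# AttributeError above. Converting to a Python datetime recovers them:
print(d.astype("datetime64[s]").item().year)  # 2000
```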

spencerkclark commented 3 months ago

Thanks @keewis—I'll take a look at these over the weekend. I wouldn't be surprised if they were pandas-related.