pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

reading time from a file has changed #9189

Closed jmccreight closed 3 months ago

jmccreight commented 3 months ago

What is your issue?

I'm not sure this is a bug, and I don't have an MCVE yet, so I'm just asking whether I'm doing something wrong or missing an option I should be using. I looked around and didn't find anything obvious. The change in behavior is concerning, though.

The backstory is that some of my own tests exercise functionality related to Dataset.to_dict(). These tests started failing with the latest release when numpy 2.0 is present, but only on Python 3.10 and not on 3.9 (those are the only Python versions I test at the moment).

The tests read a file that I've been using for years. I get different values of the time coordinate depending on which version of numpy I use with xarray: numpy 2 or earlier. Compare the values of the time variable under numpy 2.0 with those under the earlier version (which I believe are correct).

First, the environment where I'd pinned numpy<2.0.0.

Python 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:51:49) [Clang 16.0.6 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.25.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import xarray as xr

In [2]: import numpy as np

In [3]: xr.__version__
Out[3]: '2024.6.0'

In [4]: np.__version__
Out[4]: '1.26.4'

In [5]: ds = xr.open_dataarray("../test_data/drb_2yr/prcp.nc")

In [6]: ds.time
Out[6]: 
<xarray.DataArray 'time' (time: 731)> Size: 6kB
array(['1979-01-01T00:00:00.000000000', '1979-01-02T00:00:00.000000000',
       '1979-01-03T00:00:00.000000000', ..., '1980-12-29T00:00:00.000000000',
       '1980-12-30T00:00:00.000000000', '1980-12-31T00:00:00.000000000'],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 6kB 1979-01-01 1979-01-02 ... 1980-12-31
Attributes:
    type:           f4
    long_name:      time
    standard_name:  time

Now the environment where I'd pinned numpy==2.0.0.

Python 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:51:49) [Clang 16.0.6 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.25.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import xarray as xr

In [2]: import numpy as np

In [3]: xr.__version__
Out[3]: '2024.6.0'

In [4]: np.__version__
Out[4]: '2.0.0'

In [5]: ds = xr.open_dataarray("../test_data/drb_2yr/prcp.nc")

In [6]: ds.time
Out[6]: 
<xarray.DataArray 'time' (time: 731)> Size: 6kB
array(['1979-01-01T00:00:00.000000000', '1979-01-02T00:00:00.003211264',
       '1979-01-03T00:00:00.006422528', ..., '1980-12-29T00:00:03.344433152',
       '1980-12-30T00:00:00.906559488', '1980-12-31T00:00:02.763653120'],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 6kB 1979-01-01 ... 1980-12-31T00:00:02.763...
Attributes:
    type:           f4
    long_name:      time
    standard_name:  time

Further along, these apparently erroneous sub-microsecond datetime64 values fail a conversion/round trip via Variable.to_dict(). That failure may be expected, but the erroneous input is not.
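A minimal sketch of one possible explanation, not confirmed anywhere in this report: the first two nonzero offsets in the numpy 2.0 output are exactly what results from rounding a nanosecond count of whole days to float32. The assumption here is that an offset of n days gets held in float32 precision somewhere during decoding (the `type: f4` attribute hints that the file stores time as 32-bit floats). The helper names `round_to_float32` and `float32_offsets_ns` are made up for illustration, and only the standard library is used.

```python
import struct

NS_PER_DAY = 86_400 * 10**9  # nanoseconds in one day


def round_to_float32(value: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def float32_offsets_ns(days):
    """For each day count, return the error (in ns) introduced by
    holding the nanosecond offset in float32 instead of an integer.

    Assumption: this mimics a decode path that keeps float32 precision;
    it is not taken from xarray's source.
    """
    offsets = []
    for n in days:
        exact_ns = n * NS_PER_DAY
        stored_ns = int(round_to_float32(float(exact_ns)))
        offsets.append(stored_ns - exact_ns)
    return offsets


print(float32_offsets_ns([0, 1, 2]))  # [0, 3211264, 6422528]
```

The errors for day 1 and day 2 match the fractional parts `.003211264` and `.006422528` in the second and third values above. The later values are not checked by this sketch, so it is a hint about where to look rather than a diagnosis.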

TIA!

kmuehlbauer commented 3 months ago

Seems like a dupe of https://github.com/pydata/xarray/issues/9179?

jmccreight commented 3 months ago

Indeed. Sorry I didn't see that. I confirmed that main solves my issue. Thanks!