pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

duplicate dates when converting from standard to 360 calendar on partial files. #8086

Open aranas opened 1 year ago

aranas commented 1 year ago

What happened?

When converting from a standard to a 360-day calendar on files that each cover only part of a year (e.g. one month per file), leap years end up with 5 duplicate dates once the partial files are combined. Specifically, the dates that should be dropped are instead reassigned to the preceding date, which is already covered by a different partial file. As a result, each monthly file in a leap year has either 30 or 31 days after conversion, and the recombined year has 365 days.

(Partial files are needed for memory reasons.)

What did you expect to happen?

We expect the recombined files to contain 360 days in total. The dates that, according to the documentation, should be dropped for leap years should actually be dropped rather than reassigned to a preceding date.

Minimal Complete Verifiable Example

import xarray as xr
import numpy as np
import pandas as pd

# Create time coordinates for March and April 1980
time_march = pd.date_range('1980-03-01', '1980-03-31', freq='D')
time_april = pd.date_range('1980-04-01', '1980-04-30', freq='D')

# Create xarray DataArrays with random values for March and April
da_march = xr.DataArray(np.random.rand(len(time_march)), coords=[time_march], dims=['time'])
da_april = xr.DataArray(np.random.rand(len(time_april)), coords=[time_april], dims=['time'])

# Convert to 360-day calendar using 'year' flag
da_march_360 = da_march.convert_calendar(dim='time', calendar='360_day', align_on='year')
da_april_360 = da_april.convert_calendar(dim='time', calendar='360_day', align_on='year')

# Combine the two DataArrays into one
combined_da = xr.concat([da_march_360, da_april_360], dim='time')

# Check for duplicate dates
duplicate_dates = combined_da['time'].to_index().duplicated(keep=False)
if any(duplicate_dates):
    print("Duplicate dates found:")
    print(combined_da['time'][duplicate_dates].values)
else:
    print("No duplicate dates found.")

Relevant log output

Duplicate dates found:
[cftime.Datetime360Day(1980, 3, 30, 0, 0, 0, 0, has_year_zero=True)
 cftime.Datetime360Day(1980, 3, 30, 0, 0, 0, 0, has_year_zero=True)]

Anything else we need to know?

Dates seem to be dropped correctly when applying convert_calendar from a standard to a 360-day calendar on non-leap years.
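
One possible explanation for why only leap years are affected, sketched in plain Python. The sketch assumes that align_on='year' rescales each date's day of year by 360/366 (or 360/365), rounds to the nearest integer, and removes repeated dates only within the array being converted. That model is an assumption made for illustration and is not confirmed anywhere in this report; the helper names are hypothetical and none of this calls xarray.

```python
import calendar
from collections import Counter
from datetime import date


def scaled_day_of_year(d: date) -> int:
    """Assumed model: rescale the standard day of year onto 360 days and round."""
    days_in_year = 366 if calendar.isleap(d.year) else 365
    return round(360 * d.timetuple().tm_yday / days_in_year)


def convert_month(year: int, month: int) -> list[int]:
    """Convert one monthly 'file' on its own, dropping repeats inside that file only."""
    n_days = calendar.monthrange(year, month)[1]
    targets = {scaled_day_of_year(date(year, month, day)) for day in range(1, n_days + 1)}
    return sorted(targets)


def combine_year(year: int) -> list[int]:
    """Concatenate the twelve independently converted months."""
    return [doy for month in range(1, 13) for doy in convert_month(year, month)]


def cross_file_duplicates(year: int) -> list[tuple[int, int]]:
    """Return (month, day) in the 360-day calendar for dates produced by two files."""
    counts = Counter(combine_year(year))
    return [
        ((doy - 1) // 30 + 1, (doy - 1) % 30 + 1)
        for doy, n in sorted(counts.items())
        if n > 1
    ]


print(len(combine_year(1980)), cross_file_duplicates(1980))
print(len(combine_year(1981)), cross_file_duplicates(1981))
```

Under this assumed model, 1980 gives 365 combined entries with five cross-file duplicates, all on day 30 of months 3, 5, 7, 9 and 11, which matches the duplicated 1980-03-30 in the log above. For 1981 every collision falls inside a single month, so the combined year has 360 entries and no duplicates.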

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.0 | packaged by conda-forge | (main, Oct 25 2022, 06:24:51) [Clang 14.0.4 ]
python-bits: 64
OS: Darwin
OS-release: 22.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.0
xarray: 2023.2.0
pandas: 1.5.1
numpy: 1.23.4
scipy: 1.10.0
netCDF4: 1.6.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.3
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.6.1
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.5.0
pip: 22.3.1
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None
