pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Finding mean/median for each time step (with a time-based window, within +/- 1 month) #8832

Closed mmcs-work closed 7 months ago

mmcs-work commented 7 months ago

What is your issue?

Is there a way to efficiently find the median around each time step, considering only dates within +/- 1 month, using xarray without resorting to for loops?

Details (MCVE): I have a dataset with 5 time steps whose dates are drawn at random between January 1, 2024 and December 31, 2024. The time steps are irregularly spaced, so the gaps between consecutive dates can vary widely. For example, if one time step falls on February 15, 2024, the following ones might fall on March 20, 2024, March 21, 2024, June 17, 2024, and so on.

I would like to calculate the median for each time step while considering only the data within a window of +/- 1 month around each time step's date. The number of dates within this window can vary depending on the specific time step. A naive implementation using for loops is slow.

I would like to achieve the same result as shown below without using for loops, leveraging xarray's capabilities for efficient data manipulation:


import xarray as xr
import pandas as pd
import numpy as np

num_times = 5
start_date = pd.Timestamp('2024-01-01')
end_date = pd.Timestamp('2024-12-31')
times = pd.to_datetime(np.random.choice(pd.date_range(start_date, end_date), num_times, replace=False))
times = np.sort(times)

latitudes = np.linspace(-90, 90, 2)
longitudes = np.linspace(-180, 180, 2)

coords = {'time': times, 'latitude': latitudes, 'longitude': longitudes}
data = np.random.randint(1, 10, (num_times, len(latitudes), len(longitudes)))  # Random data for demonstration

xr_data = xr.DataArray(data, coords=coords, dims=['time', 'latitude', 'longitude'])

def calculate_monthly_median_for_time_index(xr_data, time_index):
    # Window of +/- 1 calendar month around the selected time step.
    center = xr_data.time.data[time_index]
    start_date = center - pd.DateOffset(months=1)
    end_date = center + pd.DateOffset(months=1)
    # Label-based slicing includes both endpoints.
    data_within_window = xr_data.sel(time=slice(start_date, end_date))
    return data_within_window.median(dim='time')

monthly_medians = []
for i in range(len(xr_data.time)):
    monthly_median = calculate_monthly_median_for_time_index(xr_data, i)
    monthly_medians.append(monthly_median)

monthly_medians_combined = xr.concat(monthly_medians, dim='time')

print(monthly_medians_combined)

Here is the output of one example run:

xr_data::
<xarray.DataArray (time: 5, latitude: 2, longitude: 2)>
array([[[2, 2],
        [5, 9]],

       [[3, 7],
        [2, 4]],

       [[6, 5],
        [9, 9]],

       [[3, 3],
        [6, 6]],

       [[5, 5],
        [7, 9]]])
Coordinates:
  * time       (time) datetime64[ns] 2024-01-30 2024-06-20 ... 2024-09-19
  * latitude   (latitude) float64 -90.0 90.0
  * longitude  (longitude) float64 -180.0 180.0

monthly_medians_combined::
<xarray.DataArray (time: 5, latitude: 2, longitude: 2)>
array([[[2. , 2. ],
        [5. , 9. ]],

       [[4.5, 6. ],
        [5.5, 6.5]],

       [[4.5, 6. ],
        [5.5, 6.5]],

       [[3. , 3. ],
        [6. , 6. ]],

       [[5. , 5. ],
        [7. , 9. ]]])
Coordinates:
  * latitude   (latitude) float64 -90.0 90.0
  * longitude  (longitude) float64 -180.0 180.0
Dimensions without coordinates: time

time (coordinates)::
<xarray.DataArray 'time' (time: 5)>
array(['2024-02-01T00:00:00.000000000', '2024-03-09T00:00:00.000000000',
       '2024-08-01T00:00:00.000000000', '2024-10-10T00:00:00.000000000',
       '2024-11-01T00:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2024-02-01 2024-03-09 ... 2024-11-01
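
A side note on the `monthly_medians_combined` output above: it reports `Dimensions without coordinates: time` because each `median(dim='time')` call drops the time coordinate, and concatenating along the string `'time'` only creates a bare dimension. A minimal sketch of one way to keep the labels, assuming it is acceptable to pass the original coordinate as the `dim` argument of `xr.concat` (the dates and values here are hypothetical, not taken from the run above):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical, irregularly spaced dates and 1-D data for illustration only.
times = pd.to_datetime(["2024-01-30", "2024-06-20", "2024-07-10"])
da = xr.DataArray(np.arange(3.0), coords={"time": times}, dims=["time"])

one_month = pd.DateOffset(months=1)

# Each reduction over "time" returns an array without a time coordinate.
pieces = [
    da.sel(time=slice(t - one_month, t + one_month)).median(dim="time")
    for t in times
]

# Passing the original coordinate (instead of the string "time") as `dim`
# names the new dimension and attaches the timestamps to it.
combined = xr.concat(pieces, dim=da["time"])
print(combined)
```

This only changes how the result is labeled; the loop itself is unchanged.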

Note: I did not use rolling because the time steps are not uniformly spaced, and rolling windows are defined by a fixed number of steps rather than a time span. Resampling does not compute a window around each point; it operates on the entire series by grouping it into fixed bins, so it does not help either (unless I am missing something here).
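
For reference, here is a rough sketch of one possible loop-free formulation, not necessarily the recommended xarray approach. It builds a boolean (time x sample) mask by broadcasting and takes a NaN-skipping median over the masked samples. The helper name `windowed_median`, the auxiliary dimension name `window_sample`, and the three middle dates in the demo are assumptions made for illustration. Because it materializes an array with one extra axis of length `len(time)`, it trades memory for speed and may not suit long time series.

```python
import numpy as np
import pandas as pd
import xarray as xr


def windowed_median(da, months=1, dim="time"):
    """Median of all samples within +/- `months` calendar months of each step.

    Hypothetical helper for illustration; memory use grows with len(time)**2.
    """
    times = pd.DatetimeIndex(da[dim].values)
    offset = pd.DateOffset(months=months)
    lower = (times - offset).values  # window start for each time step
    upper = (times + offset).values  # window end for each time step
    t = times.values

    # in_window[i, j] is True when sample j falls inside the window centred
    # on step i (both ends inclusive, like label-based slicing).
    in_window = (t[None, :] >= lower[:, None]) & (t[None, :] <= upper[:, None])
    mask = xr.DataArray(
        in_window, dims=[dim, "window_sample"], coords={dim: da[dim].values}
    )

    # Rename the data's time axis so it broadcasts against the mask, replace
    # out-of-window samples with NaN, then reduce (NaNs are skipped for floats).
    masked = da.rename({dim: "window_sample"}).where(mask)
    return masked.median("window_sample").transpose(dim, ...)


# Demo: the values match the example run above; the dates are hypothetical.
demo_times = pd.to_datetime(
    ["2024-01-30", "2024-06-20", "2024-07-10", "2024-08-15", "2024-09-19"]
)
demo_values = np.array(
    [[[2, 2], [5, 9]],
     [[3, 7], [2, 4]],
     [[6, 5], [9, 9]],
     [[3, 3], [6, 6]],
     [[5, 5], [7, 9]]]
)
demo = xr.DataArray(
    demo_values,
    coords={"time": demo_times, "latitude": [-90.0, 90.0], "longitude": [-180.0, 180.0]},
    dims=["time", "latitude", "longitude"],
)
result = windowed_median(demo)
print(result)
```

Unlike the loop version, the result keeps the original `time` coordinate, since the mask carries it through the reduction.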
