monocongo / climate_indices

Climate indices for drought monitoring
https://monocongo.github.io/climate_indices/
Other
353 stars 167 forks source link

"calibration_year_final" not correctly taken into account in pdsi calculation #544

Open jeremietellus opened 8 months ago

jeremietellus commented 8 months ago

Code

import numpy as np
import pandas as pd
import xarray as xr

data_start_year=1990
data_end_year=2010

dates = pd.date_range(start=f'{data_start_year}-01-01', end=f'{data_end_year}-12-31', freq='MS')
lat = np.arange(10, 20, 1)
lon = np.arange(180, 190, 1) 

# Generating random data for precipitation and potential evapotranspiration (PET)
precips_data = np.random.rand(len(dates), len(lat), len(lon)) # Precipitation data
pet_data = np.random.rand(len(dates), len(lat), len(lon)) # Potential evapotranspiration data

# Creating the first dataset with precipitation and PET data
first_dataset = xr.Dataset(
    {
        "precips": (["time", "latitude", "longitude"], precips_data),
        "pet": (["time", "latitude", "longitude"], pet_data)
    },
    coords={
        "time": dates,
        "latitude": lat,
        "longitude": lon},
)

# Selecting the calibration year
calibration_year_final=2005

# Modifying data after the calibration year: values are doubled
chosen_date=pd.Timestamp(f"{calibration_year_final+1}-01-01")
second_dataset = first_dataset.where(first_dataset.time < chosen_date, first_dataset * 2)

def palmer_pdsi_wrapper(precips, pet, awc, data_start_year, calibration_year_initial, calibration_year_final):
    pdsi, phdi, wplm, z, _ = palmer.pdsi(precips, pet, awc, data_start_year, calibration_year_initial, calibration_year_final)
    return pdsi, phdi, wplm, z

import os
os.chdir("/home/jeremiesicard/Documents/main-project")
from thirdpart.submodules.climate_indices.src.climate_indices import palmer

# Calculating the Palmer Drought Severity Index (PDSI) for the first dataset
first_pdsi_data =  xr.apply_ufunc(
                palmer_pdsi_wrapper,
                first_dataset.precips,
                first_dataset.pet,
                50,
                data_start_year,
                data_start_year,
                calibration_year_final,
                input_core_dims=[["time"], ["time"], [], [], [], []],
                output_core_dims=[["time"]] * 4,
                vectorize=True,
                output_dtypes=[first_dataset.precips.dtype] * 4,
            )

first_pdsi_dataset=first_pdsi_data[0].compute().to_dataset(name='pdsi').sortby(['latitude','longitude'])

# Calculating the Palmer Drought Severity Index (PDSI) for the second dataset
second_pdsi_data =  xr.apply_ufunc(
                palmer_pdsi_wrapper,
                second_dataset.precips,
                second_dataset.pet,
                50,
                data_start_year,
                data_start_year,
                calibration_year_final,
                input_core_dims=[["time"], ["time"], [], [], [], []],
                output_core_dims=[["time"]] * 4,
                vectorize=True,
                output_dtypes=[second_dataset.precips.dtype] * 4,
            )

second_pdsi_dataset=second_pdsi_data[0].compute().to_dataset(name='pdsi').sortby(['latitude','longitude'])

chosen_date_2 = pd.Timestamp(f"{calibration_year_final}-12-31")

print(f"Is the first pdsi equal to the second pdsi for the period {data_start_year}-01-01 to {calibration_year_final}-12-31?\n",
       np.array_equal(
                first_pdsi_dataset.pdsi.sel(time=first_pdsi_dataset.time <= chosen_date_2).values,
                second_pdsi_dataset.pdsi.sel(time=second_pdsi_dataset.time <= chosen_date_2).values
                ))

print(f"The values that differ over the period {data_start_year}-01-01 to {calibration_year_final}-12-31 are as follows:\n",
      np.unique(first_pdsi_dataset.pdsi.sel(time=first_pdsi_dataset.time <= chosen_date_2)-second_pdsi_dataset.pdsi.sel(time=second_pdsi_dataset.time <= chosen_date_2), return_counts=True))

Describe the bug

The precipitation and PET values that follow the "calibration year final" have an influence on the PDSI values before the "calibration year final" in the calculation of the PDSI.

Expected behavior

In this example, considering:

I expected to have identical PDSI values between the two xarray.Datasets from 1990 to 2005 (inclusive), then different from 2006 onwards. However, as you can see from the printouts, the values are also different for the period from 1990 to 2005. Is the "calibration year final" correctly accounted for?

Screenshots

Is the first pdsi equal to the second pdsi for the period 1990-01-01 to 2005-12-31? False The values that differ over the period 1990-01-01 to 2005-12-31 are as follows: (array([-5.46034514, -2.34233531, -2.23779737, -1.39403927, -1.07894443, -0.96372188, -0.75961582, -0.68137539, 0. , 2.02436987, 2.2568226 , 2.51596723, 2.8048687 , 3.12694393, 3.48600215]), array([ 1, 1, 1, 1, 1, 1, 1, 1, 19186, 1, 1, 1, 1, 1, 1]))

Desktop

monocongo commented 8 months ago

Thanks for this detailed bug report, @jeremietellus