xarray-contrib / xeofs

Comprehensive EOF analysis in Python with xarray: A versatile, multidimensional, and scalable tool for advanced climate data analysis
https://xeofs.readthedocs.io/
MIT License

MCA incorrect coords alignment in transform method #153

Closed agirnow closed 8 months ago

agirnow commented 8 months ago

Describe the bug

In the MCA transform method, the output coordinates along the fit dimension do not match those of the input data. The output carries the wrong coordinates and consists entirely of NaN.

Reproducible Minimal Working Example

import xarray as xr
import pandas as pd
import numpy as np
import xeofs

lat = np.arange(-10, 11, 1) 
lon = np.arange(-20, 21, 1) 
onset1 = pd.date_range("2020-01-01", periods=10, freq="D")
onset2 = pd.date_range("2020-01-20", periods=10, freq="D") 

# Create random data
data_X_train = np.random.rand(len(lat), len(lon), len(onset1))
data_y_train = np.random.rand(len(lat), len(lon), len(onset1))
data_X_test = np.random.rand(len(lat), len(lon), len(onset2))

# Create xarray datasets
X_train = xr.DataArray(data_X_train, coords=[("latitude", lat), ("longitude", lon), ("onset", onset1)])
y_train = xr.DataArray(data_y_train, coords=[("latitude", lat), ("longitude", lon), ("onset", onset1)])
X_test = xr.DataArray(data_X_test, coords=[("latitude", lat), ("longitude", lon), ("onset", onset2)])

# Fit MCA on the training fields, then transform the test field
mca = xeofs.models.MCA()
mca.fit(X_train, y_train, dim="onset")
mca.transform(X_test)

Obtained result is:

[<xarray.DataArray (mode: 2, onset: 10)>
 array([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]])
 Coordinates:
   * onset    (onset) datetime64[ns] 2020-01-01 2020-01-02 ... 2020-01-10
   * mode     (mode) int64 1 2]

Expected behavior

In this example, I would expect the onset coordinates to start at 2020-01-20, not 2020-01-01. It seems that the output has kept the coordinates of the dataset used for fitting, while I would expect the coordinates of the dataset being transformed. The result is a dataset full of NaN.
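For illustration, here is a minimal sketch of how an all-NaN result with fit-time coordinates can arise in xarray in general. It makes no claim about what xeofs does internally; the variable names are hypothetical, and the reindexing step is only an assumed stand-in for whatever alignment happens inside the library.

```python
import numpy as np
import pandas as pd
import xarray as xr

onset_fit = pd.date_range("2020-01-01", periods=10, freq="D")
onset_new = pd.date_range("2020-01-20", periods=10, freq="D")

# Hypothetical scores carrying the coordinates of the data being transformed
scores_new = xr.DataArray(
    np.random.rand(2, 10),
    coords=[("mode", [1, 2]), ("onset", onset_new)],
)

# Reindexing onto the fit-time coordinates finds no matching labels,
# so every value is filled with NaN and the old coordinates are kept.
misaligned = scores_new.reindex(onset=onset_fit)

all_nan = bool(misaligned.isnull().all())
same_coords = bool((misaligned["onset"].values == onset_fit.values).all())
print(all_nan, same_coords)
```

Because the two date ranges do not overlap, label-based alignment has nothing to match, which is consistent with the output shown above.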


Thanks for your help!

nicrie commented 8 months ago

Hey @agirnow, thank you for documenting that; it does look like a bug. I'm quite busy this week, so the earliest I can take a closer look is over the weekend.

nicrie commented 8 months ago

Version 2.3.1 was just released, which should fix this issue. Let me know, @agirnow, whether it solves the problem for you.

On a different note: from your example using train and test, I wonder whether your goal is to predict one field from the transformed scores of the other. Unfortunately, this is not yet implemented: the transform method only projects the provided field data onto the scores of the latent space.
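To make the distinction concrete, here is a rough NumPy sketch of what "projecting into the latent space" means for textbook MCA (an SVD of the cross-covariance matrix). This is not the xeofs implementation, which may additionally weight, normalize, or rotate; all names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q, k = 20, 6, 4, 2          # samples, features of X, features of Y, modes

X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))

# "fit": center both fields and decompose their cross-covariance matrix
mean_X, mean_Y = X.mean(axis=0), Y.mean(axis=0)
Xc, Yc = X - mean_X, Y - mean_Y
C = Xc.T @ Yc / (n - 1)
U, s, Vt = np.linalg.svd(C, full_matrices=False)
U_k, V_k = U[:, :k], Vt[:k].T     # singular vectors (patterns) of X and Y

# Training scores: projections of each centered field onto its own patterns
A = Xc @ U_k
B = Yc @ V_k

# "transform": new X samples are only projected onto the X patterns;
# nothing here predicts the Y field.
X_new = rng.standard_normal((5, p))
scores_new = (X_new - mean_X) @ U_k
print(scores_new.shape)
```

In this sketch the transform step never touches the Y patterns, so turning X scores into a Y field would need an extra step that links the two sets of scores.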

agirnow commented 8 months ago

Hey @nicrie, thanks, it solved my problem! :) Yes, that is exactly what I wanted to do. I thought it would be fine to take the scores obtained with the transform method on the first field and use them as input to the inverse_transform method for the other field, but maybe that is not valid.