Single mode data reconstruction fails with normalized scores

roxyboy commented 9 months ago

Hi, thank you for developing this awesome package! This is probably a stupid question but I'm having trouble reconstructing the input data from the EOF modes. I would have naively thought that this could be done by (model.components() * model.scores()).sum('modes') but the order of magnitude seems to be off...

nicrie commented 9 months ago

Hey @roxyboy! By convention, xeofs uses PC scores normalized by the L2 norm. However, for reconstruction purposes, you'll need the "raw" (unnormalized) PC scores. You can easily get these by setting model.scores(normalized=False).

If you're looking for a simpler route, consider using the inverse_transform method. This method lets you reconstruct the original field by selecting just a subset of PC modes:

pcs = model.scores()
pcs = pcs.sel(mode=slice(1, 5))
field_reconstructed = model.inverse_transform(pcs)

This method also accepts the normalized scores.

roxyboy commented 9 months ago

Thank so much for your quick reply!

roxyboy commented 9 months ago

Hello, apologies for coming back to this but could you explain why the following gives different results for the first mode EOF reconstruction?

pcsF = model.scores(normalized=False)
reEOF_explicit = pcsF * model.components()
fig, (ax1,ax2) = plt.subplots(figsize=(10,4), nrows=1, ncols=2)
reEOF_explicit.isel(mode=0).real.plot(ax=ax1)
pcs = model.scores()
model.inverse_transform(pcs.isel(mode=0)).real.plot(ax=ax2)

roxyboy commented 9 months ago

Somewhat related to my previous post, could you explain why selecting the mode as opposed to slicing the modes produces different results...?

fig, (ax1,ax2) = plt.subplots(figsize=(10,4), nrows=1, ncols=2)
model.inverse_transform(pcs.isel(mode=0)).real.plot(ax=ax1)
model.inverse_transform(pcs.isel(mode=slice(None,1))).real.plot(ax=ax2)

nicrie commented 9 months ago

Hey, no worries! So regarding your first question I think there's a little mistake in the reconstruction. You actually want to use the dot product between the PC scores and the eigenvectors, but you use element-wise multiplication. Also, you have to take care of scaling your reconstructed data (mean, standard deviation, other weights etc.)

So a minimal example for reconstructing the original data could like the following:

import xarray as xr
from xeofs.models import EOF

X = xr.tutorial.open_dataset("air_temperature")["air"]
Xmean = X.mean(dim="time")   # store the mean for reconstruction

eof = EOF(n_modes=3)
eof.fit(X, dim="time")

comps = eof.components()
scores = eof.scores(normalized=False)

# Reconstruction by xeofs
Xrec1 = eof.inverse_transform(scores, normalized=False)

# Our own reconstruction by hand
Xrec2 = xr.dot(comps, scores, dims="mode")
Xrec2 = Xrec2 + Xmean
Xrec2 = Xrec2.transpose("time", "lat", "lon")   # just ensure the same order of dims

xr.testing.assert_allclose(Xrec1, Xrec2)

Do that with your own data, it should work! =)

As to your second question, I cannot reproduce your result. Would you mind sharing a minimal working example that reproduces your observation? Continuing the example from above, the following

pcs1 = scores.isel(mode=[0])   # <-- use [0] instead of 0 to ensure that the pcs are 2D
pcs2 = scores.isel(mode=slice(None, 1))

xr.testing.assert_allclose(pcs1, pcs2)

Xrec1 = eof.inverse_transform(pcs1, normalized=False)
Xrec2 = eof.inverse_transform(pcs2, normalized=False)

xr.testing.assert_allclose(Xrec1, Xrec2)

works for me! Note, however, that I used mode=[0] to ensure that the PCs are 2D (we keep the mode dimension). Although the numerical reconstruction works also with mode=0, you will notice that Xrec1 has an additional mode dimension which is absent in Xrec2. So in that case xr.testing.assert_allclose will complain about the misaligned dimensions.

roxyboy commented 9 months ago

Thank you again for you quick reply. Below is a minimal example that I've been testing my code out with:

def f1(x,t): 
    return xr.DataArray(1./np.cosh(x[np.newaxis,:]+3)*np.exp(2.3j*t[:,np.newaxis]),
                        coords=[('time',t),('x',x)])

def f2(x,t):
    return xr.DataArray(2./np.cosh(x[np.newaxis,:])*np.tanh(x)*np.exp(2.8j*t[:,np.newaxis]),
                        coords=[('time',t),('x',x)])

x = np.linspace(-5, 5, 128)
t = np.linspace(0, 4*np.pi, 256)

X1 = f1(x, t)
X2 = f2(x, t)
X = X1 + X2

This gives the dataset as plotted below for its real part X.pdf

I then apply the EOFs

from xeofs.models import EOF

model = EOF(n_modes=4)
Xmean = X.mean('time')

model.fit(X, dim=("time"))
expvar = model.explained_variance_ratio()
components = model.components()

pcsF = model.scores(normalized=False)
pcs = model.scores()

# Reconstruction by hand
reEOF_explicit = xr.dot(pcsF, components, dims='mode')
reEOF_explicit += Xmean
# xeofs reconstruction
reEOF = model.inverse_transform(pcs)

# Reconstruction of mode 1
reEOF0 = model.inverse_transform(pcs.isel(mode=slice(None,1)))
reEOF0_ex0 = model.inverse_transform(pcs.isel(mode=0))
reEOF0_ex1 = model.inverse_transform(pcs.isel(mode=[0]))

When plotting the following

fig, (ax1,ax2) = plt.subplots(figsize=(10,4), nrows=1, ncols=2)
(reEOF_explicit + Xmean).real.plot(ax=ax1)
(reEOF).real.plot(ax=ax2)

ax1.set_title("Recon. by hand")
ax2.set_title("xeofs recon.")

they look different where the left panel is essentially missing the contribution from the third EOF mode.

Also, when comparing the mode 1 reconstruction

fig, (ax1,ax2,ax3) = plt.subplots(figsize=(15,4), nrows=1, ncols=3)
(reEOF0).real.plot(ax=ax1)
(reEOF0_ex0).real.plot(ax=ax2)
(reEOF0_ex1).real.plot(ax=ax3)

ax1.set_title("Slicing")
ax2.set_title("Index")
ax3.set_title("List")

I get the following

where the first and third panel agree with each other but the middle panel doesn't...

nicrie commented 9 months ago

Thanks @roxyboy for the example! That makes things a bit clearer :) Specifically, I didn't catch from your first post that you were working with complex input data. In this case, the correct approach to reconstruction involves using the conjugate transpose, so your example becomes:

# Reconstruction by hand
reEOF_explicit = xr.dot(pcsF, components.conj(), dims='mode')
reEOF_explicit += Xmean

This operation is automatically handled by xeofs, addressing the discrepancies you've noticed in the real part.

However, since xeofs was originally not designed to handle complex input data, reconstructed fields are always enforced to be real, i.e. the imaginary parts are always removed. This is why the imaginary parts of the reconstructions still won't match. Luckily, xeofs can already handle complex data internally, so extending support for complex input data seems straightforward. I'll try to work on that the coming days.

Regarding your second point: I found a little bug in the code that occurs when you use normalized PCs together with a single mode (e.g. `.isel(mode=0)') in order to reconstruct the data. Using either unormalized PCs or more than 1 mode, the differences disappear, e.g. below all the results are the same:

reEOF0 = model.inverse_transform(pcsF.isel(mode=slice(None,1)), normalized=False)
reEOF0_ex0 = model.inverse_transform(pcsF.isel(mode=0), normalized=False)
reEOF0_ex1 = model.inverse_transform(pcsF.isel(mode=[0]), normalized=False)

I'll make a patch to fix this - in the meantime just specify mode=[0] and you're on the safe side :)

nicrie commented 9 months ago

@roxyboy with the new release you should be able to reconstruct the real part even when using isel(mode=0). Note though that your example above will only completely work when complex input data is fully supported. Follow #150 for checking progress on that.

roxyboy commented 9 months ago

@nicrie Thank you so much!

xarray-contrib / xeofs

Single mode data reconstruction fails with normalized scores #149