xarray-contrib / xeofs

Comprehensive EOF analysis in Python with xarray: A versatile, multidimensional, and scalable tool for advanced climate data analysis
https://xeofs.readthedocs.io/
MIT License

Data with Nan values #167

Closed LKPatel1 closed 4 months ago

LKPatel1 commented 5 months ago

Working with NaN data: My data has many NaN values. I tried filling the NaN values with 0, but the results do not look reasonable. Using other packages (xMCA and pyEOFs), I get more physically meaningful patterns.

I want to find a way to use xeofs with my data because it has so many features. As far as I can tell, xMCA has no cos_lat weighting option.

Can you please suggest one of the following: 1) a cos_lat weighting option for xMCA, or 2) a way to use my data in xeofs without filling the NaN values with 0?

my data image

nicrie commented 5 months ago

NaNs in your data are always a bit tricky. I'm not sure about pyEOFs, but what xMCA does in the case of individual NaNs (e.g. grid points where only a few time steps are missing) is simply to remove the entire grid cell before the analysis. This admittedly crude approach works reasonably well when the overall fraction of features with individual NaNs is low. In your case, what fraction of grid cells has individual NaNs?

Alternatively, you have to think about how you can fill these values, either subjectively (e.g. by using some fixed values) or more objectively (e.g. by using Probabilistic PCA). Ultimately, the choice will depend on the data you have. In xeofs we don't automatically treat individual NaNs, because the decision of how to treat NaNs should ultimately be with the analyst.
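The fraction asked about above can be estimated directly with xarray. The following is a minimal sketch on synthetic data; the array `da` and the dimension names `time`, `lat` and `lon` are assumptions standing in for the real dataset.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the real data: 10 time steps on a 4 x 5 grid.
rng = np.random.default_rng(0)
da = xr.DataArray(
    rng.standard_normal((10, 4, 5)),
    dims=("time", "lat", "lon"),
)
da[:, 0, 0] = np.nan    # cell missing at every time step (e.g. a land mask)
da[2, 1, 1] = np.nan    # cell with one missing time step
da[5:7, 2, 3] = np.nan  # cell with two missing time steps

n_missing = da.isnull().sum("time")  # number of missing time steps per cell
fully_missing = n_missing == da.sizes["time"]
partly_missing = (n_missing > 0) & ~fully_missing

# Fractions of grid cells in each category.
frac_partial = float(partly_missing.mean())
frac_full = float(fully_missing.mean())
print(f"cells with individual NaNs: {frac_partial:.1%}")
print(f"cells that are NaN at all times: {frac_full:.1%}")
```

Separating fully missing cells from partly missing ones matters here because only the partly missing cells are the "individual NaNs" that have to be removed or filled by the analyst.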

LKPatel1 commented 5 months ago

Thanks for the response, @nicrie. xMCA does help. But there is no cos_lat weighting option in xMCA, right?

nicrie commented 5 months ago

Yes, there is a cosine-latitude weighting option in xMCA. You can find more information here. Note, however, that your solution obtained from xMCA will be exactly the same as using xeofs if you remove all grid cells with individual NaNs. This can be done by calling:

da = da.where(da.notnull().all("time"))

assuming that time is the dimension along which you want to maximize the variance.
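To see what this masking does, here is a small sketch on a toy array (the data and the dimension names are made up for illustration). A single missing time step causes the whole grid cell to be set to NaN, while complete cells are left untouched.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(24, dtype=float).reshape(4, 2, 3),
    dims=("time", "lat", "lon"),
)
da[1, 0, 2] = np.nan  # a single missing time step at one grid cell

complete = da.notnull().all("time")  # True where a cell has no NaN at any time
masked = da.where(complete)          # incomplete cells become NaN at all times

n_dropped = int((~complete).sum())
all_nan_cell = bool(masked[:, 0, 2].isnull().all())
print(n_dropped, all_nan_cell)
```

Note that `where` does not shrink the array: the shape of `masked` is the same as that of `da`, and the affected cells are only marked as missing.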

LKPatel1 commented 5 months ago

Thanks for your time, @nicrie. I tried 'apply_coslat()' from the link, as suggested.


from xmca.xarray import xMCA

pca = xMCA(x.z)                     # x.z is my DataArray
pca.apply_coslat()                  # weight by cosine of latitude
pca.solve(complexify=False)         # True for complex PCA
svals = pca.singular_values()       # singular values = eigenvalues for PCA
expvar = pca.explained_variance()   # explained variance
pcs = pca.pcs()                     # principal component scores (PCs)
eofs = pca.eofs()                   # spatial patterns (EOFs)

Yet the results look like those without 'cos_lat'. Is there anything I'm missing?

With xeofs:

\anaconda3\envs\xEOFs\Lib\site-packages\numpy\lib\nanfunctions.py:1879: RuntimeWarning: Degrees of freedom <= 0 for slice.
  var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,

error

which I take to mean that the NaN values are not deleted. Here is a screenshot after applying

da = da.where(da.notnull().all("time"))

Result: image

My questions: 1) In xmca, does apply_coslat() work for PCA too? 2) In xeofs, is there anything I'm missing that prevents me from getting data without NaNs?

nicrie commented 5 months ago

Why do you think that the warning message implies that NaN values are not deleted? The code

da = da.where(da.notnull().all("time"))

masks out grid cells with individual NaNs. In the preprocessing step of xeofs, these grid cells are removed prior to the SVD (otherwise you would get an error there) and are reinserted into your results afterwards. My guess is that the warning arises from computing standard deviations on all-NaN slices. Would you mind sharing a minimal reproducible example?
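Both points can be illustrated with plain NumPy. This is only a sketch under assumptions (synthetic data, a hand-written drop/SVD/reinsert step), not the actual xeofs preprocessing code. It shows that a NaN-aware standard deviation over an all-NaN slice emits the same kind of RuntimeWarning without failing, and that masked cells can be dropped before the SVD and put back as NaN afterwards.

```python
import warnings
import numpy as np

rng = np.random.default_rng(42)
data = rng.standard_normal((8, 3, 4))  # (time, lat, lon), synthetic
data[3, 1, 2] = np.nan                 # one missing time step at cell (1, 2)

# NumPy equivalent of da.where(da.notnull().all("time")).
incomplete = np.isnan(data).any(axis=0)  # (lat, lon) mask of cells with any NaN
data[:, incomplete] = np.nan

# 1) nanstd over an all-NaN slice warns but still returns a result (NaN there).
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    std = np.nanstd(data, axis=0)
warning_texts = [str(w.message) for w in caught]

# 2) Drop the masked cells, decompose, then reinsert the gaps as NaN.
X = data.reshape(data.shape[0], -1)  # (time, feature)
valid = ~np.isnan(X).any(axis=0)
X_valid = X[:, valid] - X[:, valid].mean(axis=0)
_, _, Vt = np.linalg.svd(X_valid, full_matrices=False)

eof1_flat = np.full(X.shape[1], np.nan)
eof1_flat[valid] = Vt[0]
eof1 = eof1_flat.reshape(data.shape[1:])  # first spatial pattern on the grid

print(warning_texts)
print(np.isnan(eof1))
```

In this sketch the warning is harmless: it only reports that some slices contain no valid values, which is expected once grid cells have been masked.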

For your question about xmca: Yes, apply_coslat() works for both PCA and MCA. However, I need to mention that I stopped maintaining xmca in favor of xeofs about a year ago. This means I can't provide detailed debugging support for xmca anymore. I'd encourage you to focus on xeofs for your current and future analyses, as it's actively maintained and supported.
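Independent of either package, the effect of cosine-latitude weighting can be checked by hand. The sketch below assumes the common convention of multiplying the anomalies by sqrt(cos(lat)), so that variances scale with cos(lat). The coordinate name `lat` and the synthetic data are assumptions, and this is not taken from the xmca or xeofs source.

```python
import numpy as np
import xarray as xr

lat = np.array([0.0, 30.0, 60.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])
rng = np.random.default_rng(1)
da = xr.DataArray(
    rng.standard_normal((6, lat.size, lon.size)),
    coords={"lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
)

# sqrt(cos(lat)) weights; the clip guards against tiny negative values at the poles.
weights = np.sqrt(np.cos(np.deg2rad(da["lat"])).clip(0, 1))

anomalies = da - da.mean("time")
weighted = anomalies * weights  # broadcasts along the lat dimension

# Variance at 60 degrees latitude is scaled by cos(60 deg) = 0.5.
var_weighted = weighted.var("time").sum("lon")
var_plain = anomalies.var("time").sum("lon")
ratio = float(var_weighted[2] / var_plain[2])
print(weights.values, ratio)
```

Comparing `var_weighted` with `var_plain` per latitude is a quick way to confirm that the weights are actually being applied to a given dataset.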

nicrie commented 4 months ago

Closing as it doesn't seem like a bug in xeofs. Feel free to reopen @LKPatel1