singledoggy closed this 7 months ago
First off, my apologies for the late reply; I've had an extremely busy week.
So, I can't claim to be an expert on CCA; I've only used it myself from time to time. In xeofs, the CCA implementation is based on, and tested against, the PCACCA implementation of the CCA-Zoo package.
All I can say right now is that your notion of explained variance (ratio) from PCA should carry over to CCA. The only thing to keep in mind is that CCA maximizes the correlation between different fields, which does not necessarily imply a high amount of explained variance. That being said, I also find the explained variance in the given example very low. I'll try to double-check it in the coming week and keep you posted on this, @singledoggy.
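To make the difference between "high correlation" and "high explained variance" concrete, the two quantities can be written side by side. This assumes centered data matrices $X$ ($n \times p$) and $Y$ ($n \times q$) with sample covariances $C_{xx}$, $C_{yy}$, $C_{xy}$, and a PCA-style definition of the explained variance ratio (EVR) of a canonical variate $\mathbf{t} = X\mathbf{w}$. It illustrates the point; it is not a statement of exactly what xeofs computes internally.

```math
\rho = \max_{\mathbf{w},\,\mathbf{v}}
\frac{\mathbf{w}^\top C_{xy}\,\mathbf{v}}
{\sqrt{\mathbf{w}^\top C_{xx}\,\mathbf{w}}\;\sqrt{\mathbf{v}^\top C_{yy}\,\mathbf{v}}},
\qquad
\mathrm{EVR}(\mathbf{t}) = \frac{\lVert X^\top \mathbf{t} \rVert^2}{\lVert \mathbf{t} \rVert^2\,\lVert X \rVert_F^2},
\quad \mathbf{t} = X\mathbf{w}
```

Because the CCA objective divides by $\mathbf{w}^\top C_{xx}\mathbf{w}$, the variance of $\mathbf{t}$ cancels out: a low-variance direction can reach the highest correlation while its EVR stays small. The leading PC score, by contrast, is exactly the variate that maximizes EVR.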
I just pushed a patch that should fix the incorrect computation of explained variance. Updating to the newest version should resolve the problem for you. Please let me know if it worked.
Also, from my limited experience with CCA, increasing the regularization (either by reducing `variance_fraction` or by increasing the ridge coefficient `c`) can help increase the explained variance.
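To illustrate why a larger ridge coefficient can raise the explained variance, here is a minimal NumPy sketch of ridge-regularized CCA on synthetic data. It assumes one common ridge convention, shrinking each covariance as (1 - c)·C + c·I, so that c = 0 is plain CCA and c = 1 reduces to PLS/maximum covariance analysis; I have not confirmed that the `c` in xeofs uses exactly this scaling. The helpers `first_canonical_pair` and `explained_variance_ratio` and the synthetic fields are hypothetical.

```python
import numpy as np


def inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def first_canonical_pair(X, Y, c=0.0):
    """Leading canonical variates of ridge-regularized CCA (hypothetical helper).

    Assumes X and Y are column-centered. Covariances are shrunk as
    (1 - c) * C + c * I, so c=0 is plain CCA and c=1 is PLS.
    """
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1)
    Cyy = Y.T @ Y / (n - 1)
    Cxy = X.T @ Y / (n - 1)
    Kx = inv_sqrt((1 - c) * Cxx + c * np.eye(X.shape[1]))
    Ky = inv_sqrt((1 - c) * Cyy + c * np.eye(Y.shape[1]))
    # SVD of the whitened cross-covariance gives the leading weight pair
    U, _, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    wx = Kx @ U[:, 0]
    wy = Ky @ Vt[0]
    return X @ wx, Y @ wy


def explained_variance_ratio(X, t):
    """Fraction of total variance in X captured by regressing each column on t."""
    return np.sum((X.T @ t) ** 2) / (t @ t) / np.sum(X ** 2)


# Synthetic fields: a large-variance signal `a` only in X,
# and a weak signal `s` shared between X and Y.
rng = np.random.default_rng(0)
n = 1000
a = rng.standard_normal(n)
s = rng.standard_normal(n)
X = np.column_stack([3 * a + s, 3 * a])
Y = np.column_stack([s + 0.5 * rng.standard_normal(n), rng.standard_normal(n)])
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

for c in (0.0, 0.5, 1.0):
    tx, ty = first_canonical_pair(X, Y, c)
    r = abs(np.corrcoef(tx, ty)[0, 1])
    evr = explained_variance_ratio(X, tx)
    print(f"c={c:.1f}  corr={r:.2f}  explained variance ratio (X)={evr:.2f}")
```

With c = 0 the first pair locks onto the weak shared signal `s`: high correlation, but only a small share of the variance in X. As c grows, the weights lean toward high-variance directions, trading some correlation for explained variance.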
Thank you for your prompt response and the additional information. It is worth noting that using CCA does not guarantee a high explained variance; in fact, some articles do not even report it.
The explained variance ratios in the current version are much more reasonable than before:
[<xarray.DataArray (mode: 2)>
array([0.21777254, 0.04709501])
Coordinates:
  * mode     (mode) int64 1 2
Attributes:
    long_name:     Monthly Means of Sea Surface Temperature
    units:         degC
    var_desc:      Sea Surface Temperature
    level_desc:    Surface
    statistic:     Mean
    dataset:       NOAA Extended Reconstructed SST V5
    parent_stat:   Individual Values
    actual_range:  [-1.8 42.32636]
    valid_range:   [-1.8 45. ],
<xarray.DataArray (mode: 2)>
array([0.10384682, 0.09979495])
Coordinates:
  * mode     (mode) int64 1 2
Attributes:
    long_name:     Monthly Means of Sea Surface Temperature
    units:         degC
    var_desc:      Sea Surface Temperature
    level_desc:    Surface
    statistic:     Mean
    dataset:       NOAA Extended Reconstructed SST V5
    parent_stat:   Individual Values
    actual_range:  [-1.8 42.32636]
    valid_range:   [-1.8 45. ],
<xarray.DataArray (mode: 2)>
array([0.15104585, 0.06363411])
Coordinates:
  * mode     (mode) int64 1 2
Attributes:
    long_name:     Monthly Means of Sea Surface Temperature
    units:         degC
    var_desc:      Sea Surface Temperature
    level_desc:    Surface
    statistic:     Mean
    dataset:       NOAA Extended Reconstructed SST V5
    parent_stat:   Individual Values
    actual_range:  [-1.8 42.32636]
    valid_range:   [-1.8 45. ]]
Glad it helped - closing this.
Example
I get an extremely low `explained_variance_ratio` with my own data, so I tried the example data like:
and the values returned by `model.explained_variance_ratio()` are very small.
Question
If I decrease `init_pca_modes` to 0.30, a warning states: "variance fraction 0.9000 is not reached. Only 0.7529 of variance is explained." This refers to the variance retained in the PCA preprocessing step. Does `.explained_variance_ratio()` here mean the same thing as in EOF analysis? I assumed it would give the variance explained in the Indian, Pacific, and Atlantic fields respectively, but that does not seem to be the case.