How to perform a North test on the EOF?

SHEN-Cheng commented 4 months ago

Hi,

Thanks for developing this wonderful package. I am using it for EOF analysis over Europe.

I want to perform a North test on the EOF modes, but I don't understand how it differs from the bootstrap-based significance test described in this documentation (https://xeofs.readthedocs.io/en/latest/auto_examples/3validation/plot_bootstrap.html). Will the two approaches give the same result for the significance test?

Best, Cheng

nicrie commented 4 months ago

Hi @SHEN-Cheng, in general I don't think that North's rule of thumb will give you the same result as bootstrapping. North's rule makes some (strong) assumptions about your underlying data:

1. Large sample size: Since it is based on the Central Limit Theorem, the approximation works well for many samples that are independent and identically distributed. So take care of temporal autocorrelation when estimating the number of samples.
2. Gaussian distribution: The approximation is derived under the assumption that the data can be approximated by a Gaussian (normal) distribution. This simplifies the mathematics but may not always be strictly true in practical datasets (e.g. precipitation).
3. Stationarity: The data should be stationary, meaning that its statistical properties do not change over time. So, no trends and no seasonal cycle.
4. Homoscedastic noise: The rule assumes constant variance in the sampling error.
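
For reference, North's rule of thumb itself is short enough to sketch with plain NumPy. The snippet below is only an illustration, not part of xeofs: the function names `north_rule` and `effective_sample_size` are hypothetical, the eigenvalues are made up, and the AR(1)-based sample size correction is one common approximation among several for handling temporal autocorrelation. It uses the sampling error δλ ≈ λ · sqrt(2/N) and flags a mode as separated when the gap to its nearest neighbouring eigenvalue exceeds that error.

```python
import numpy as np


def north_rule(eigenvalues, n_samples):
    """Return sampling errors and a separation flag per mode.

    eigenvalues: eigenvalues sorted in descending order.
    n_samples: number of (effectively) independent samples.
    """
    lam = np.asarray(eigenvalues, dtype=float)
    # North's rule of thumb: delta_lambda ~ lambda * sqrt(2 / N)
    err = lam * np.sqrt(2.0 / n_samples)
    # Gap between each eigenvalue and its nearest neighbour
    gaps = np.abs(np.diff(lam))
    nearest = np.minimum(
        np.concatenate(([np.inf], gaps)),
        np.concatenate((gaps, [np.inf])),
    )
    # A mode counts as separated if the gap exceeds its sampling error
    separated = nearest > err
    return err, separated


def effective_sample_size(series):
    """Rough AR(1)-based estimate: N_eff = N * (1 - r1) / (1 + r1)."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    # Lag-1 autocorrelation of the series
    r1 = np.dot(x[:-1], x[1:]) / np.dot(x, x)
    # Assumption for this sketch: ignore negative autocorrelation
    r1 = max(r1, 0.0)
    return len(x) * (1.0 - r1) / (1.0 + r1)


# Made-up eigenvalues, purely for illustration
lam = np.array([10.0, 9.5, 4.0, 1.0])
err, separated = north_rule(lam, n_samples=100)
print(err)
print(separated)
```

In this toy case the first two eigenvalues lie closer together than their sampling errors, so they would be treated as a degenerate pair, while modes 3 and 4 are separated. In practice you would pass an effective sample size (for example estimated from a PC time series) rather than the raw number of time steps.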

In contrast, bootstrapping does not require you to assume 2. and 4., since it is a non-parametric approach. In addition, 1. and 3. can be accounted for by using specific resampling techniques. However, xeofs only provides a very simple bootstrapping scheme in which 1. and 3. are assumed to hold. Does that answer your question?
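
To make the bootstrap idea concrete, here is a minimal NumPy sketch of the simple scheme, in which time steps are resampled with replacement and the eigenvalues are recomputed each time. This is not the xeofs implementation or its API: the helper names `explained_variance` and `bootstrap_eigenvalues`, the synthetic data, and the 95% percentile interval are all assumptions made for illustration.

```python
import numpy as np


def explained_variance(data, n_modes):
    """Leading covariance eigenvalues of `data` with shape (time, space)."""
    anomalies = data - data.mean(axis=0)
    # Singular values of the anomaly matrix give the covariance eigenvalues
    s = np.linalg.svd(anomalies, compute_uv=False)
    return s[:n_modes] ** 2 / (data.shape[0] - 1)


def bootstrap_eigenvalues(data, n_modes=3, n_boot=200, seed=0):
    """Percentile intervals from resampling time steps with replacement."""
    rng = np.random.default_rng(seed)
    n_time = data.shape[0]
    boot = np.empty((n_boot, n_modes))
    for i in range(n_boot):
        # Simple i.i.d. resampling: this ignores temporal autocorrelation
        idx = rng.integers(0, n_time, size=n_time)
        boot[i] = explained_variance(data[idx], n_modes)
    lower, upper = np.percentile(boot, [2.5, 97.5], axis=0)
    return lower, upper


rng = np.random.default_rng(42)
# Synthetic data (hypothetical): 200 time steps, 5 grid points
data = rng.normal(size=(200, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])
lower, upper = bootstrap_eigenvalues(data)
print(lower)
print(upper)
```

One way to read the result is to treat a mode as distinct when its interval does not overlap those of its neighbours. For autocorrelated data, resampling contiguous blocks of time steps (a moving-block bootstrap) instead of single time steps is one of the resampling techniques that can relax assumption 1.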

SHEN-Cheng commented 4 months ago

Hi @nicrie, thank you for your detailed explanation. I believe I will use bootstrapping for the significance test in my subsequent analysis.

xarray-contrib / xeofs

How to perform a North test on the EOF? #176