igmhub / picca

set of tools for continuum fitting, correlation function calculation, cosmological fits...
GNU General Public License v3.0
31 stars 22 forks source link

xcf covariance matrix of transmission field is not positive definite #431

Closed londumas closed 5 years ago

londumas commented 6 years ago

Running do_xcf.py on transmission files: d_i(lambda) = T_i(lambda)/<T(lambda)>-1 for enough forests: 500,000, I get a covariance matrix that is Warning: Matrix is not positive definite. This is not true for do_cf.py where with already 100,000 forests I get a covariance matrix that is positive definite. This issue does not appear when the mocks are fully expanded by quickquasar.

The following plot compares the covariance matrix from transmission (Lya) and from the full expanded mocks (Lya+Continuum) for the diagonal and the off-diagonal terms, for the cross and the auto-correlation. These plots show precisely an issue for the cross-correlation, and not for the auto-correlation:

Conclusion:

xcf_covariance_matrix_covar

xcf_covariance_matrix_corr

cf_covariance_matrix_covar

cf_covariance_matrix_corr

londumas commented 6 years ago

If I leave the projection in the computation of the correlation function, I get a positive definite matrix:

xcf_covariance_matrix_corr_projected

xcf_covariance_matrix_covar_projected

londumas commented 6 years ago

Sadly, the reason is linked to something I was expecting: the mean of delta in each sub-sampling is not random. To investigate this I ran the regular xcf on the transmission files, without projection for the catalog of quasar and the random catalog.

The reason might be that: xi = \<delta eta> - \<delta>\<eta> Basically in do_xcf.py, we compute only xi = \<delta eta>, the random acts as computing \<delta>\<eta>. What do you think @ngbusca?

correlation_rand_data

correlation_xcf_xcfrand_xcfminus

correlationmatrix_xcf_trans

correlationmatrix_xcfminus_trans

ngbusca commented 6 years ago

@londumas what happens if you do the projection on the transmission field? Is it still not positive definite?

londumas commented 6 years ago

@ngbusca, With the project on the transmission filed, I get posititive definite matrices, with the correlation matrix we are used to, see comment https://github.com/igmhub/picca/issues/431#issuecomment-390805569

ngbusca commented 6 years ago

I see, thanks!

londumas commented 6 years ago

Here is the same plot as the first one but for xcf calculated on projected transmission, the correlation between random and data has totally disappeared and the values of \<delta> are way smaller.

correlation_rand_data_projected

andreufont commented 6 years ago

This thread is really interesting. Thanks Helion!

We should think more about different estimators, and test them in mocks. Having picca is great, but it does make it harder to do code comparisons and test new estimators... In the past we had some discussion on whether we should use quasar randoms in the cross-correlation or not, and whether that made estimators more optimal.

I still think that what we do now, i.e., stack as a function of quasar position is a fine thing to do, but it is possible that it makes the covariance matrix more difficult to handle...

By the way, what happened to the Wick covariances? I understand that you are currently computing the covarinces from sub-sampling, right? Are we still able to compute covariances based on 4-point functions? It would be interseting to see whether those are also pathological.

ngbusca commented 6 years ago

@andreufont the Wick covariance code is there but we haven't used it in a while. I plan to take a look at that again for our final round of eBOSS papers.

andreufont commented 6 years ago

Sounds great. Thanks!

andreufont commented 5 years ago

@londumas - The issues that @jfarr03 and I are having is that we have non-positive definite matrices when trying to use picca to measure the correlation function of the Gaussian field, the log-normal density field, the optical depth or the redshift-space optical depth.

It would be really interesting to run these, since it would allow us to do non-trivial tests. For instance:

londumas commented 5 years ago

@andreufont and @jfarr03, do you have that only for the cross or also for the auto? On the raw transmission files I don't get it for the auto, only the cross has issues.

londumas commented 5 years ago

The following message is an update of the previously posted ones, using the latest simulations. The plots show the following:

correlation_matrix_d_shuffle

correlated_delta

new_covariance_matrix

Sending the following:

picca_xcf.py --drq /project/projectdirs/desi/mocks/lya_forest//london//v6.0/v6.0.0/eboss-0.0/zcat_desi_drq.fits --in-dir /project/projectdirs/desi/mocks/lya_forest/picca/london//v6.0/v6.0.0/eboss-raw/deltas/ --z-evol-obj 1.44 --out /project/projectdirs/desi/mocks/lya_forest/picca/london//v6.0/v6.0.0/eboss-raw/tests_no_projection/xcf_z_0_10.fits --nproc 32 --fid-Om 0.3147 --no-project --nspec 100000

picca_export.py --data /project/projectdirs/desi/mocks/lya_forest/picca/london//v6.0/v6.0.0/eboss-raw/tests_no_projection/xcf_z_0_10.fits --out /project/projectdirs/desi/mocks/lya_forest/picca/london//v6.0/v6.0.0/eboss-raw/tests_no_projection/xcf_z_0_10-exp.fits

picca_xcf.py --drq /project/projectdirs/desi/mocks/lya_forest//london//v6.0/v6.0.0/eboss-0.0/zcat_desi_drq.fits --in-dir /project/projectdirs/desi/mocks/lya_forest/picca/london//v6.0/v6.0.0/eboss-raw/deltas/ --z-evol-obj 1.44 --out /project/projectdirs/desi/mocks/lya_forest/picca/london//v6.0/v6.0.0/eboss-raw/tests_no_projection/xcf_z_0_10_shuffle_42.fits --nproc 32 --fid-Om 0.3147 --no-project --nspec 100000 --shuffle-distrib-obj-seed 42

picca_export.py --data /project/projectdirs/desi/mocks/lya_forest/picca/london//v6.0/v6.0.0/eboss-raw/tests_no_projection/xcf_z_0_10_shuffle_42.fits --out /project/projectdirs/desi/mocks/lya_forest/picca/london//v6.0/v6.0.0/eboss-raw/tests_no_projection/xcf_z_0_10_shuffle_42-exp.fits
andreufont commented 5 years ago

Hi @londumas - I remember dicussing about randoms in cross-correlation a while ago (actually, this was first discussed by Ross O'Connell back in 2012), it might be time to properly look at this with mocks and see what is the statistical gain / loss when you take into account the randoms or not. It looks like it helps with the covariance matrix (what is great!) but I'm curious to see the end effect on BAO uncertainties. It might be worth a paper, don't you think? I could ask a student to work on this.

londumas commented 5 years ago

@andreufont, yes a study with a student would be good. Maybe we should change our estimator.

statistical gain / loss when you take into account the randoms or not.

However there can't be any loss of statistical power because you can produce many realization of the random shuffles and get a near noiseless stack. I bet that 2 or 4 already are embedded into the real catalog errors.

londumas commented 5 years ago

This is fixed by using the random shuffle, see the following plot with 10 London mock realizations: raw and raw+continuum. Fixed via PR https://github.com/igmhub/picca/pull/521

image