Closed londumas closed 5 years ago
If I leave the projection in the computation of the correlation function, I get a positive definite matrix:
Sadly, the reason is linked to something I was expecting: the mean of delta in each sub-sample is not random. To investigate this, I ran the regular xcf on the transmission files, without projection, for the quasar catalog and for the random catalog.
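For context, the "projection" discussed here removes per-forest modes from the deltas; as I understand it, the default removes the weighted mean and the weighted slope in log-lambda from each forest. A minimal illustrative sketch (the function name and details are mine, not picca's actual implementation):

```python
import numpy as np

def project_delta(delta, log_lam, weights):
    """Remove the weighted mean and the weighted slope in log-lambda
    from one forest's deltas (illustrative sketch of the 'projection',
    not picca's actual code)."""
    mean = np.average(delta, weights=weights)
    dl = log_lam - np.average(log_lam, weights=weights)
    # weighted least-squares slope of delta vs. log-lambda
    slope = np.sum(weights * dl * delta) / np.sum(weights * dl**2)
    return delta - mean - slope * dl
```

By construction, a delta field that is purely a constant plus a linear trend in log-lambda is projected to zero, which is why the per-forest mean of the projected deltas vanishes.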
The reason might be that the estimator is xi = \<delta eta> - \<delta>\<eta>. Basically, in `do_xcf.py` we compute only xi = \<delta eta>, and the random catalog acts as an estimate of \<delta>\<eta>.
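A toy sketch of this estimator, where the \<delta>\<eta> term is estimated by shuffling the object catalog (all names and the pairing scheme here are mine, for illustration only; the real code bins pairs by separation):

```python
import numpy as np

def xcf_estimate(delta, eta, n_shuffle=10, seed=0):
    """Toy estimator xi = <delta eta> - <delta><eta>, where the
    second term is estimated by shuffling the object field eta.
    The arrays stand in for values paired into one separation bin."""
    rng = np.random.default_rng(seed)
    raw = np.mean(delta * eta)  # <delta eta>
    shuffled = np.mean([
        np.mean(delta * rng.permutation(eta))  # ~ <delta><eta>
        for _ in range(n_shuffle)
    ])
    return raw - shuffled
```

With this form, any non-zero mean of delta or eta cancels between the two terms instead of biasing xi.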
What do you think @ngbusca?
@londumas what happens if you do the projection on the transmission field? Is it still not positive definite?
@ngbusca, with the projection on the transmission field, I get positive definite matrices, with the correlation matrix we are used to, see comment https://github.com/igmhub/picca/issues/431#issuecomment-390805569
I see, thanks!
Here is the same plot as the first one, but for the xcf calculated on the projected transmission: the correlation between randoms and data has totally disappeared, and the values of \<delta> are much smaller.
This thread is really interesting. Thanks Helion!
We should think more about different estimators, and test them in mocks. Having picca is great, but it does make it harder to do code comparisons and test new estimators... In the past we had some discussion on whether we should use quasar randoms in the cross-correlation or not, and whether that made estimators more optimal.
I still think that what we do now, i.e., stack
By the way, what happened to the Wick covariances? I understand that you are currently computing the covariances from sub-sampling, right? Are we still able to compute covariances based on 4-point functions? It would be interesting to see whether those are also pathological.
@andreufont the Wick covariance code is there but we haven't used it in a while. I plan to take a look at that again for our final round of eBOSS papers.
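For readers unfamiliar with the sub-sampling covariance mentioned above: it estimates the covariance of the mean correlation function from its scatter across sky sub-samples. A rough weighted sketch (function name, weighting convention, and shapes are my assumptions, not picca's code):

```python
import numpy as np

def subsampling_covariance(xi_subs, w_subs):
    """Weighted covariance of the mean correlation function from
    sub-samples (illustrative sketch, not picca's implementation).

    xi_subs : (n_sub, n_bins) correlation function per sub-sample
    w_subs  : (n_sub, n_bins) weight per sub-sample and bin
    """
    w_tot = w_subs.sum(axis=0)
    xi_mean = (w_subs * xi_subs).sum(axis=0) / w_tot
    diff = xi_subs - xi_mean
    # sum over sub-samples s of w[s,i] w[s,j] diff[s,i] diff[s,j]
    cov = np.einsum('si,sj,si,sj->ij', w_subs, w_subs, diff, diff)
    cov /= np.outer(w_tot, w_tot)
    return cov
```

Such an estimate is symmetric with a non-negative diagonal by construction, but with few sub-samples relative to the number of bins it is not guaranteed to be positive definite, which is the pathology discussed in this thread.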
Sounds great. Thanks!
@londumas - The issue that @jfarr03 and I are having is that we get non-positive definite matrices when trying to use picca to measure the correlation function of the Gaussian field, the log-normal density field, the optical depth, or the redshift-space optical depth.
It would be really interesting to run these, since it would allow us to do non-trivial tests. For instance:
@andreufont and @jfarr03, do you have that only for the cross or also for the auto? On the raw transmission files I don't get it for the auto, only the cross has issues.
The following message is an update of the previously posted ones, using the latest simulations. The plots show the following:
Sending the following:
```
picca_xcf.py --drq /project/projectdirs/desi/mocks/lya_forest//london//v6.0/v6.0.0/eboss-0.0/zcat_desi_drq.fits --in-dir /project/projectdirs/desi/mocks/lya_forest/picca/london//v6.0/v6.0.0/eboss-raw/deltas/ --z-evol-obj 1.44 --out /project/projectdirs/desi/mocks/lya_forest/picca/london//v6.0/v6.0.0/eboss-raw/tests_no_projection/xcf_z_0_10.fits --nproc 32 --fid-Om 0.3147 --no-project --nspec 100000
picca_export.py --data /project/projectdirs/desi/mocks/lya_forest/picca/london//v6.0/v6.0.0/eboss-raw/tests_no_projection/xcf_z_0_10.fits --out /project/projectdirs/desi/mocks/lya_forest/picca/london//v6.0/v6.0.0/eboss-raw/tests_no_projection/xcf_z_0_10-exp.fits
picca_xcf.py --drq /project/projectdirs/desi/mocks/lya_forest//london//v6.0/v6.0.0/eboss-0.0/zcat_desi_drq.fits --in-dir /project/projectdirs/desi/mocks/lya_forest/picca/london//v6.0/v6.0.0/eboss-raw/deltas/ --z-evol-obj 1.44 --out /project/projectdirs/desi/mocks/lya_forest/picca/london//v6.0/v6.0.0/eboss-raw/tests_no_projection/xcf_z_0_10_shuffle_42.fits --nproc 32 --fid-Om 0.3147 --no-project --nspec 100000 --shuffle-distrib-obj-seed 42
picca_export.py --data /project/projectdirs/desi/mocks/lya_forest/picca/london//v6.0/v6.0.0/eboss-raw/tests_no_projection/xcf_z_0_10_shuffle_42.fits --out /project/projectdirs/desi/mocks/lya_forest/picca/london//v6.0/v6.0.0/eboss-raw/tests_no_projection/xcf_z_0_10_shuffle_42-exp.fits
```
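As a side note, the "not positive definite" condition reported throughout this thread is equivalent to a failed Cholesky factorization, so a quick check one could apply to any covariance matrix loaded from the exported files looks like this (the helper name is mine, for illustration):

```python
import numpy as np

def is_positive_definite(cov):
    """Return True if the covariance matrix admits a Cholesky
    factorization, i.e. is positive definite."""
    try:
        np.linalg.cholesky(cov)
        return True
    except np.linalg.LinAlgError:
        return False
```

Cholesky is both the fastest standard test and the operation that typically fails downstream when a fit tries to invert the covariance.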
Hi @londumas - I remember discussing randoms in the cross-correlation a while ago (actually, this was first discussed by Ross O'Connell back in 2012). It might be time to properly look at this with mocks and see what the statistical gain/loss is when you take the randoms into account or not. It looks like it helps with the covariance matrix (which is great!), but I'm curious to see the end effect on the BAO uncertainties. It might be worth a paper, don't you think? I could ask a student to work on this.
@andreufont, yes a study with a student would be good. Maybe we should change our estimator.
> statistical gain / loss when you take into account the randoms or not.
However, there can't be any loss of statistical power, because you can produce many realizations of the random shuffles and get a near-noiseless stack. I bet that with 2 or 4 realizations the shuffle noise is already buried in the errors of the real catalog.
This is fixed by using the random shuffle, see the following plot with 10 London mock realizations: raw and raw+continuum. Fixed via PR https://github.com/igmhub/picca/pull/521
Running `do_xcf.py` on transmission files, with d_i(lambda) = T_i(lambda)/\<T(lambda)> - 1, for enough forests (500,000), I get a covariance matrix that triggers `Warning: Matrix is not positive definite`. This is not true for `do_cf.py`, where already with 100,000 forests I get a covariance matrix that is positive definite. This issue does not appear when the mocks are fully expanded by quickquasars.

The following plot compares the covariance matrix from transmission (Lya) and from the fully expanded mocks (Lya+Continuum), for the diagonal and the off-diagonal terms, for the cross- and the auto-correlation. These plots show precisely an issue for the cross-correlation, and not for the auto-correlation:
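The delta definition used for the raw transmission analysis, d_i(lambda) = T_i(lambda)/\<T(lambda)> - 1, can be sketched as follows (a minimal numpy version, assuming \<T(lambda)> is the mean over skewers at fixed wavelength; not the mock-production code itself):

```python
import numpy as np

def transmission_deltas(T):
    """Deltas from raw transmission skewers:
    d_i(lambda) = T_i(lambda) / <T(lambda)> - 1,
    with <T(lambda)> the mean over skewers at fixed wavelength.

    T : (n_skewers, n_pix) array of transmitted-flux fractions
    """
    mean_T = T.mean(axis=0)  # <T(lambda)> across skewers
    return T / mean_T - 1.0
```

By construction the deltas average to zero across skewers at each wavelength, but nothing forces the mean of delta within an individual forest or sky sub-sample to vanish, which is the point raised earlier in this thread.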
Conclusion: