Fast computation of metal dmat using stack of delta [bump minor]

julienguy commented 1 year ago

Added script to compute the metal matrices using only the delta stack table that one can find in the delta_attributes file.

We assume in this code that the weights of pairs with the same longitudinal separation do not depend (on average) on the transverse separation.

With this hypothesis, the distortion is the same for all values of rtrans, and one can use the average weight as a function of wavelength to compute the response.

Example:

time picca_fast_metal_dmat.py -i /global/cfs/cdirs/desi/science/lya/y1-kp6/iron-tests/deltas/delta-lya-3-0/Log/delta_attributes.fits.gz --out fast_dmat_lya_x_lya.fits --rt-max 200 --np 75 --rp-max 300 --fid-Om 0.315 --fid-Or 7.963e-5 --abs-igm 'SiIII(1207)' 'SiII(1190)' 'SiII(1193)' 'SiII(1260)' --coef-binning-model 2 --rebin-factor 3

takes 30s (most of the time is spent to write the huge matrices)

Comparison of response to a delta_k correlation. blue: fast computation orange: result of the std picca_metal_dmat.py ran with option --coef-binning-model 2 which takes forever. The orange has some statistical fluctuation because of the large rejection factor that has to be used.

SiIII(1207) Screenshot from 2023-09-01 16-37-53

SiII(1260) Screenshot from 2023-09-01 16-58-04

Maybe we can test this on mocks?

julienguy commented 1 year ago

Thanks @iprafols for the suggestions. I am going to do that and work on implementing the same approach for QSO Lya cross-correlation. I checked on the Y1 analysis that I was getting almost the same chi2 with this fast method with the binning of 2Mpc/h for the metal matrices. I still want to run more checks though. (So this PR is still work in progress)

Compare:

Result with std (and very very long) metal computation:


/global/cfs/cdirs/desicollab/users/jguy/lya/iron-tests-jg/vegafits/vega-3-0-0-16/lyaxlya_lyaxlyb_lyaxqso_lybxqso/logs/baseline_vegafit_3-0-0-16.out

lyaxlya chi^2/(ndata-nparam): 1605.3/(1590-16) = 1.020, PTE=0.29

lyaxlyb chi^2/(ndata-nparam): 1644.2/(1590-16) = 1.045, PTE=0.11

lyaxqso chi^2/(ndata-nparam): 3196.8/(3180-16) = 1.010, PTE=0.34

lybxqso chi^2/(ndata-nparam): 3155.0/(3180-16) = 0.997, PTE=0.54

Total chi^2/(ndata-nparam): 9602.0/(9540-16) = 1.008, PTE=0.28


* Result with new metal computation for lyalya and lyalyb:

/global/cfs/cdirs/desicollab/users/jguy/lya/iron-tests-jg/vegafits/vega-3-0-0-17/lyaxlya_lyaxlyb_lyaxqso_lybxqso/logs/baseline_vegafit_3-0-0-17.out

lyaxlya chi^2/(ndata-nparam): 1610.9/(1590-16) = 1.023, PTE=0.25

lyaxlyb chi^2/(ndata-nparam): 1644.8/(1590-16) = 1.045, PTE=0.10

lyaxqso chi^2/(ndata-nparam): 3199.1/(3180-16) = 1.011, PTE=0.33

lybxqso chi^2/(ndata-nparam): 3154.9/(3180-16) = 0.997, PTE=0.54

Total chi^2/(ndata-nparam): 9610.3/(9540-16) = 1.009, PTE=0.26

iprafols commented 1 year ago

Ok, I see, so for now I'll flag it as a draft PR. Let me know when you want more feedback, but the results are really encouraging!

julienguy commented 1 year ago

For SiII(1090), wavelength in the QSO restframe interval [1190A,1205A] cannot be considered for the computation of the matrix because they would correspond to an absorption at a redshift greater than that of the quasar. This test is in place in the full calculation, but it is ignored in the fast computation. However I verified this has a negligible effect on the metal matrix (see plot below). I made this test by commenting out the test zmet<zqso in the full matrix calculation. The effect has to be smaller for SiII(1093) and inexistent for other transitions > 1205A in the rest-frame, so I think we can accept this approximation (the difference is smaller than the fluctuation from one rt bin to the next when computed with the standard method). with_or_without_zqso_criterion

julienguy commented 1 year ago

I added the code to compute the cross-correlation metal matrix.

Example (takes 15s):

time picca_fast_metal_xdmat.py -i /global/cfs/cdirs/desi/science/lya/y1-kp6/iron-tests/deltas/delta-lya-3-0/Log/delta_attributes.fits.gz --drq /global/cfs/cdirs/desi/science/lya/y1-kp6/iron-tests/catalogs/QSO_cat_iron_main_dark_healpix_v0-altbal_zwarn_cut.fits --mode desi_healpix --out fast_metal_dmat_qso_x_lya_lowres.fits --rt-max 200 --np 150 --rp-min -300 --rp-max 300 --fid-Om 0.315 --fid-Or 7.963e-5 --abs-igm 'SiII(1190)' 'SiII(1193)' 'SiIII(1207)' 'SiII(1260)' --rebin-factor 3

xdmat-SiII1190

julienguy commented 1 year ago

New comparison with a more precise estimation with the standard method using a smaller rt range (<40 Mpc/h) allowing for rej=0.995.

xdmat-SiII1190

The differences are smaller than in the previous post. One can imagine residual small differences coming from specifics of the survey that the fast method cannot correct for: for instance a slightly different redshift distribution of QSOs in some part of the footprint correlated with a higher S/N in the forests.

The agreement on the auto-correlation lya x lya is also good when running the std computation with a smaller rt range. See figures below (already posted on slack). So in my opinion this PR is ready for final review before merging.

Figure showing that the matrix elements are virtually independent of r_trans when computing the matrix with a sufficient number of pairs: metal_dmat_SiII_1190_vs_rt_rej099

Figure comparing the std and fast method for the auto-correlation: metal_dmat_SiII_1190_lowres_fast_slow

julienguy commented 1 year ago

I made several modifications to make 'pylint' happier except for some 'Line too long' complaints. In my opinion truncating those lines with \ makes the code harder to read.

igmhub / picca