igmhub / picca

set of tools for continuum fitting, correlation function calculation, cosmological fits...
GNU General Public License v3.0

Errors in the Voigt profile functions #1019

Closed: cramirezpe closed this issue 1 year ago

cramirezpe commented 1 year ago

There are errors in the methods defining Voigt profiles.

I wanted to understand the Voigt profile functions in order to improve their performance (using scipy), but the code in py/picca/dla.py is very difficult to understand.

Thankfully there is a different version of the Voigt profile functions in bin/picca_compute_fvoigt.py (functions voigt and tau). However, they lead to different results. For example, for a random choice of z_dla and N_hi these are the differences (setting T=5e4 for both): [figure: comparison of the two implementations]

I was able to understand the process followed in picca_compute_fvoigt, and I am fairly sure it is correct. I followed Rutten 2003, Section 3.3.3. I also independently wrote a function using Garnett et al. 2018, Section 6:

import numpy as np
from scipy.special import voigt_profile

from picca import constants

def garnett_tau(lamb, z, nhi):
    """Lya optical depth of a DLA, following Garnett et al. 2018, Section 6."""
    e = 1.6021e-19         # elementary charge [C]
    epsilon0 = 8.8541e-12  # vacuum permittivity [C^2.s^2.kg^-1.m^-3]
    f = 0.4164             # Lya oscillator strength
    mp = 1.6726e-27        # proton mass [kg]
    me = 9.109e-31         # electron mass [kg]
    c = 2.9979e8           # speed of light [m.s^-1]
    k = 1.3806e-23         # Boltzmann constant [m^2.kg.s^-2.K^-1]
    T = 5e4                # gas temperature [K]
    gamma = 6.265e8        # Lya damping constant [s^-1]
    lamb_alpha = constants.ABSORBER_IGM["LYA"]  # [A]

    # rest-frame wavelength and velocity offset from line center
    lamb_rest = lamb / (1 + z)
    v = c * (lamb_rest / lamb_alpha - 1)

    # Doppler parameter and Lorentzian HWHM, both in m/s
    b = np.sqrt(2 * k * T / mp)
    small_gamma = gamma * lamb_alpha / (4 * np.pi) * 1e-10

    # column density: nhi is log10(N_HI / cm^-2), convert to m^-2
    nhi_m2 = 10**nhi * 1e4

    tau = nhi_m2 * np.pi * e**2 * f * lamb_alpha * 1e-10
    tau /= 4 * np.pi * epsilon0 * me * c
    tau *= voigt_profile(v, b / np.sqrt(2), small_gamma)

    return tau

which I think is easier to understand. This new function perfectly matches the results in picca_compute_fvoigt.

After confirming this, I was able to find the errors in py/picca/dla.py: https://github.com/igmhub/picca/blob/72f27b7368d5c2d31b19d1bcfcce65bc630fe2bc/py/picca/dla.py#L85 This should be 6.265e8

https://github.com/igmhub/picca/blob/72f27b7368d5c2d31b19d1bcfcce65bc630fe2bc/py/picca/dla.py#L95 This should be lambda_rest_frame / lambda_lya

https://github.com/igmhub/picca/blob/72f27b7368d5c2d31b19d1bcfcce65bc630fe2bc/py/picca/dla.py#L99 The Voigt function defined here gives different values in the heavily suppressed region, which should have a small impact. I did not understand what the function does anyway; I think we should use external libraries whenever possible.
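For reference, the standard Voigt function H(a, x) from Rutten 2003 can be obtained from scipy's wofz (the Faddeeva function) in a couple of lines. This is a sketch of what "use external libraries" could look like, not the picca code itself:

```python
import numpy as np
from scipy.special import wofz

def voigt_H(a, x):
    """Voigt function H(a, x) = Re[w(x + i*a)], w being the Faddeeva function."""
    return wofz(x + 1j * a).real

# limiting cases: H(0, x) reduces to the pure Gaussian exp(-x^2)
print(voigt_H(0.0, 0.0))  # 1.0
```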

https://github.com/igmhub/picca/blob/72f27b7368d5c2d31b19d1bcfcce65bc630fe2bc/py/picca/dla.py#L105 lambda_rest_frame here should be lambda_lya.

I tested the speed of the three functions: [timing figure attached]

Note that the version we are using right now is more than a thousand times slower than any of the other two.
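The timings in the screenshot cannot be reproduced here, but a minimal timeit harness for the scipy-based version would look like the following (the grid size and the sigma/gamma values are arbitrary examples, not picca defaults):

```python
import timeit
import numpy as np
from scipy.special import voigt_profile

# hypothetical grid: a typical forest wavelength range in Angstrom
lamb = np.linspace(3600.0, 5500.0, 4000)
c = 2.9979e8          # m/s
lamb_alpha = 1215.67  # A, Lya rest wavelength
v = c * (lamb / (lamb_alpha * (1 + 2.0)) - 1)  # velocity offsets for a z=2 DLA

# time 100 evaluations of the scipy-based profile on the full grid
t = timeit.timeit(lambda: voigt_profile(v, 1.0e4, 1.0e3), number=100)
print(f"{t / 100 * 1e3:.3f} ms per evaluation")
```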


In conclusion, we need to implement one of these two methods as the standard one in picca/dla.py. But deltas and correlations will be affected, so it would be better to check the impact of this change first.

cramirezpe commented 1 year ago

I created a branch with the changes.

Time spent applying masks drops from 1413 seconds to 603. This reduces the total run time by about 25%!

No significant differences in the delta statistics.

More updates soon...

cramirezpe commented 1 year ago

Differences in correlations are also tiny (auto and cross figures attached), and the fits are slightly better: [figure attached]

iprafols commented 1 year ago

Looks great! Can you open a PR so that we can merge the updates?

p-slash commented 1 year ago

Great work! Before you finalize this, please also fix the following lines in Ly b: https://github.com/igmhub/picca/blob/c757ca4baa9fefb625ced39bc9fb5d9384cea8a4/py/picca/delta_extraction/masks/dla_mask.py#L119-L120

Sorry, I don't know why I haven't done this myself! Edit: the fix is that the coefficients are swapped.

Waelthus commented 1 year ago

This looks like a great improvement! Wouldn't have guessed that our DLA model is that off. And the new one will be faster as well :-). Given the differences we see here: Did anyone check the code which adds DLAs in quickquasar mocks? Might this contain similar issues (in which case mock production and analysis would now be out of sync) or was the right profile shape used there already (in which case mock analyses using the pre-fix picca would've been out of sync)? Also: did anyone check the effect on large scale power in addition to the BAO analysis?

p-slash commented 1 year ago

desisim DLA profiles are implemented using scipy.special.wofz. See this function: https://github.com/desihub/desisim/blob/50998bfa41c5ebfc61930d352a455ea0f913584c/py/desisim/dla.py#L111
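The two scipy entry points are numerically equivalent: voigt_profile(x, sigma, gamma) is the same as evaluating wofz and normalizing by hand, so desisim and picca would compute the same profile shape either way. A quick check (not desisim code; the x/sigma/gamma values are arbitrary):

```python
import numpy as np
from scipy.special import wofz, voigt_profile

def voigt_from_wofz(x, sigma, gamma):
    """Voigt profile via the Faddeeva function, as desisim does with wofz."""
    z = (x + 1j * gamma) / (sigma * np.sqrt(2))
    return wofz(z).real / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-5e5, 5e5, 101)  # e.g. velocity offsets in m/s
np.testing.assert_allclose(voigt_from_wofz(x, 1.0e4, 1.0e3),
                           voigt_profile(x, 1.0e4, 1.0e3), rtol=1e-12)
```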

It would be good to discuss the choice of coefficients. Currently, desisim uses flya = 0.4164, gamma_lya = 6.265e8. However, if you search the NIST database, those numbers don't correspond to the same transition. I raised this issue a while ago; we should consult an expert on this. [NIST Lya screenshot attached]

Here are the values for the Lyb line from NIST: [NIST Lyb screenshot attached]

I am inclined to pick the strongest line from these, i.e. flya=0.41641, Alya=4.6986e8 and flyb=0.079142, Alyb=5.5751e7. But maybe the largest numbers dominate the profile, so our current approach is correct.

Waelthus commented 1 year ago

OK, then at least the same functional form is used, since the voigt_profile function is just a wrapper around wofz (maybe we should use voigt_profile in desisim as well, as it is a compiled wrapper maintained by the scipy developers, instead of our own implementation). It would be great to use a unified model for all approaches, and I agree that we'd ideally have someone in the decision loop who actually is an expert on those details.