manodeep / Corrfunc

⚡️⚡️⚡️Blazing fast correlation functions on the CPU.
https://corrfunc.readthedocs.io
MIT License

Error in the first correlation value obtained using Corrfunc #268

Closed: ub2206 closed this issue 2 years ago

ub2206 commented 2 years ago

Hello, I am an undergraduate student working on computing the two-point angular correlation function of GRB data. I computed it in two ways, once with the "astroML" library and once with the "Corrfunc" library. As promised, Corrfunc is very fast: what takes 45 minutes with astroML takes less than 5 minutes with Corrfunc. However, in the Corrfunc results the first correlation value is, in every case, much higher than all the other values. With astroML, that value comes out reasonable.

I am attaching the plotted correlation values. The plots with grey shading in the background were obtained using astroML, and the ones with a plain white background were obtained using Corrfunc. The grey shading is not relevant here and can be ignored; it only serves to distinguish the two sets of plots.

[Attached plots: C1, C2, E2, E1]

If this issue has already been resolved elsewhere, I apologize for the duplicate and would appreciate being pointed to the answer.

I have used the DDtheta_mocks and the convert_3d_counts_to_cf functions in my code. If required, I'd be happy to share the code.

lgarrison commented 2 years ago

I think this is probably a case of self-counts being included in bins that have a minimum separation of 0 (which is the expected behavior, although a little surprising sometimes). It's there to ensure that auto- and cross-correlations yield the same answer. For your case, I think you probably want to subtract off the self-count in the first bin; that is, subtract N (the number of points) from the DDtheta pair count in that bin.
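To make the self-count concrete, here is a minimal NumPy-only sketch. It does not call Corrfunc; it brute-forces the angular pair counts for a small random point set, once with each point paired with itself and once without, and then applies the "subtract N" fix. The point set, the bin edges, and all variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 200

# Random unit vectors stand in for sky positions (illustrative data only).
xyz = rng.normal(size=(n_points, 3))
xyz /= np.linalg.norm(xyz, axis=1, keepdims=True)

# Angular separation in degrees for every ordered pair (i, j), including i == j.
cos_sep = np.clip(xyz @ xyz.T, -1.0, 1.0)
theta = np.degrees(np.arccos(cos_sep))

# The first bin has a minimum separation of 0, so it catches the self-pairs.
bin_edges = np.array([0.0, 10.0, 20.0, 40.0, 80.0])

# Counts that include each point paired with itself.
counts_with_self, _ = np.histogram(theta.ravel(), bins=bin_edges)

# Reference counts with the diagonal (self-pairs) excluded.
off_diagonal = ~np.eye(n_points, dtype=bool)
counts_without_self, _ = np.histogram(theta[off_diagonal], bins=bin_edges)

# The fix described above: subtract N from any bin whose lower edge is 0.
corrected = counts_with_self.copy()
corrected[bin_edges[:-1] == 0.0] -= n_points

print("with self-pairs:   ", counts_with_self)
print("without self-pairs:", counts_without_self)
print("after subtracting N:", corrected)
```

With real Corrfunc output, the same subtraction would be applied to the pair-count column of the returned array for the bin whose lower edge is 0.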

ub2206 commented 2 years ago

Thanks a lot for the response, and sorry for the late reply. I would like to confirm one thing, in case I am doing something wrong.

The N is supposed to be subtracted from the npairs of the auto-correlation term only, am I right? The cross-correlation term should be left as it is.

lgarrison commented 2 years ago

That's right, it won't make a difference for cross-correlations. You may also need to subtract it from the RR term if that's computed with pair counts.
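The effect on the estimator can be sketched with toy numbers. The pair counts below are made up for illustration (they are not from the GRB data), and the Landy-Szalay formula is written out by hand in NumPy rather than calling convert_3d_counts_to_cf, so treat it as an assumption about the estimator being used. The toy counts are chosen so that the true correlation is zero in every bin.

```python
import numpy as np

def landy_szalay(dd, dr, rr, n_data, n_rand):
    """Landy-Szalay estimator from raw (unnormalized) pair counts."""
    f = n_rand / n_data
    dd, dr, rr = (np.asarray(x, dtype=np.float64) for x in (dd, dr, rr))
    return (f * f * dd - 2.0 * f * dr + rr) / rr

n_data, n_rand = 100, 200
theta_min = np.array([0.0, 1.0, 2.0])  # lower bin edges; the first bin starts at 0

# Toy auto-counts as an auto-correlation would report them: the first bin
# carries N extra self-pairs for both DD (N = n_data) and RR (N = n_rand).
dd_raw = np.array([40 + n_data, 120, 200])
rr_raw = np.array([160 + n_rand, 480, 800])
# Cross-counts (data x randoms) contain no self-pairs and are left alone.
dr = np.array([80, 240, 400])

# Subtract the self-pairs from the auto-correlation terms only.
zero_bin = theta_min == 0.0
dd_fixed = dd_raw.copy()
rr_fixed = rr_raw.copy()
dd_fixed[zero_bin] -= n_data
rr_fixed[zero_bin] -= n_rand

cf_raw = landy_szalay(dd_raw, dr, rr_raw, n_data, n_rand)
cf_fixed = landy_szalay(dd_fixed, dr, rr_fixed, n_data, n_rand)

print("uncorrected:", cf_raw)
print("corrected:  ", cf_fixed)
```

In this toy setup only the first bin changes, which matches the symptom reported above: a single inflated value in the bin that starts at zero separation.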

ub2206 commented 2 years ago

Thanks a lot for the solutions. The results are much more accurate and explainable now.

manodeep commented 2 years ago

Thanks @lgarrison!

@ub2206 While your issue itself is resolved, since you are effectively computing angular correlation functions over half the sky, it might be worthwhile to check whether the runtime improves if you set both link_in_ra and link_in_dec to False.
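One way to run that check is sketched below. It assumes Corrfunc is installed, that ra and dec are arrays in degrees, and that bin_edges is an array of angular bin edges in degrees. The helper names time_ddtheta and compare_link_settings are hypothetical and not part of Corrfunc. Timings depend on the machine, the thread count, and the data, so this only shows how to measure, not what to expect.

```python
import time

def time_ddtheta(ra, dec, bin_edges, nthreads=1, **link_options):
    """Time one auto-correlation call to DDtheta_mocks (hypothetical helper)."""
    # Imported lazily so the helper can be defined without Corrfunc installed.
    from Corrfunc.mocks.DDtheta_mocks import DDtheta_mocks

    start = time.perf_counter()
    # autocorr=1 requests an auto-correlation of (ra, dec) with itself.
    results = DDtheta_mocks(1, nthreads, bin_edges, ra, dec, **link_options)
    return time.perf_counter() - start, results

def compare_link_settings(ra, dec, bin_edges, nthreads=1):
    """Return elapsed seconds with both link flags True, then both False."""
    timings = {}
    for flag in (True, False):
        elapsed, _ = time_ddtheta(
            ra, dec, bin_edges, nthreads, link_in_ra=flag, link_in_dec=flag
        )
        timings[flag] = elapsed
    return timings
```

Calling compare_link_settings(ra, dec, bin_edges) on the actual catalogue returns a dictionary keyed by the flag value, which makes it easy to see which setting is faster for that data set.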