manodeep / Corrfunc

⚡️⚡️⚡️Blazing fast correlation functions on the CPU.
https://corrfunc.readthedocs.io
MIT License
163 stars 50 forks source link

Correctly convert counts to cf when doing an autocorr #201

Open Christopher-Bradshaw opened 4 years ago

Christopher-Bradshaw commented 4 years ago

I think that the current way that Corrfunc converts counts to a correlation function isn't quite right for autocorrelations. I think the normalization is slightly off (see equation 3 in the LS paper).

The effect of this is totally negligible when the number of pairs is high, and very small even when it isn't so this isn't a big deal... But I was confused why things weren't quite as I expected and it seems worth a fix.

Please let me know I am right about this - there's always a good change I've misunderstood something! This change is also pretty rough at the moment - I just wanted to have something to show. I'm happy to clean it up, write tests, etc later but didn't want to do that before checking this made sense!

Thanks!

pep8speaks commented 4 years ago

Hello @Christopher-Bradshaw! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2019-10-18 20:38:01 UTC
lgarrison commented 4 years ago

Thanks! I think this is consistent with what we're doing in Corrfunc.theory.xi(), so I agree with this change. Would you mind checking that Corrfunc.theory.xi() and convert_3d_counts_to_cf() now agree?

Christopher-Bradshaw commented 4 years ago

Hey @lgarrison I might be missing something (or just looking in entirely the wrong place) but it looks like Corrfunc.theory.xi() doesn't use the LS estimator, just the basic DD/RR.

https://github.com/manodeep/Corrfunc/blob/d8a795859dc9a66b112c539f82da650fa0b8e586/theory/xi/countpairs_xi_impl.c.src#L619

lgarrison commented 4 years ago

No, you're absolutely right! I was mis-remembering that we also supported the natural estimator in convert_3d_counts_to_cf(). Is there another consistency check we could do? Such as, supply analytic DR and RR to the LS estimator, and check that it asymptotes to the natural estimator (i.e. the result of Corrfunc.theory.xi()) for a large number of randoms? I'm not entirely sure that makes sense... but maybe something along those lines? Basically, I want to make sure that we got the -1 correction right in all of our estimators. One can use a small number of data points (e.g. 10) when making this comparison to exaggerate the importance of the ND1 - 1 correction.

manodeep commented 4 years ago

@Christopher-Bradshaw Thanks!

@lgarrison Since the pair-counts are designed to be the same for an auto-corr of D1 and a cross-corr of D1 with D1, shouldn't the conversion from pair-counts to the cf be the same?

manodeep commented 4 years ago

@lgarrison Would your gist to demonstrate how to correctly account for pair-counts in a Rmin=0.0 bin work?

lgarrison commented 4 years ago

It's definitely relevant, as it was designed to show that (ND-1)/V is the right density to use for the randoms in the natural estimator: https://gist.github.com/lgarrison/1efabe4430429996733a9d29397423d2

That's why I was thinking that some kind of consistency check or limit should be possible to ensure that LS and the natural estimator are behaving the same with respect to the ND-1 factor.

Christopher-Bradshaw commented 4 years ago

That gist is actually really useful for what I am doing - I was using N instead of N - 1 in some places too, so thanks!

The test I had locally that was failing basically just constructed data and randoms with the same clustering and then asserted that the auto-correlation was 0. With this change that is now true. We could maybe just do the same thing for all the places where counts are converted to a cf?

Also, just checking that by natural estimator you mean DD/RR?

manodeep commented 4 years ago

Also, just checking that by natural estimator you mean DD/RR?

@Christopher-Bradshaw Yup!

manodeep commented 2 years ago

@Christopher-Bradshaw @lgarrison Does anyone recollect what needs to happen with this PR?