manodeep / Corrfunc

⚡️⚡️⚡️Blazing fast correlation functions on the CPU.
https://corrfunc.readthedocs.io
MIT License

Incorrect number counts of galaxy pairs on large scale(s>10Mpc) #198

Closed LeiAstro closed 4 years ago

LeiAstro commented 4 years ago


I am working on galaxy clustering on scales below about 60 Mpc. Before I found Corrfunc, I used my own Fortran code for the pair counting. Corrfunc is much faster than my code, so I decided to switch to it. To make sure the two codes give the same results for the same sample, I ran a check: for a galaxy sample and its corresponding random sample, I calculated the projected two-point correlation function wp(r_p) with both Corrfunc and my own code, and the results are slightly different. To find out which code gives the incorrect wp(r_p), I compared the galaxy-galaxy pair counts from the two codes and found that the number of pairs differs in the large-r bins, as shown below.

Corrfunc:

    import numpy as np
    from Corrfunc.theory.DD import DD

    # Galaxy positions: one galaxy per row, columns are x, y, z
    gdata = np.loadtxt("galaxy_xyz.dat")
    X = gdata[:, 0]
    Y = gdata[:, 1]
    Z = gdata[:, 2]

    nthreads = 50
    # 20 logarithmically spaced bin edges from 10^-2 to 10^1.8 (19 bins)
    aa = np.linspace(-2., 1.8, 20)
    bins = 10.**aa
    autocorr = 1
    DDn = DD(autocorr, nthreads, bins, X, Y, Z, output_ravg=True)

The DDn given by Corrfunc is:

     rmin        rmax        ravg       npairs      weightavg
     0.010000    0.015849    0.012379        1172   0.000000
     0.015849    0.025119    0.020728         768   0.000000
     0.025119    0.039811    0.032174         774   0.000000
     0.039811    0.063096    0.051028        1138   0.000000
     0.063096    0.100000    0.082221        1874   0.000000
     0.100000    0.158489    0.129879        3406   0.000000
     0.158489    0.251189    0.206134        6074   0.000000
     0.251189    0.398107    0.327142       11594   0.000000
     0.398107    0.630957    0.517950       22354   0.000000
     0.630957    1.000000    0.820258       41492   0.000000
     1.000000    1.584893    1.297674       74738   0.000000
     1.584893    2.511886    2.066595      135258   0.000000
     2.511886    3.981072    3.292980      291080   0.000000
     3.981072    6.309573    5.253221      749496   0.000000
     6.309573   10.000000    8.345029     2165098   0.000000
    10.000000   15.848932   13.260500     6766712   0.000000
    15.848932   25.118864   21.053047    22450646   0.000000
    25.118864   39.810717   33.392555    77411296   0.000000
    39.810717   63.095734   52.798855   257952048   0.000000

By using my code, the pair counts of "galaxy_xyz.dat" are:

     1172.0000000000
      768.0000000000
      774.0000000000
     1138.0000000000
     1874.0000000000
     3406.0000000000
     6074.0000000000
    11594.0000000000
    22354.0000000000
    41492.0000000000
    74738.0000000000
   135258.0000000000
   291080.0000000000
   749496.0000000000
  2165098.0000000000
  6766682.0000000000
 22450214.0000000000
 77405384.0000000000
257859458.0000000000

I have also counted the pairs with other methods, namely the astroML BallTree() function and a simple two-loop pair count; both give the same numbers as my code. So it seems that Corrfunc.theory.DD() gives incorrect counts in the bins with r > 10 Mpc, or maybe I am using Corrfunc.theory.DD() the wrong way. Any suggestions? Thanks!!!

lgarrison commented 4 years ago

Hi, thanks for the report! It looks like you did not specify boxsize as an argument in your call to Corrfunc.theory.DD, so the code will try to infer a periodic domain from the extent of the particle distribution. So if the particles are in a box of 500 Mpc but the particles only span 498 Mpc, then the pair counts will be incorrect on large scales. If you run with verbose=True, you will see the inferred box size. Can you manually specify the boxsize and try again?
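To illustrate why an inferred periodic box changes the counts, here is a minimal NumPy sketch. It does not call Corrfunc. The `pair_counts` helper, the random points, and the rule "box size = largest per-axis extent of the points" are all assumptions made for illustration, not Corrfunc's actual implementation. It compares brute-force counts with and without minimum-image wrapping:

```python
import numpy as np

def pair_counts(pos, edges, boxsize=None):
    """Brute-force ordered pair counts in separation bins (hypothetical helper).

    Each pair is counted twice (i-j and j-i). If boxsize is given,
    separations are wrapped with the minimum-image convention.
    """
    diff = pos[:, None, :] - pos[None, :, :]
    if boxsize is not None:
        diff = diff - boxsize * np.round(diff / boxsize)
    r = np.sqrt((diff ** 2).sum(axis=-1))
    r = r[~np.eye(len(pos), dtype=bool)]  # drop self-pairs
    counts, _ = np.histogram(r, bins=edges)
    return counts

# Made-up points for illustration, not the data from this issue
rng = np.random.default_rng(0)
pos = rng.uniform(0.0, 100.0, size=(400, 3))
edges = 10.0 ** np.linspace(0.0, 1.8, 10)

# No wrapping: the analogue of periodic=False
open_counts = pair_counts(pos, edges)

# Assumed inference rule: box size taken from the extent of the points
inferred_box = (pos.max(axis=0) - pos.min(axis=0)).max()
wrapped_counts = pair_counts(pos, edges, boxsize=inferred_box)

# Columns: bin lower edge, bin upper edge, open counts, wrapped counts
print(np.column_stack([edges[:-1], edges[1:], open_counts, wrapped_counts]))
```

Wrapping can only shorten a separation, so points on opposite sides of the volume become spurious neighbours. For a sample that is not actually periodic, the non-wrapped counts are the ones to trust.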

More info on boxsize here: https://corrfunc.readthedocs.io/en/master/api/Corrfunc.theory.html#module-Corrfunc.theory.DD

@manodeep I am in favor of removing automatic boxsize detection due to issues like this.

LeiAstro commented 4 years ago

Cool! Thanks for the quick reply! The galaxy sample I used is a mock galaxy sample, so it is not in a box or in a regular volume. When I add the periodic=False parameter, the Corrfunc.theory.DD() gives the correct number counts!!! I really appreciate your help!! I will read every function and parameter in Corrfunc more carefully!

manodeep commented 4 years ago

@LeiAstro Looks like your issue is resolved - is it okay to close this issue?

@lgarrison I am fine with requiring boxsize when periodic==True. Is that what you were thinking?

LeiAstro commented 4 years ago

Sure!! Thanks!!

manodeep commented 4 years ago

@LeiAstro Great - thanks!

LeiAstro commented 4 years ago

Sorry, but I have a quick question on this closed issue. When I use Corrfunc.theory.DDrppi to count pairs and then measure wp with the convert_rp_pi_counts_to_wp function, I still get a slightly different value in each rp bin, and a different number of pairs in the [pi, rp] bins. My code is:

    DD_counts = DDrppi(1, nthreads, pimax, bins, X, Y, Z, periodic=False, verbose=True)

I don't know how Corrfunc.theory.DDrppi converts (X, Y, Z) to rp and pi. How is the line-of-sight (LOS) vector defined in Corrfunc?

(1) Midpoint of the two galaxies (e.g., Fisher, K. B., Davis, M., Strauss, M. A., Yahil, A., & Huchra, J. 1994, MNRAS, 266, 50)?
(2) End-point line of sight?
(3) Angular bisector?

I think the difference may be caused by the definition of the LOS, and I haven't found any information on this in the Corrfunc docs. Tons of thanks!!! @manodeep @lgarrison

The figures of Beutler et al. (2018) show the definitions of the LOS.

[Image: LOS]

manodeep commented 4 years ago

@LeiAstro The convention in Corrfunc is from Fisher et al. (1994). You can find more details in the docs here.
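For reference, here is a minimal sketch of how the choice of LOS changes (rp, pi) for the same pair. The function names `rp_pi_midpoint` and `rp_pi_z_axis` are hypothetical and are not part of Corrfunc. The midpoint version follows the usual form of the Fisher et al. (1994) convention, with the LOS along the mean position of the pair. The Z-axis version is the plane-parallel choice, which, as far as I understand the docs, is what the Corrfunc.theory routines use. Please check both against the documentation before relying on them:

```python
import numpy as np

def rp_pi_midpoint(v1, v2):
    """(rp, pi) with the line of sight along the pair midpoint (hypothetical helper)."""
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    s = v1 - v2            # separation vector
    l = 0.5 * (v1 + v2)    # line of sight: observer at the origin to the pair midpoint
    pi = abs(np.dot(s, l)) / np.linalg.norm(l)
    rp = np.sqrt(max(np.dot(s, s) - pi ** 2, 0.0))  # clip tiny negative round-off
    return rp, pi

def rp_pi_z_axis(v1, v2):
    """(rp, pi) with a fixed line of sight along the Z axis (hypothetical helper)."""
    s = np.asarray(v1, dtype=float) - np.asarray(v2, dtype=float)
    return np.hypot(s[0], s[1]), abs(s[2])

# The same pair under both conventions; the positions are made up for illustration
a, b = (100.0, 0.0, 10.0), (102.0, 0.0, 10.0)
print("midpoint LOS:", rp_pi_midpoint(a, b))
print("Z-axis LOS:  ", rp_pi_z_axis(a, b))
```

Both conventions keep the total separation, rp^2 + pi^2 = |s|^2, but split it differently between rp and pi, so pairs can land in different (rp, pi) bins even when the total pair count agrees.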

LeiAstro commented 4 years ago

Many thanks!!!