manodeep / Corrfunc

⚡️⚡️⚡️Blazing fast correlation functions on the CPU.
https://corrfunc.readthedocs.io
MIT License

Usage of Corrfunc in data analysis #253

SergeiDBykov closed this issue 3 years ago

SergeiDBykov commented 3 years ago

Dear devs,

I find your package very handy and useful, thank you for writing it. I have a few questions about the usage of the mock sub-package.

Suppose that my goal is to estimate the spatial CF (xi), the projected spatial CF (wp) and the angular CF (w). As far as I understand, all of this can be done with Corrfunc. My questions are the following:

1) DDrppi_mocks calculates pair counts in bins specified in the output. For instance, in the output array

…
 16.852277  25.000000  21.180184       15.0     265258   1.000000
…

means that there are 265258 pairs with projected separation between 16.852277 and 25.000000 units and LOS separation between 14 and 15 units. Is this correct? I ask because I don’t understand which bins the convert_3d_counts_to_cf function uses to create a 3d CF. I would expect to be able to construct a 2d CF plot (similar to this) by applying the estimator to DD, RR, DR in 2d bins (rp and pi), but this function returns a 1d array. This 1d output is obviously xi(r), but which array should I use for r?

The same question applies to the projected CF: from the source code I see that it is simply an integral of the CF in (rp, pi) bins along the pi direction, but which bins should I use to plot wp(rp) vs rp?

2) What is the purpose of the average radius in the arrays returned by DDrppi_mocks (or the average theta in the angular CF)? I can see that, for a particular rp bin, this value increases as we move in the pi direction. Is it essential later in the code for computing the 3d CF?

Thank you for your time and your package!

manodeep commented 3 years ago

Thanks for the kind words and thank you for using Corrfunc. Reports like this help us understand where the documentation may be incomplete/missing.

> 1. `DDrppi_mocks` calculates pair counts in bins specified in the output. For instance, in the output array
> …
>  16.852277  25.000000  21.180184       15.0     265258   1.000000
> …
>
> means that there are 265258 pairs with projected separation between 16.852277 and 25.000000 units and LOS separation between 14 and 15 units. Is this correct?

Correct. The first two columns are the min and max of the rp bin, the third is the `rpavg` column, and the fourth is the max of the pi bin, with an assumed pi bin width of 1.0 Mpc/h.
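To make the row layout concrete, here is a small sketch that recovers both edges of the pi bin from the fourth column. It uses a hand-built structured array rather than a real `DDrppi_mocks` call. The field names (`rmin`, `rmax`, `rpavg`, `pimax`, `npairs`, `weightavg`) are an assumption about the result array, and `dpi = 1.0` is the assumed pi bin width mentioned above.

```python
import numpy as np

# Hand-built stand-in for one row of the result array (not a real Corrfunc call).
# The field names are assumed to match the DDrppi_mocks output.
dtype = [("rmin", "f8"), ("rmax", "f8"), ("rpavg", "f8"),
         ("pimax", "f8"), ("npairs", "u8"), ("weightavg", "f8")]
row = np.array([(16.852277, 25.0, 21.180184, 15.0, 265258, 1.0)], dtype=dtype)

dpi = 1.0                   # assumed width of each pi bin
pi_lo = row["pimax"] - dpi  # lower edge of the pi bin
pi_hi = row["pimax"]        # upper edge, as stored in the fourth column

print(f"rp in [{row['rmin'][0]}, {row['rmax'][0]}), "
      f"pi in [{pi_lo[0]}, {pi_hi[0]}): {row['npairs'][0]} pairs")
```

Only the upper pi edge is stored, so the lower edge has to be reconstructed from the bin width.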

> I ask because I don’t understand which bins the convert_3d_counts_to_cf function uses to create a 3d CF. I would expect to be able to construct a 2d CF plot (similar to this) by applying the estimator to DD, RR, DR in 2d bins (rp and pi), but this function returns a 1d array. This 1d output is obviously xi(r), but which array should I use for r?

The output is the 1-D array xi(rppi) - you will have to reshape the array to get a 2-D xi(rp, pi). These lines might help. Basically, the first dimension is rp, and the second dimension (the one that changes faster) is pi. I think a simple `np.reshape(xirppi, (nrpbins, npibins))` should do it - but you will have to check that the arguments are not swapped.
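A minimal sketch of that reshape, including one way to check that the arguments are not swapped. The arrays here are synthetic stand-ins for the flat outputs, and the bin counts and values are purely illustrative. The check reshapes the bin-edge columns the same way and confirms that rp is constant along each row while pi is constant down each column.

```python
import numpy as np

nrpbins, npibins = 3, 4  # illustrative sizes only

# Stand-ins for the flat outputs: pi is assumed to vary fastest, so each
# rp bin contributes npibins consecutive entries.
pimax_flat = np.tile(np.arange(1.0, npibins + 1.0), nrpbins)
rmax_flat = np.repeat([1.0, 2.0, 4.0], npibins)
xi_flat = np.arange(nrpbins * npibins, dtype=float)  # placeholder values

xi_2d = np.reshape(xi_flat, (nrpbins, npibins))
pimax_2d = np.reshape(pimax_flat, (nrpbins, npibins))
rmax_2d = np.reshape(rmax_flat, (nrpbins, npibins))

# Ordering check: rp constant along each row, pi constant down each column.
rp_along_rows_ok = np.all(rmax_2d == rmax_2d[:, :1])
pi_down_cols_ok = np.all(pimax_2d == pimax_2d[:1, :])
assert rp_along_rows_ok and pi_down_cols_ok, "swap the reshape arguments"
```

With real output, the same check can be run on the bin-edge columns of the pair-count array before trusting the reshaped xi.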

> The same question applies to the projected CF: from the source code I see that it is simply an integral of the CF in (rp, pi) bins along the pi direction, but which bins should I use to plot wp(rp) vs rp?

I would probably plot the rpavg column of the first rp bin, but you can make other choices (say, the max of the rp bin). As long as you apply the same "rp" point for all the correlation functions, it should be fine (because the correlation function is defined as a histogram, and you are representing each bin with a single point rather than with the top bar of the histogram).
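As a sketch of the integral and of the plotting choices: with a uniform pi bin width `dpi`, the projected correlation function can be approximated as twice the sum of xi(rp, pi) over the pi bins, times `dpi`. I believe this matches what `convert_rp_pi_counts_to_wp` does, but treat that as an assumption. All arrays below are placeholders. Taking `rpavg` from the first pi bin of each rp bin is one reading of the suggestion above, not a confirmed convention.

```python
import numpy as np

nrpbins, npibins = 3, 4
dpi = 1.0  # assumed pi bin width

# Placeholder xi(rp, pi) and rp bin edges; real values come from the pair counts.
xi_2d = np.full((nrpbins, npibins), 0.5)
rp_edges = np.array([1.0, 2.0, 4.0, 8.0])
# Hypothetical rpavg values, shaped like the reshaped rpavg column.
rpavg_2d = np.array([[1.5, 1.5, 1.6, 1.6],
                     [3.0, 3.0, 3.1, 3.1],
                     [6.0, 6.1, 6.1, 6.2]])

# Riemann sum along pi; the factor 2 covers both signs of the LOS separation.
wp = 2.0 * dpi * xi_2d.sum(axis=1)

# Three possible x-coordinates for plotting wp; pick one and use it everywhere.
rp_from_avg = rpavg_2d[:, 0]                         # rpavg from the first pi bin
rp_from_max = rp_edges[1:]                           # upper edge of each rp bin
rp_from_mid = np.sqrt(rp_edges[:-1] * rp_edges[1:])  # geometric midpoint
```

Whichever coordinate is chosen, using the same one for every measurement being compared keeps the curves consistent.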

> 2) What is the purpose of the average radius in the arrays returned by DDrppi_mocks (or the average theta in the angular CF)? I can see that, for a particular rp bin, this value increases as we move in the pi direction. Is it essential later in the code for computing the 3d CF?

It helps with plotting and comparing to "data" measurements. Computing rpavg is not essential - that's why it is available as a user-level parameter, `output_rpavg`. (Note: the theory routines run faster when `output_rpavg=False`, but the speedup is marginal for the mocks sub-packages.)

> Thank you for your time and your package!

:)

SergeiDBykov commented 3 years ago

@manodeep Dear Manodeep, thanks for the message! It is now very clear.

I made a gist with a function that simultaneously creates xi, wp and the appropriate rp, pi bins for plotting the 3d CF and the projected CF. It repeats convert_3d_counts_to_cf and convert_rp_pi_counts_to_wp, but more simply and with the appropriate bins returned automatically.
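The gist itself is not reproduced here. As a rough sketch of the first step such a helper has to perform, the function below turns raw DD, DR and RR counts into xi per bin. It assumes the Landy-Szalay estimator for an auto-correlation, which is, as far as I know, what `convert_3d_counts_to_cf` uses. The function name and the simple N_rand / N_data normalisation are my own illustrative choices, and they assume DD and RR count each pair in both orderings.

```python
import numpy as np

def landy_szalay(dd, dr, rr, n_data, n_rand):
    """Landy-Szalay estimate of xi per bin from raw auto-correlation pair counts."""
    dd, dr, rr = (np.asarray(a, dtype=float) for a in (dd, dr, rr))
    f = n_rand / n_data             # rescales data counts to the random catalogue size
    xi = np.full(rr.shape, np.nan)  # bins with no random pairs stay NaN
    ok = rr > 0
    xi[ok] = (f * f * dd[ok] - 2.0 * f * dr[ok] + rr[ok]) / rr[ok]
    return xi

# Toy counts for three (rp, pi) bins, with 100 data points and 200 randoms.
xi_flat = landy_szalay(dd=[100, 50, 5], dr=[100, 100, 0], rr=[200, 200, 0],
                       n_data=100, n_rand=200)
```

The flat output has one entry per (rp, pi) bin, in the same order as the input counts.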