Measuring the correlation function xi by patches

DavidNavarroG commented 2 years ago

Hello Jarvis,

I am trying to compute the NG correlation function xi on a sample with some patches defined. I compute the correlation between every patch and every patch, including itself (xi(patch1-patch1), xi(patch1-patch2), ...; xi(patch2-patch1), xi(patch2-patch2), ...; xi(patch3-patch1), ...; ...).

I would expect the sum of xi over all the patch combinations to match the correlation function of the whole sample computed without dividing it into patches. However, although the results are similar, I get higher values when I compute the correlation function without patches. Is it correct to expect the same values from both approaches? Could some pairs be missed when computing by patches?

Thank you very much,

David

rmjarvis commented 2 years ago

So, I don't think you quite said what you meant. If each pair of patches has enough overlap to measure something real (i.e. there are enough pairs of points to get a real measurement), then each of your correlation functions should have an expectation value equal to the cosmological value, just (much) noisier than the full correlation function. So their sum should be much larger than what you get when doing the full sample at once.

So maybe you meant to write "average" rather than "sum". That would be closer. But then the problem is that some of the patch combinations probably have zeros in some bins (no pairs at those separations), which would pull the average down.

What you would actually expect to match reasonably closely is the Npairs-weighted average in each bin. So for bin i, you could calculate xi_i = (Sum_k xi_{i,k} N_{i,k}) / (Sum_k N_{i,k}), where k loops over your patch combinations. That should work out fairly closely. And I think if you run with bin_slop=0, it should probably come out almost identical. (There is still one detail related to shear projections that would keep them from being precisely identical; cf. the discussion here.)
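A minimal NumPy sketch of that Npairs-weighted average, assuming the per-combination results have been stacked into arrays of shape (n_combinations, n_bins). The helper `combine_patch_xi` and the toy numbers are hypothetical illustrations, not part of TreeCorr. Combinations with no pairs in a bin get zero weight there, so they do not drag the result down the way a plain mean does.

```python
import numpy as np


def combine_patch_xi(xi_k, npairs_k):
    """Npairs-weighted average over patch combinations, computed bin by bin.

    xi_k, npairs_k: arrays of shape (n_combinations, n_bins), one row per
    (patch_i, patch_j) combination (hypothetical layout).
    """
    xi_k = np.asarray(xi_k, dtype=float)
    npairs_k = np.asarray(npairs_k, dtype=float)
    total = npairs_k.sum(axis=0)                # Sum_k N_{i,k} for each bin i
    weighted = (xi_k * npairs_k).sum(axis=0)    # Sum_k xi_{i,k} N_{i,k}
    # Bins with no pairs in any combination are left at 0 instead of 0/0.
    return np.divide(weighted, total, out=np.zeros_like(weighted), where=total > 0)


# Toy numbers: three patch combinations, two bins; the third combination
# has no pairs at all, so its xi is reported as 0.
xi_k = np.array([[0.10, 0.04],
                 [0.12, 0.06],
                 [0.00, 0.00]])
npairs_k = np.array([[100, 400],
                     [300, 200],
                     [0, 0]])

print(combine_patch_xi(xi_k, npairs_k))  # Npairs-weighted average
print(xi_k.mean(axis=0))                 # plain mean, pulled down by the empty row
```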

Or, maybe you just meant to talk about Npairs itself, rather than xi. For that I think you should expect the sum over all your patch combinations to be pretty close to what you get for the full sample. And if bin_slop=0, it should be exact.
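A brute-force toy check of the Npairs statement, in plain NumPy rather than TreeCorr: every lens-source pair falls into exactly one (patch_i, patch_j) combination, so with exact binning the per-combination counts add up to the full-sample counts. The random points, the 2x2 patch grid, and the helpers `count_pairs` and `patch_id` are hypothetical; the tree-based approximation that bin_slop > 0 allows is exactly what this sketch leaves out.

```python
import numpy as np


def count_pairs(pos1, pos2, edges):
    """Exact (brute-force) pair counts per separation bin between two point sets."""
    # All pairwise separations, shape (len(pos1), len(pos2)).
    sep = np.hypot(pos1[:, None, 0] - pos2[None, :, 0],
                   pos1[:, None, 1] - pos2[None, :, 1])
    counts, _ = np.histogram(sep, bins=edges)
    return counts


def patch_id(pos):
    """Hypothetical patch labels: a 2x2 grid of square patches on a 10x10 field."""
    return (pos[:, 0] >= 5).astype(int) * 2 + (pos[:, 1] >= 5).astype(int)


rng = np.random.default_rng(42)
lenses = rng.uniform(0, 10, size=(300, 2))    # "N" catalog positions
sources = rng.uniform(0, 10, size=(400, 2))   # "G" catalog positions
edges = np.logspace(-1, 1, 11)                # 10 log-spaced separation bins
npatch = 4

lens_patch = patch_id(lenses)
source_patch = patch_id(sources)

# Full sample in one go.
full = count_pairs(lenses, sources, edges)

# Every (lens patch i, source patch j) combination, summed.
by_patch = sum(
    count_pairs(lenses[lens_patch == i], sources[source_patch == j], edges)
    for i in range(npatch)
    for j in range(npatch)
)

print(full)
print(np.array_equal(full, by_patch))
```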

Not sure if anything here clarifies your concern, but if you think there is still some problem, maybe try posting an SSCCE showing what you think isn't working as expected.

DavidNavarroG commented 2 years ago

Thank you for your quick answer.

Here is an SSCCE showing my concern.

Dividing by patches:

    import numpy as np
    import treecorr

    for i in range(NPatches):
        # Keep only the positions that belong to patch i
        positions_1 = positions[positions['ID'] == i]
        for j in range(NPatches):
            # Keep only the shapes that belong to patch j
            shapes_1 = shapes[shapes['ID'] == j]
            cat1 = treecorr.Catalog(...)
            cat2 = treecorr.Catalog(...)
            dd = treecorr.NGCorrelation(config)
            dd.process_cross(cat1, cat2)
            # Store the per-bin result for this (i, j) patch combination
            xi_1[i, j, :] = dd.xi
            dd.clear()

Not dividing by patches:

    cat1 = treecorr.Catalog(...)
    cat2 = treecorr.Catalog(...)
    dd = treecorr.NGCorrelation(config)
    dd.process_cross(cat1, cat2)
    xi[:] = dd.xi

I was expecting xi to equal np.sum(xi_1, axis=(0, 1)). The results are very similar, but not identical.

Thank you,

David

rmjarvis commented 2 years ago

For future reference, that's not an SSCCE. A self-contained example can't have ... or use a config that you aren't giving as part of the example.

But if the results are "very similar" but not identical, it's almost certainly just the numerical differences you get from the bin slop manifesting differently in the two cases. I point you again to this discussion about shear projections, and also to the more general discussion of bin_slop here.

DavidNavarroG commented 2 years ago

The problem was the "brute" parameter: I had to set brute=True. Now both approaches to computing xi match perfectly.

Thank you very much.
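Why exact (brute-force) pair accumulation makes the two approaches agree can be seen in a toy NumPy sketch that loosely mimics an NG cross-correlation: per separation bin it accumulates raw sums of tangential shear and pair counts, first for the full sample and then for every (lens patch, source patch) combination. Because every pair is assigned to its bin exactly, the raw sums simply add up across patches, and dividing once at the end gives the same xi. Unit weights, flat-sky geometry, the tangential-shear sign convention, the 2x2 patch grid, and the helper names are all assumptions of this sketch, not TreeCorr's implementation. My reading of the TreeCorr docs is that process_cross likewise only accumulates weighted sums and leaves the division by the total weight to finalize(), which would explain why summing dd.xi over patch combinations in the loop above can match the full-sample result.

```python
import numpy as np


def raw_ng_sums(lens, src, g1, g2, edges):
    """Brute-force raw per-bin sums for a toy lens-shear cross-correlation.

    Returns (sum of tangential shear, number of pairs) in each separation bin,
    using unit weights. Flat-sky geometry and the convention
    g_t = -Re(g * exp(-2i*phi)) are assumptions of this sketch.
    """
    dx = src[None, :, 0] - lens[:, None, 0]
    dy = src[None, :, 1] - lens[:, None, 1]
    sep = np.hypot(dx, dy)
    phi = np.arctan2(dy, dx)                   # position angle of source around lens
    gt = -(g1[None, :] * np.cos(2 * phi) + g2[None, :] * np.sin(2 * phi))
    nbins = len(edges) - 1
    idx = np.digitize(sep, edges) - 1          # exact bin index of every pair
    ok = (idx >= 0) & (idx < nbins)            # drop pairs outside the binning range
    sum_gt = np.bincount(idx[ok], weights=gt[ok], minlength=nbins)
    npairs = np.bincount(idx[ok], minlength=nbins).astype(float)
    return sum_gt, npairs


def patch_id(pos):
    """Hypothetical patch labels: a 2x2 grid of square patches on a 10x10 field."""
    return (pos[:, 0] >= 5).astype(int) * 2 + (pos[:, 1] >= 5).astype(int)


rng = np.random.default_rng(0)
lens = rng.uniform(0, 10, size=(200, 2))
src = rng.uniform(0, 10, size=(300, 2))
g1, g2 = rng.normal(0, 0.1, size=(2, 300))
edges = np.logspace(-1, 1, 9)                  # 8 log-spaced separation bins
lens_patch, src_patch = patch_id(lens), patch_id(src)

# Full sample at once.
full_gt, full_n = raw_ng_sums(lens, src, g1, g2, edges)

# Sum the raw sums over every (lens patch i, source patch j) combination.
patch_gt = np.zeros_like(full_gt)
patch_n = np.zeros_like(full_n)
for i in range(4):
    for j in range(4):
        s = src_patch == j
        gt_ij, n_ij = raw_ng_sums(lens[lens_patch == i], src[s], g1[s], g2[s], edges)
        patch_gt += gt_ij
        patch_n += n_ij

# Normalize only once, after summing.
xi_full = full_gt / full_n
xi_patch = patch_gt / patch_n
print(np.allclose(xi_full, xi_patch))
```

The pair counts agree exactly, while the shear sums agree to floating-point rounding, since they are added in a different order.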

rmjarvis / TreeCorr, issue #140