rmjarvis / TreeCorr

Code for efficiently computing 2-point and 3-point correlation functions. For documentation, go to
http://rmjarvis.github.io/TreeCorr/

Outlier point in NN Correlation #162

Open vduret opened 9 months ago

vduret commented 9 months ago

Hello,

When I measure the angular two-point correlation function, I sometimes get an outlier point with a much lower amplitude. When it happens, it is always in the first angular bin. See the figure: angular_2pcf_sdss

The binning here is linear between 3.5° and 6.5° for the first two redshift bins and between 2.5° and 5.5° for the last one, with 20 bins each time. The outlier is observed for all jackknife samples. What could cause this behaviour?

Thank you,

Vincent

rmjarvis commented 9 months ago

My best guess without more information is masking. If your random catalog doesn't fully capture the masking (either explicit or implicit from selection due to detection issues around foregrounds), then the inferred correlation at small scales will be too small. If extreme enough, it could go negative.

vduret commented 8 months ago

I am working with a mock galaxy catalogue, so the only mask applied to the randoms is the angular selection. I have checked that this behaviour can appear with the same data, the same randoms and even the same jackknife patches, when only the binning of the measurement (min_sep, max_sep and nbins) changes. I am using bin_slop = 1, since smaller values had a negligible effect on the result. I can't share the plot since the data is not public, but here is a representation of the comparison of the two successive measurements: 2pcf_outlier_1

I also have another comparison, this time with different randoms, which shows very good agreement on the autocorrelations of 13 redshift bins except for these outliers: 2pcf_outlier_2

I also had a case where the outliers were only in the 2nd and 3rd redshift bins, still in the first angular bin.

rmjarvis commented 8 months ago

Looking at the derived w(theta) probably isn't the most instructive here. I'd look at the individual DD, DR and RR values to see which one is driving the behavior. If you can generate a small piece of code that reproduces this, not based on proprietary data, I can take a look.
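
As a rough illustration of that check, here is a minimal sketch that splits the Landy-Szalay estimate into its normalized DD, DR and RR terms per bin, so the term responsible for a low first bin can be spotted. It assumes the Landy-Szalay estimator is the one being used, which the thread does not state. The function name `landy_szalay_terms` and the counts in the demo are hypothetical. With TreeCorr, the per-bin counts and totals would presumably come from the `npairs` (or `weight`) and `tot` attributes of the three `NNCorrelation` objects; that mapping is an assumption here and should be checked against the documentation.

```python
import math


def landy_szalay_terms(dd, dr, rr, dd_tot, dr_tot, rr_tot):
    """Return one dict per bin with the normalized DD, DR, RR and w.

    dd, dr, rr      : per-bin pair counts (sequences of equal length)
    dd_tot, dr_tot, rr_tot : total number of pairs used to normalize each term
    """
    rows = []
    for i, (d, x, r) in enumerate(zip(dd, dr, rr)):
        dd_n = d / dd_tot
        dr_n = x / dr_tot
        rr_n = r / rr_tot
        # Landy-Szalay: w = (DD - 2 DR + RR) / RR, using normalized counts.
        # An empty RR bin has no defined estimate, so report NaN there.
        w = (dd_n - 2.0 * dr_n + rr_n) / rr_n if rr_n > 0 else math.nan
        rows.append({"bin": i, "DD": dd_n, "DR": dr_n, "RR": rr_n, "w": w})
    return rows


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not taken from the issue).
    dd = [90, 105, 110]
    dr = [2000, 2050, 2100]
    rr = [10000, 10200, 10500]
    for row in landy_szalay_terms(dd, dr, rr, 5000, 100000, 500000):
        print("bin {bin}: DD={DD:.5f} DR={DR:.5f} RR={RR:.5f} w={w:+.4f}".format(**row))
```

Printing the three terms for the two binnings side by side should show whether the first-bin drop comes from DD, DR or RR, rather than only from the combined w(theta).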