rmjarvis / TreeCorr

Code for efficiently computing 2-point and 3-point correlation functions. For documentation, go to
http://rmjarvis.github.io/TreeCorr/
Other
99 stars 37 forks source link

bin_slop definition and implementation #153

Closed ahallcosmo closed 1 year ago

ahallcosmo commented 1 year ago

I'm a bit unsure about how bin_slop is defined, is it meant to be the total slop or half the slop?

In the documentation it is written "For example, if a bin nominally goes from 10 to 20 arcmin, then with bin_slop = 0.05, TreeCorr will accumulate pairs with separations ranging from 9.5 to 20.5 arcmin into this bin", i.e. a fraction bin_slop of the bin width slops over each side, and the total effective bin width is bin_size + 2*bin_slop*bin_size. But in the function singleBin on line 81 in BinType.h we have

        // If s1+s2 > 0.5 * (binsize + b) * r, then the total leakage (on both sides perhaps)
        // will be more than b.  I.e. too much slop.
        if (s1ps2sq > 0.25 * SQR(binsize + b) * rsq) return false;

I'd have thought that this should be if (s1ps2sq > 0.25 * SQR(binsize + 2.*b) * rsq) return false, since the test is asking if a shift up or down by s1+s2 takes us out of the (slopped) bin or not, which is guaranteed if the shift is at least half an effective bin width, the effective bin width being binsize + 2.*b. Such a change would be consistent with the next test on line 120

        double f = std::min(frackk, 1.-frackk);
        if (s1ps2sq > SQR(f*binsize + b) * rsq) return false;

where if the separation is exactly in the center of the bin then f=0.5 and this is the same test as before.

Do you agree? I may have got myself completely in a muddle here.

rmjarvis commented 1 year ago

I agree. The first test is too conservative. Thanks for spotting that.

rmjarvis commented 1 year ago

Actually, I looked into it a bit more, and I think the code is as intended, but the description is slightly wrong. For any particular pair of cells being considered, we only allow a total of bin_slop to fall outside the bin in either direction. That's what the first test checks.

By the time we get to the second test, we know that the cases where we need to keep splitting will only slop either up or down, but not both. So the test can assume that all of the slop is just in the one direction.

So overall, we do have a total effective bin width that extends +- bin_slop on each side. But the accumulation never includes a set of pairs that have more than one bin_slop worth at a time.

I'll add some text to the bin_slop description in the docs to explain this better.

rmjarvis commented 1 year ago

cf. 4e7004a5548d3134

ahallcosmo commented 1 year ago

I see, I agree that the tests match up with this explanation. I was looking at this because I was trying to model the correlation function with non-zero bin_slop using trapezoidal bins in log-space, as per the docs. But doesn't this first test complicate that a little? For example, if the cell-pair separation is exactly at the bin center, and a shift of s1+s2 in either direction slops over the edge by more than b/2 but less than b then this pair will be split again. But a trapezoidal bin with base length 2b+binsize would potentially accept this. Do you expect the trapezoidal model to be only an approximation?

rmjarvis commented 1 year ago

The trapezoid model is definitely an approximation at best. The exact bin shape depends a lot on the distribution of pair distances across the bin boundary, since that affects the probability that a given galaxy pair would show up somewhere in a pair of cells that would place it in the wrong bin.

It's not at all clear to me which choice here is closer to producing an exact linear slope for a bin edge. I think both of them give edges that have the full height at the inside edge of the slop, and zero height on the outside edge. But probably neither of them are linear in between if I had to guess.

If you want to investigate this more precisely, you can sample pairs for a particular bin and see what the distribution of separations is. See https://rmjarvis.github.io/TreeCorr/_build/html/correlation2.html#treecorr.BinnedCorr2.sample_pairs

ahallcosmo commented 1 year ago

Agreed, thanks for clarifying this. Happy for this to be closed now.

rmjarvis commented 1 year ago

OK. Thanks for raising the issue. It was definitely helpful for me too to have a deeper think about this.