manodeep / Corrfunc

⚡️⚡️⚡️Blazing fast correlation functions on the CPU.
https://corrfunc.readthedocs.io
MIT License
164 stars 50 forks source link

Severe bug in wtheta #111

Closed manodeep closed 7 years ago

manodeep commented 7 years ago

The angular separation between two points is

cos (dtheta) = x1*x2 + y1*y2 + z1*z2

However, for clustering we only care about the absolute value and not the sign. Thus, the computation should have either been for cos^2(dtheta) or equivalently,ABS(x1*x2 + y1*y2 + z1*z2).

ALL calculations that can produce negative dot products for valid pairs are affected

(I think) since the regular tests (and external code validation) were for the SDSS North footprint, this bug was not uncovered. And only when the correct gridding + offset conditions were implemented (in PR #108) did this bug come to front.

manodeep commented 7 years ago

(Most likely) the severe bug only appears when separations >= 90 deg are requested. cos(-theta) = cos(theta) => in the [-90, 90] range, the cos term should be positive.

manodeep commented 7 years ago

The hold up with the bugfix has been the validation against an external code. However, that's now sorted -- and on the plus side, the new wtheta version will be able to calculate angular separations up to 180 deg (only in bruteforce mode).

On the plus side, the code previously only allowed thetamax < 90 deg, so the bug was likely never triggered (or had a very very small impact).

manodeep commented 7 years ago

This is a pesky bug - the validation against external code seems to depend on the thetamax -- the maximum separation used. For thetamax=10, the number of pairs in brute-force/dec-linking found is off by 2 (for an auto-corr, so really off by 1). The number of pairs is off by quite a bit more for ra+dec linking.

(I think the wtheta tests need to switched to a lower maximum angle, say 5-10 deg.)

manodeep commented 7 years ago

Okay finally some progress. Looks like the culprit is ra_refine_factor - for ra_refine = 1 and dec_refine_factor=1, the pair counts for ra+dec linking agree with dec-linking as well as bruteforce. (This requires a fudge-factor for the computed binsize in form of binsize_expand_factor := 1.1. I suspect that a lot of these numerical precision issues stem from binning in declination rather than cos(declination))

However, the pair counts start diverging for increasing ra_refine_factor; and notably, pair counts diverge more for increasing ra_refine_factor compared to increasing dec_refine_factor. (Sidenote: options->bin_refine_factors[0] == ra_refine_factor and options->bin_refine_factors[1] == dec_refine_factor; based on the principle that ra always comes before dec within Corrfunc)

A hack is simply to add/subtract the ra_refine_factor to the max_ra_this_dec/min_ra_this_dec variables in the function assign_ngb_cells_index_ra_dec_wtheta_DOUBLE and a for loop check over first->ngb_cells to make sure that a cell-pair is never duplicated.

Obviously, such a hack will never fly in the paper - so I still have to sort out ra/dec linking. Easiest (i.e., fastest to implement) solution to disable bin_refine_factors for w(theta).

manodeep commented 7 years ago

@lgarrison - Solved!

(The trick that you recommended) The trick was to see which cell-pairs produced valid pairs within range -- all of incorrect cell-pairs involved the first or the last ra cell. I figured the fix was to use the actual extent of the particles within the cell rather than the cell limits from binsize. This is because the cells actually run over the exact spatial extent of the data and these two conditions ensure any points exactly on the edge are assigned to the cell:

    int idec = (int)(ngrid_dec*(dec[i]-dec_min)*inv_dec_diff);
    if(idec >= ngrid_dec) idec--;
    int ira  = (int)(ngrid_ra[idec]*(ra[i]-ra_min)*inv_ra_diff);
    if(ira >=ngrid_ra[idec]) ira--;

Once I replaced the lattice->(dec_min/dec_max) and lattice->(ra_min/ra_max)

from

lattice[index].dec_min = dec_min + idec*dec_binsize;
lattice[index].dec_max = dec_min + (idec + 1)*dec_binsize;
lattice[index].ra_min = ra_min + ra_binsize_this_dec*ira;
lattice[index].ra_max = ra_min + ra_binsize_this_dec*(ira + 1);

to

lattice[index].dec_min = dec[i] < lattice[index].dec_min ? dec[i]:lattice[index].dec_min;
lattice[index].dec_max = dec[i] > lattice[index].dec_max ? dec[i]:lattice[index].dec_max;
lattice[index].ra_min = ra[i] < lattice[index].ra_min ? ra[i]:lattice[index].ra_min;
lattice[index].ra_max = ra[i] > lattice[index].ra_max ? ra[i]:lattice[index].ra_max;

the last remaining issue was solved.