manodeep / Corrfunc

⚡️⚡️⚡️Blazing fast correlation functions on the CPU.
https://corrfunc.readthedocs.io
MIT License

Jackknifing - graceful exit when there is one particle, or particles only at one position #155

Open tmcclintock opened 6 years ago

tmcclintock commented 6 years ago

Corrfunc is used by the 2pcf_jk package, written by @rafaelgm9, to jackknife two point correlation functions. I am computing the autocorrelation between halos in a snapshot at high redshift, where the halos are very rare. When a spatial jackknife region happens to have no halos, Corrfunc dies with the following error message:

Error in file: ../../utils/gridlink_impl_double.c func: gridlink_index_particles_double line: 403 with expression `ix >= 0 && ix < nmesh_x' ix=-2147483648 must be within [0,1) Hopefully, input validation. Otherwise, bug in code: please email Manodeep Sinha manodeep@gmail.com

Corrfunc should by default recognize that there are no objects in the input catalog and return 0 for the number of pairs. Or, at least, it should throw an error that can be caught higher up. As it stands now, I am not sure how to circumvent this error in Python with a try/except block.

Thanks, and if this can, in fact, be caught and worked around then I can work with that.

PS I understand that I can reduce the number of jackknife regions, but I want to avoid that since I am trying to estimate a covariance matrix.
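In the meantime, a guard in the calling code can skip empty regions before they ever reach the C layer; a minimal sketch (`safe_pair_counts` and `count_func` are hypothetical names for illustration, not Corrfunc or 2pcf_jk API):

```python
import numpy as np

def safe_pair_counts(x, y, z, nbins, count_func):
    # Hypothetical guard: return zero counts for an empty jackknife
    # region instead of invoking the pair counter and crashing.
    x = np.asarray(x)
    if x.size == 0:
        return np.zeros(nbins, dtype=np.int64)
    return count_func(x, y, z)

# An empty region yields zeros without ever touching the C code:
print(safe_pair_counts([], [], [], 3, None))  # [0 0 0]
```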

lgarrison commented 6 years ago

I can't seem to reproduce this problem. Here's my attempt:

import numpy as np
from Corrfunc.theory.DD import DD
box = 10.
X, Y, Z = np.random.rand(3,0)*box

bins = np.linspace(0.1, 1., 10)

DD(1, 1, bins, X, Y, Z, X2=X, Y2=Y, Z2=Z, boxsize=box, periodic=False)
array([(0.1, 0.2, 0., 0, 0.), (0.2, 0.3, 0., 0, 0.),
       (0.3, 0.4, 0., 0, 0.), (0.4, 0.5, 0., 0, 0.),
       (0.5, 0.6, 0., 0, 0.), (0.6, 0.7, 0., 0, 0.),
       (0.7, 0.8, 0., 0, 0.), (0.8, 0.9, 0., 0, 0.),
       (0.9, 1. , 0., 0, 0.)],
      dtype=[('rmin', '<f8'), ('rmax', '<f8'), ('ravg', '<f8'), ('npairs', '<u8'), ('weightavg', '<f8')])

Seems to be handled gracefully. I tried a few variations of the above but couldn't get it to crash.

Also, glancing at the 2pcf_jk code, they seem to be protecting against this case: https://github.com/rafaelgm9/2pcf_jk/blob/master/auto_correlation.py#L93

@tmcclintock, could you try to narrow down your failure case to a minimal example that demonstrates the problem?

Also, @manodeep, we should think about adding a jackknife mode to Corrfunc. We could do it via particle tags or via fixed subvolumes, but the main change would be to add a dimension to the internal pair counting histogram so that multiple correlation functions could be returned. This would surely be many times faster than looping over jackknife regions and invoking Corrfunc separately on each. What do you think?
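To make the proposal concrete, here is a rough numpy sketch of the histogram-with-a-region-axis idea (purely illustrative; the real change would live in the C kernels, and the tag scheme here is an assumption):

```python
import numpy as np

# Each particle carries a jackknife region tag, and the pair-count
# histogram gains a region axis, so a single pass over the pairs yields
# one leave-one-region-out correlation function per region.
rng = np.random.default_rng(0)
npart, nregions = 100, 4
pos = rng.random((npart, 3))                        # particles in a unit box
tag = rng.integers(0, nregions, size=npart)         # jackknife region per particle
bins = np.linspace(0.1, 1.0, 10)

dist = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
i, j = np.triu_indices(npart, k=1)                  # unique pairs only
hist = np.zeros((nregions, len(bins) - 1), dtype=np.int64)
for r in range(nregions):
    keep = (tag[i] != r) & (tag[j] != r)            # drop pairs touching region r
    hist[r] = np.histogram(dist[i[keep], j[keep]], bins=bins)[0]
# hist[r] is the pair-count histogram with region r removed
```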

tmcclintock commented 6 years ago

Thanks @lgarrison, I'll try to reproduce this in a clean way. Perhaps I misdiagnosed the problem. Until I properly figure out the cause, I'll just close the issue.

Also, w.r.t. implementing jackknifing, you should ask @rafaelgm9 if you can merge his routine. It is very clean, doesn't need random catalogs, and doesn't require the use of tags.

manodeep commented 6 years ago

@tmcclintock I am looking at the source where this error message is generated, and given the checks right before the error message line, the behaviour you encountered seems very strange.

What I mean is that all x, y, z values are validated to be within [{xyz}min, {xyz}max]; therefore, the i{xyz} values should really be between [0, n{xyz}). Would it be possible to print out the values of {xyz} that cause the crash?

tmcclintock commented 6 years ago

Hey @manodeep. I think that the error might be in the 2pcf_jk package, since I have confirmed that I can run DD() on the catalog itself without issue. I'll post an issue on that repository and see if we can get it figured out over there.

manodeep commented 6 years ago

@tmcclintock Thanks for getting back so quickly. I understand that the original crash issue is likely in the other package, but I feel there is also something else happening that Corrfunc should handle better. Please tag me over on the other repo - thanks!

tmcclintock commented 6 years ago

Hey again all. I managed to reproduce the error. Here is the script:

from Corrfunc.theory.DD import DD

autocorr = 0
nthreads = 1
bins = [0.1, 10, 100]
xd1 = xd2 = [0.5]
yd1 = yd2 = [0.5]
zd1 = zd2 = [0.5]

DD_counts = DD(autocorr, nthreads, bins, xd1, yd1, zd1, X2=xd2, Y2=yd2, Z2=zd2, periodic=False, verbose=False)

As you can see, the error occurs when a cross-correlation (autocorr=0) is run with only a single object. If instead I write:

xd1 = xd2 = [0.5, 1.0]
yd1 = yd2 = [0.5, 1.0]
zd1 = zd2 = [0.5, 1.0]

then the error does not occur, so it is specific to the case where only one object is inspected.

manodeep commented 6 years ago

Ahh, of course. There is no width to the spatial domain when only one particle is used. (It would be the same if N particles were passed with identical positions.)
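A sketch of why a zero-width domain produces the bogus ix seen above (this only approximates the index computation; the exact expression in gridlink_impl_double.c may differ):

```python
import numpy as np

def cell_index_fraction(x, xmin, xmax):
    # Approximation of the grid assignment: map x in [xmin, xmax]
    # to a fractional cell position in [0, 1).
    with np.errstate(invalid='ignore'):
        return (x - xmin) / (xmax - xmin)   # 0/0 -> nan when xmax == xmin

x = np.array([0.5])                          # a single particle
frac = cell_index_fraction(x, x.min(), x.max())
print(frac)  # [nan] -- no valid cell; converting nan to a 32-bit int in C
             # is undefined behaviour and commonly yields INT_MIN = -2147483648
```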

Setting a minimum domain width would solve the issue, but a better solution would be to fall back to brute-force pair counting for small particle numbers.
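For example, a sketch of such a fallback (`brute_force_dd` is just an illustration, not a proposed Corrfunc API):

```python
import numpy as np

def brute_force_dd(x1, y1, z1, x2, y2, z2, bins):
    # Hypothetical fallback: O(N1*N2) pair counting in radial bins,
    # usable when the catalog is too small (or too narrow) to grid.
    p1 = np.column_stack([x1, y1, z1]).astype(float)
    p2 = np.column_stack([x2, y2, z2]).astype(float)
    dist = np.linalg.norm(p1[:, None, :] - p2[None, :, :], axis=-1)
    counts, _ = np.histogram(dist.ravel(), bins=bins)
    return counts

# The single-particle cross-correlation from the failing script:
print(brute_force_dd([0.5], [0.5], [0.5], [0.5], [0.5], [0.5], [0.1, 10, 100]))
# The only pair sits at distance 0, outside [0.1, 100), so both bins are 0
```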