I have been using pykdtree to find nearest neighbours, and it seems to break down for very large datasets. I managed to reproduce the problem with the following example:
from pykdtree.kdtree import KDTree
import numpy as np
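pos = np.random.rand(int(5e8), 3)  # 5e8 random points in the unit cube
nb = 32  # number of neighbours per point
tree = KDTree(pos)
d, idx = tree.query(pos, k=nb)
h = d[:, nb - 1]  # distance to the 32nd neighbour (each point is its own 1st neighbour)
print(np.min(h))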
The code above reports that the minimum distance from a particle to its 32nd nearest neighbour is zero, which is incorrect and, indeed, very unlikely. It turns out that zero is assigned to far more than one particle; in fact, it is zero for a large fraction of the particles. Doing
import numpy as np
k, = np.where(h == 0)
print(len(k))
prints 365782272, i.e. h is 0 for ~73% of the whole sample. This is clearly wrong.
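The exact count is worth a closer look. With n = 5e8 points and k = 32 neighbours, the output holds n·k = 1.6e10 entries, well above 2**32. The arithmetic below is a sketch, not a confirmed diagnosis: it assumes (hypothetically) that the flat n·k output is addressed with an unsigned 32-bit offset, in which case only the first 2**32 // k rows would be written and the remaining rows would keep whatever the output arrays initially held.

```python
# Hypothesis check (an assumption, not confirmed from pykdtree's source):
# if the flat n*k output buffer is indexed with an unsigned 32-bit offset,
# only the first 2**32 // k rows can be addressed.
n = int(5e8)                    # number of points in the example above
k = 32                          # neighbours per point
observed_zero_rows = 365782272  # count reported above

total_entries = n * k
exceeds_uint32 = total_entries > 2**32 - 1  # 1.6e10 entries vs ~4.3e9 limit
rows_reachable = 2**32 // k                 # rows whose entries fit below 2**32
predicted_zero_rows = n - rows_reachable    # rows that would never be written

print(exceeds_uint32)
print(predicted_zero_rows, predicted_zero_rows == observed_zero_rows)
print(f"{predicted_zero_rows / n:.1%} of the sample")
```

Under that assumption the predicted number of untouched rows matches the reported 365782272 exactly, and an indexing limit of this kind would not depend on whether the coordinates are stored in single or double precision.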
I discovered the problem when using py-sphviewer, which relies on pykdtree to compute the smoothing lengths of particles in cosmological simulations. When the simulated volume contains a very large number of particles (several hundred million), pykdtree wrongly reports a distance of 0 between many particles and their 32nd neighbours.
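For the smoothing-length use case, one possible workaround, assuming the 32-bit overflow hypothesis holds, is to query the tree in slices so that each call returns far fewer than 2**32 values. The helper `query_in_chunks` below is hypothetical (it is not part of pykdtree or py-sphviewer). To keep the sketch self-contained it is checked against a brute-force NumPy k-NN on a small sample rather than against pykdtree itself.

```python
import numpy as np

def query_in_chunks(query_fn, pts, k, max_entries=2**31 - 1):
    """Hypothetical helper: run a k-NN query over pts in row slices so that each
    call returns at most max_entries values (chunk_rows * k).
    query_fn(sub_pts, k) must return (distances, indices) of shape (len(sub_pts), k)."""
    chunk_rows = max(1, max_entries // k)
    dists, idxs = [], []
    for start in range(0, len(pts), chunk_rows):
        d, i = query_fn(pts[start:start + chunk_rows], k)
        dists.append(d)
        idxs.append(i)
    return np.concatenate(dists), np.concatenate(idxs)

def brute_force_knn(data):
    """Reference k-NN by exhaustive search; only feasible for small samples."""
    def query_fn(q, k):
        d2 = ((q[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
        idx = np.argsort(d2, axis=1)[:, :k]
        return np.sqrt(np.take_along_axis(d2, idx, axis=1)), idx
    return query_fn

rng = np.random.default_rng(0)
pos = rng.random((1000, 3))
nb = 32
query_fn = brute_force_knn(pos)
d_full, idx_full = query_fn(pos, nb)
# A tiny max_entries forces several chunks so the slicing logic is exercised.
d_chunk, idx_chunk = query_in_chunks(query_fn, pos, nb, max_entries=nb * 300)
print(np.array_equal(d_full, d_chunk), np.array_equal(idx_full, idx_chunk))
h = d_chunk[:, nb - 1]  # smoothing length: distance to the 32nd neighbour
print(h.min() > 0)
```

With pykdtree one would pass something like `lambda q, k: tree.query(q, k=k)` as `query_fn`; whether this avoids the zeros on the full 5e8-point dataset has not been verified here.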
Any idea what might be causing this weird behaviour? I also checked with both single and double precision.
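A rough estimate shows why a zero 32nd-neighbour distance is implausible and why floating-point precision is unlikely to be the cause. For n points spread uniformly in the unit cube, a sphere of radius r around a point holds about k points when (4/3)·π·r³·n = k. This ignores boundary effects and is only an order-of-magnitude sketch. The snippet compares that radius with the spacing of float32 values at 1.0, an upper bound on the resolution for coordinates in [0, 1).

```python
import numpy as np

n, k = int(5e8), 32
# Rough estimate: radius of a sphere expected to contain k of the n uniform points.
r_k = (3 * k / (4 * np.pi * n)) ** (1 / 3)
# Gap between adjacent float32 values at 1.0 (coarsest spacing for coords in [0, 1)).
f32_step = float(np.spacing(np.float32(1.0)))

print(f"expected 32nd-neighbour distance ~ {r_k:.1e}")
print(f"float32 spacing at 1.0 ~ {f32_step:.1e}")
print(f"ratio ~ {r_k / f32_step:.0f}")
```

The expected distance is a few times 1e-3, roughly four orders of magnitude above float32 resolution, so even single precision should not round these distances to zero.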