Open TapdancingRodent opened 7 years ago
OK, I found test.py and noting that it runs fine I've managed to narrow down that the segfault occurs only when the dataset is very small (<50,000 entries for me)
hello! :-) can you maybe give the dataset, and complete reproducing instructions please? (i am just a by-passer, maybe the original dev(s) don't need that, but i do) .. alternatively, how to modify the test datasets to reproduce this would also work.
Hi, I modified the test script to run on less data and managed to get a segfault in the training and a floating point error in testing, see the code dumps below.
test_populate.py
import numpy as np
import scipy as sc
from covertree import CoverTree
from sklearn.neighbors import NearestNeighbors
gt = time.time
np.random.seed(seed=3)
print 'Building cover tree'
x = np.random.rand(10000,128)
with open('train_data.bin', 'wb') as f:
np.array(x.shape, dtype='int32').tofile(f)
x.tofile(f)
print x[0,0], x[0,1], x[1,0]
mx = np.mean(x,0)
dists = np.array([np.sqrt(np.dot(xv-mx,xv-mx)) for xv in x])
idx = np.argsort(-dists)
xs = x[idx]
#print sc.spatial.distance.squareform(sc.spatial.distance.pdist(x, 'euclidean'))
t = gt()
ct = CoverTree.from_matrix(x)
b_t = gt() - t
#ct.display()
print "Building time:", b_t, "seconds"
print "Test covering: ", ct.test_covering()
print 'Generate random points'
y = np.random.rand(5000,128)
with open('test_data.bin', 'wb') as f:
np.array(y.shape, dtype='int32').tofile(f)
y.tofile(f)
print 'Test Nearest Neighbour: '
t = gt()
a = ct.NearestNeighbour(y)
b_t = gt() - t
print "Query time:", b_t, "seconds"
nbrs = NearestNeighbors(n_neighbors=1, algorithm='brute').fit(xs)
distances, indices = nbrs.kneighbors(y)
b = np.squeeze(xs[indices])
if np.all(a==b):
print "Test for Nearest Neighbour passed"
else:
print "Test for Nearest Neighbour failed"
print 'Test k-Nearest Neighbours (k=2): '
t = gt()
a = ct.kNearestNeighbours(y,2)
b_t = gt() - t
print "Query time:", b_t, "seconds"
nbrs = NearestNeighbors(n_neighbors=2, algorithm='brute').fit(xs)
distances, indices = nbrs.kneighbors(y)
if np.all(a==xs[indices]):
print "Test for k-Nearest Neighbours passed"
else:
print "Test for k-Nearest Neighbours failed"
print 'Test delete: '
x2 = np.vstack((xs[:indices[0,0]], xs[indices[0,0]+1:]))
dels = ct.remove(xs[indices[0,0]])
print 'Point deleted: ', dels
a = ct.NearestNeighbour(y)
nbrs = NearestNeighbors(n_neighbors=1, algorithm='brute').fit(x2)
distances, indices = nbrs.kneighbors(y)
b = np.squeeze(x2[indices])
if np.all(a==b):
print "Test for delete passed"
else:
print "Test for delete failed"
test_search.py
import numpy as np
import scipy as sc
from covertree import CoverTree
from sklearn.neighbors import NearestNeighbors
gt = time.time
np.random.seed(seed=3)
print 'Building cover tree'
x = np.random.rand(500000,128)
with open('train_data.bin', 'wb') as f:
np.array(x.shape, dtype='int32').tofile(f)
x.tofile(f)
print x[0,0], x[0,1], x[1,0]
mx = np.mean(x,0)
dists = np.array([np.sqrt(np.dot(xv-mx,xv-mx)) for xv in x])
idx = np.argsort(-dists)
xs = x[idx]
#print sc.spatial.distance.squareform(sc.spatial.distance.pdist(x, 'euclidean'))
t = gt()
ct = CoverTree.from_matrix(x)
b_t = gt() - t
#ct.display()
print "Building time:", b_t, "seconds"
print "Test covering: ", ct.test_covering()
print 'Generate random points'
y = np.random.rand(2,128)
with open('test_data.bin', 'wb') as f:
np.array(y.shape, dtype='int32').tofile(f)
y.tofile(f)
print 'Test Nearest Neighbour: '
t = gt()
a = ct.NearestNeighbour(y)
b_t = gt() - t
print "Query time:", b_t, "seconds"
nbrs = NearestNeighbors(n_neighbors=1, algorithm='brute').fit(xs)
distances, indices = nbrs.kneighbors(y)
b = np.squeeze(xs[indices])
if np.all(a==b):
print "Test for Nearest Neighbour passed"
else:
print "Test for Nearest Neighbour failed"
print 'Test k-Nearest Neighbours (k=2): '
t = gt()
a = ct.kNearestNeighbours(y,2)
b_t = gt() - t
print "Query time:", b_t, "seconds"
nbrs = NearestNeighbors(n_neighbors=2, algorithm='brute').fit(xs)
distances, indices = nbrs.kneighbors(y)
if np.all(a==xs[indices]):
print "Test for k-Nearest Neighbours passed"
else:
print "Test for k-Nearest Neighbours failed"
print 'Test delete: '
x2 = np.vstack((xs[:indices[0,0]], xs[indices[0,0]+1:]))
dels = ct.remove(xs[indices[0,0]])
print 'Point deleted: ', dels
a = ct.NearestNeighbour(y)
nbrs = NearestNeighbors(n_neighbors=1, algorithm='brute').fit(x2)
distances, indices = nbrs.kneighbors(y)
b = np.squeeze(x2[indices])
if np.all(a==b):
print "Test for delete passed"
else:
print "Test for delete failed"
aehm...
i am really no python dev, so.. could you please give me instruction how to run these? ;-)
i put these files in my CoverTree/ folder, next to test.py
, and run like this:
python test_populate.py
and then i get:
Traceback (most recent call last):
File "test_populate.py", line 3, in <module>
from covertree import CoverTree
File "/home/hoijui/src/CoverTree/covertree/__init__.py", line 1, in <module>
from covertree import CoverTree
File "/home/hoijui/src/CoverTree/covertree/covertree.py", line 1, in <module>
import covertreec
ImportError: No module named covertreec
i guess it is somehow related to setup.py
, but i don't know how to use that either.
Hey, you're right - you don't have the python bindings set up for your system, you just need to run the below:
python setup.py install
p.s. if you've got a default Python install, you'll also need to install the numpy, scipy and sklearn modules - I'd recommend using the pip Python package manager which is under python-pip if you're using aptitutde.
thank you! :-)
i was able to fix the SIGSEGV of test_populate.py
, and the SIGFPE of test_search.py
, but am stuck in a SIGSEGV caused later by test_search.py
the fixes are here (together with some other, minor changes of mine):
https://github.com/hoijui/CoverTree
how to debug this stuff: hints from here: http://www.boost.org/doc/libs/1_63_0/libs/python/doc/html/faq/how_do_i_debug_my_python_extensi.html
concretely i did:
ddd # this is a GUI frontend for gdb
# inside ddd ...
target exec python
run test_search.py
the problem i have with the remaining SIGSEGV, is that the backtrace shows nothing in this libraries code, which means we have to use valgrind to debug that one, which is kind of a pain .. not today anymore! ;-)
.. i did so, and found the remaining bug (in many places in the code). so both your examples should now work; fixes are here: https://github.com/hoijui/CoverTree
Hey, Great work and thanks for digging out the bug. I should be getting back into this code in a month or two when I'll switch over to your code base.
In the mean time I'll leave the issue open here in case Manzil continues developing this repo in the future.
I'm getting a segfault when using the Python API (very grateful for that addition by the way =D), am I using it correctly? The API reference PDF doesn't list the constructor. I'm using gcc/g++7.1.0 and Python 2.7, see the gdb trace below.