kLabUM / rrcf

🌲 Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams
https://klabum.github.io/rrcf/
MIT License

Duplicates seem to break batch instantiation #31

mdbartos closed this issue 5 years ago

mdbartos commented 5 years ago

Traceback:

    157         else:
    158             # Create a leaf node from isolated point
--> 159             i = np.asscalar(np.flatnonzero(S2))
    160             leaf = Leaf(i=i, d=depth, u=branch, x=X[i, :], n=N[i])
    161             # Link leaf node to parent

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/numpy/lib/type_check.py in asscalar(a)
    487 
    488     """
--> 489     return a.item()
    490 
    491 #-----------------------------------------------------------------------------

ValueError: can only convert an array of size 1 to a Python scalar
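A minimal sketch of how this error can arise. It assumes, since this thread doesn't show the rrcf source, that batch construction splits the remaining points at a value p drawn uniformly between the min and max of one column and sends rows with x <= p to the left. For the two near-duplicate points reported in the next comment, p can round up to exactly the larger value. That leaves the right side empty, so np.flatnonzero(S2) has size 0 and .item() raises the ValueError above. As the traceback shows, np.asscalar(a) just returns a.item(), so the sketch calls .item() directly (np.asscalar has since been removed from NumPy). The helper isolate_right and the fixed draw r = 0.99 are hypothetical, chosen for illustration.

```python
import numpy as np

# Near-duplicate values of the third coordinate, taken from the np.unique output in this thread.
lo = 42.399999999999864
hi = 42.399999999999906

# Hypothetical two-point subset: identical except for the last bits of column 2.
X = np.array([[43.1, 2.0, lo],
              [43.1, 2.0, hi]])

def isolate_right(X, q, p):
    """Hypothetical split: rows with X[:, q] <= p go left, the rest go right.

    Mirrors the failing line in the traceback: .item() only succeeds when
    exactly one row ends up on the right-hand side.
    """
    S1 = X[:, q] <= p
    S2 = ~S1
    return np.flatnonzero(S2).item()

# A draw close to 1 makes lo + (hi - lo) * r round to exactly hi,
# because only a few doubles lie between lo and hi.
r = 0.99
p = lo + (hi - lo) * r
print(p == hi)  # True

try:
    isolate_right(X, 2, p)
except ValueError as err:
    # Empty S2: nothing was isolated, so .item() has no single value to return.
    print("ValueError:", err)

# A cut at the lower endpoint still separates the two rows.
print(isolate_right(X, 2, lo))  # 1
```

NumPy's documentation for uniform notes that the upper limit may be included because of floating-point rounding in low + (high - low) * random, which is the same effect reproduced here with a fixed r.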

mdbartos commented 5 years ago

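This seems to happen when np.unique doesn't group near-duplicates together: rows that print identically can differ in the last few digits, so they are kept as separate points: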

>>> U, N = np.unique(xy, axis=0, return_counts=True)
>>> U[[1704, 1705]]
array([[43.1,  2. , 42.4],
       [43.1,  2. , 42.4]])
>>> U[1704, 2], U[1705, 2]
(42.399999999999864, 42.399999999999906)
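np.unique compares rows for exact equality, so values that only agree to the printed precision stay separate. One possible workaround is to round the input to a fixed number of decimals before deduplicating. The sketch below assumes that approach; it is not necessarily the change made in #32, and decimals=8 and the third row are arbitrary choices for illustration. Rounding only merges near-duplicates that fall on the same side of a rounding boundary, so it narrows the problem rather than eliminating it.

```python
import numpy as np

# Two near-duplicate rows from this thread plus an unrelated row (made up for illustration).
xy = np.array([[43.1, 2.0, 42.399999999999864],
               [43.1, 2.0, 42.399999999999906],
               [10.0, 1.0, 5.0]])

# Exact comparison: the near-duplicates count as two distinct rows.
U, N = np.unique(xy, axis=0, return_counts=True)
print(len(U), N)  # 3 [1 1 1]

# Rounding first (assumed precision: 8 decimals) lets np.unique group them.
U_r, N_r = np.unique(np.round(xy, decimals=8), axis=0, return_counts=True)
print(len(U_r), N_r)  # 2 [1 2]
```

With the duplicates merged, each unique row carries its multiplicity in the counts array instead of appearing as two points that no cut can reliably separate.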
mdbartos commented 5 years ago

Fixed in #32