Open jlmelville opened 1 year ago
No, I think you are correct. This should probably be fine for large indices, but when you get low in the tree perhaps it could actually come into play? Thanks for pointing this out; I'll have to see if I can run some small experiments and see if it is worth trying to fix this up.
In
euclidean_random_projection_split
, this part of the code is picking two random points to form the hyperplane:https://github.com/lmcinnes/pynndescent/blob/db258cea34cce7e11e90a460c1f8a0bd8b69f1c1/pynndescent/rp_trees.py#L221-L224
right_index
has an ever so slight bias to being chosen as 0, because whenleft_index == indices[-1]
andright_index
is also sampled asindices[-1]
, it will be "overflowed" to 0.If
right_index
was selected as:then the bias is removed and there is no need for the
right_index = right_index % indices.shape[0]
-- I think?This also affects the angular and sparse variants, but I assume this doesn't really matter. I am mainly asking to make sure I didn't miss something.