lmcinnes / pynndescent

A Python nearest neighbor descent for approximate nearest neighbors
BSD 2-Clause "Simplified" License
897 stars 105 forks source link

Warn on RP forest degenerescence #230

Open hamelin opened 1 year ago

hamelin commented 1 year ago

Some datasets have such properties as to yield poor random projection trees, where branching does not divide the data very well. In such cases, one gets one-leaf branches and what remains is a large leaf monolith. One approach is to carry on dividing deeper, but the ill is done nonetheless, as these tiny branches can still cause problems. Thus, when the random projection recursion fails to yield a tree where all leaves satisfy the leaf size constraint, we will drop this tree. Should we drop all the trees, we will revert to random initialisation.