Status: Open. d5e opened this issue 9 years ago.
Hey,
Sorry, I'm not maintaining this library much these days. I never experimented with this many features and have no idea what might be behind this issue. It would help if you could share a small example with enough data to reproduce it.
Someone in the community might then be interested in digging into it from there.
As time went by, a comprehensive machine learning library called Rumale was developed using NArray. This library provides a number of algorithms.
https://github.com/yoshoku/rumale
This library is no longer maintained and has some issues, so you may want to use Rumale instead.
I know you introduced the library with an estimate that, at 10 features or more, brute force performs better than KNN. I think the threshold is in fact much higher: with KNN you only need to build the tree once, and each nearest-neighbour query only has to traverse the tree to calculate distances. With the brute-force approach you need to calculate the distance for every pair of objects you want to compare, which favours KNN over brute force when you have large sets of objects.
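To make the brute-force side of that comparison concrete, here is a minimal sketch in plain Python. It is not this library's code: the function name `brute_force_knn`, the 12-feature random data, and the use of Euclidean distance are all assumptions made for illustration.

```python
import heapq
import math
import random


def brute_force_knn(points, query, limit):
    """Return up to `limit` points closest to `query` (Euclidean distance).

    Hypothetical helper for illustration only, not the library's API.
    Every call computes one distance per stored point, so the cost of a
    single query grows linearly with the number of objects.
    """
    return heapq.nsmallest(limit, points, key=lambda p: math.dist(p, query))


random.seed(0)
# Assumed test data: 100 objects with 12 features each.
points = [tuple(random.random() for _ in range(12)) for _ in range(100)]
query = tuple(0.5 for _ in range(12))

neighbours = brute_force_knn(points, query, 30)
print(len(neighbours))  # 30
```

In this sketch, a limit larger than the data set simply returns every object, sorted by distance. That is the behaviour I would expect from any nearest-neighbour query, whichever search strategy is used.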
The problem I have run into is that when I have more than 10 features and request results with a limit of, say, 30 or 500, I only get 5 results, no matter which limit I set.
Once I supply at least 669 objects I can query any limit, as long as the limit is equal to or greater than 669.
That is what I have observed so far. I am not sure if you are still working on this, but maybe you have an idea what the source of this odd behaviour could be. If you would like more information about our test scenario, I am happy to provide it.