Open manish7294 opened 6 years ago
@iglesias @manish7294 has some questions about the LMNN implementation in Shogun :)
@iglesias Hi! Hope you are doing well :) I am benchmarking LMNN on both mlpack and Shogun, using Shogun's KNN to measure accuracy on the transformed data. Here is the problem: if the query set is the same as the training set, Shogun's KNN counts the query point itself as its first nearest neighbor (giving 100% accuracy for k = 1). mlpack's KNN avoids returning a point as its own neighbor, so Shogun's second nearest neighbor corresponds to mlpack's first. Can you think of a way to put both libraries on the same page for benchmarking? Or is there a way in Shogun to avoid counting the point itself as the first nearest neighbor? Any help would be appreciated :)
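The discrepancy is easy to reproduce without either library. A minimal NumPy sketch (brute-force search, no mlpack/Shogun calls): when the query set equals the training set, each point matches itself at distance 0 and shows up as its own first neighbor.

```python
import numpy as np

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

# Pairwise squared Euclidean distances: queries (rows) vs. references (columns).
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)

# Sort each row by distance; column 0 is the nearest neighbor.
nearest = d2.argsort(axis=1)
print(nearest[:, 0])  # the point itself: [0 1 2]  (Shogun-style behavior)
print(nearest[:, 1])  # the first *other* point    (mlpack-style behavior)
```

This is what makes k = 1 accuracy trivially 100% when self-matches are allowed.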
Hi @manish7294. So this is about KNN.
What about the following: in mlpack use KNN with k = k', and in Shogun use KNN with k = k' + 1. Then you could compare the nearest neighbors given by mlpack with the 2nd to (k' + 1)th nearest neighbors given by Shogun.
Does this make sense to you?
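In array terms, the suggested offset amounts to dropping the first neighbor column from the (k' + 1)-NN result. A brute-force NumPy sketch (no library calls; the variable names are illustrative only):

```python
import numpy as np

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# Brute-force neighbor ordering with query set == training set.
order = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2).argsort(axis=1)

k = 2
with_self = order[:, : k + 1]       # Shogun-style result for k = k' + 1 = 3
without_self = order[:, 1 : k + 1]  # columns 2..(k'+1): should match mlpack's k' = 2
print(without_self)
```

Column 0 of `with_self` is always the query point itself, so slicing it off aligns the two neighbor lists.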
@iglesias Thanks for looking into this. We thought about that, but it would create a problem when computing predictions: in Shogun's KNN prediction, the point itself would always be counted as a nearest neighbor and would introduce some error into the accuracy measurement. An example: say we have 2 classes and want k = 2, so we take mlpack's k = 2 and Shogun's k = 3. Suppose the first neighbor belongs to class 1 and the second to class 2. Ideally (as in mlpack's KNN) we would predict class 1. But since we took Shogun's k = 3, the point itself is always counted in the prediction; if it happens to have label 2, Shogun would predict label 2, which differs from mlpack's KNN. On second thought, if Shogun runs a distance-weighted KNN behind the scenes (I am not sure), then the output would always be the label of the point itself, no matter what k we choose.
Would it be possible to just get the k+1 nearest neighbors from Shogun itself? If that is the case, then we can do the classification ourselves (and ensure that we are always doing it the same way). If we can't get those from Shogun, maybe it is better to use a different technique for the nearest neighbor search. I don't think it makes such a huge difference either way, since we aren't timing the nearest neighbor search, we are just using it for finding the true nearest neighbors to calculate a metric.
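A sketch of that custom classification step, independent of either library's KNN classifier: take the neighbor ordering (query set == training set), drop each point itself, and do a plain majority vote over the next k neighbors. The function name and the brute-force search are illustrative, not a real mlpack/Shogun API.

```python
import numpy as np
from collections import Counter

def knn_accuracy(X, labels, k):
    """Leave-self-out k-NN classification accuracy via brute-force search."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    order = d2.argsort(axis=1)
    correct = 0
    for i in range(len(X)):
        # Skip the query point itself, then vote over the next k neighbors.
        neighbors = [j for j in order[i] if j != i][:k]
        votes = Counter(labels[j] for j in neighbors)
        correct += votes.most_common(1)[0][0] == labels[i]
    return correct / len(X)

X = np.array([[0.0], [0.1], [5.0], [5.1]])
y = np.array([0, 0, 1, 1])
print(knn_accuracy(X, y, 1))  # 1.0: each point's nearest other point shares its label
```

Since both libraries would feed the same voting code, the self-neighbor discrepancy disappears from the accuracy metric.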
Thanks all for helping out. Hopefully, the new custom KNN accuracy predictor will work as expected :)
No problem then, I will add it for sure :)
Everything looks good to me, but the build will fail until LMNN is part of the newest release of mlpack. So we can wait to merge until then, and we can be sure to release a newer version of mlpack soon-ish (before the end of the summer) with the LMNN support. For now, you can use the branch to run timing tests.
Do you think it would be useful to include mlpack-master as another library? That would allow us to merge this earlier.
@mlpack-jenkins test this please
Can one of the admins verify this patch?
Benchmarking scripts for Mlpack, Shogun & Matlab's LMNN.