Open DHAiRYA2048 opened 3 months ago
@DHAiRYA2048 ,
Thank you for opening the issue! Yeah, your point is very good. As for KCenterGreedy, I just used Google's original implementation and did not make any improvements to it.
> Since each iteration requires access to the previous iteration's result, multiprocessing isn't helping.
Yes, you're right. But I think we can parallelize the distance update that runs after each cluster center is selected. To be specific,
Before (L.84 of knnsearch.py)
dist = sklearn.metrics.pairwise_distances(self.features, x, metric=self.metric)
After
dist = sklearn.metrics.pairwise_distances(self.features, x, metric=self.metric, n_jobs=4)
In the above code, I set the number of parallel jobs to 4, but you seem to have a lot of CPU cores, so you can increase the number. If you have already configured joblib somewhere else, the above modification may have no effect, sorry...
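For reference, here is a minimal sketch of how the `n_jobs` argument fits into a greedy k-center loop. It is not the repository's `knnsearch.py`: the function name `k_center_greedy` and its parameters `n_select` and `seed` are made up for illustration, and only `sklearn.metrics.pairwise_distances` with its `metric` and `n_jobs` arguments comes from scikit-learn. Whether `n_jobs` actually speeds up a single-center update depends on how scikit-learn splits the work, so it is worth timing on your own data.

```python
import numpy as np
from sklearn.metrics import pairwise_distances


def k_center_greedy(features, n_select, metric="euclidean", n_jobs=1, seed=0):
    """Greedy k-center selection (hypothetical helper, not the repo's code)."""
    rng = np.random.default_rng(seed)
    first = int(rng.integers(features.shape[0]))
    selected = [first]

    # min_dist[i] = distance from sample i to its nearest selected center.
    min_dist = pairwise_distances(
        features, features[[first]], metric=metric, n_jobs=n_jobs
    ).ravel()

    while len(selected) < n_select:
        # The next center is the sample farthest from all current centers.
        # This step depends on the previous iteration, so it stays sequential.
        idx = int(np.argmax(min_dist))
        selected.append(idx)

        # Only this distance update is a candidate for parallelization.
        dist = pairwise_distances(
            features, features[[idx]], metric=metric, n_jobs=n_jobs
        ).ravel()
        min_dist = np.minimum(min_dist, dist)

    return selected
```

A call such as `k_center_greedy(features, n_select=100, n_jobs=4)` returns the indices of the selected centers. The outer loop is still sequential, so the result is the same as with `n_jobs=1`, unlike splitting the iterations themselves across processes.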
The kCenterGreedy step takes ~180s on;
which is a lot of time for my use case. I am looking for ways to make this step faster (ideally, <75s). Since each iteration requires access to the previous iteration's result, multiprocessing isn't helping. It does bring the time down from 180s to 75s, but the result isn't desirable, since it only takes into account the distance of a single image.
Do you have any insights on how this step can be made faster?