Closed fengyuentau closed 2 years ago
Well, it turns out that clustering in Python with pyclustering is a nightmare and takes forever to finish. I went back to the official MATLAB code to cluster all the bboxes. It is a pity that the MATLAB code is so hard for me to understand.
Thanks for your great work.
When I tried to cluster the whole training set (159,424 bboxes), I found that computing the distance between every pair of bounding boxes was too slow: it needed ~40 hours on an Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz (56 cores).
So I rewrote the corresponding code to take advantage of NumPy's parallelism. Now it finishes in 9 minutes.
However, the clustering itself is still pretty slow. I have not dug into pyclustering, though, so I have no idea how much time it will need to finish with all the bboxes.
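For anyone curious what such a rewrite can look like, here is a minimal sketch, not the actual code from my change. It assumes each bbox is reduced to a (width, height) pair, that the boxes are compared as if they shared a common center, and that the distance is the Jaccard distance (1 - IoU). The function names and the `chunk_size` parameter are made up for illustration.

```python
import numpy as np


def jaccard_distance_block(wh_rows, wh_all):
    """Return 1 - IoU between every box in wh_rows and every box in wh_all.

    Assumption: boxes are (width, height) pairs sharing a common center,
    so the intersection is min(widths) * min(heights).
    """
    w_r = wh_rows[:, 0][:, None]  # shape (m, 1)
    h_r = wh_rows[:, 1][:, None]
    w_a = wh_all[:, 0][None, :]   # shape (1, n)
    h_a = wh_all[:, 1][None, :]
    # Broadcasting yields all m x n pairs without a Python-level loop.
    inter = np.minimum(w_r, w_a) * np.minimum(h_r, h_a)
    union = w_r * h_r + w_a * h_a - inter
    return 1.0 - inter / union


def pairwise_jaccard_distance(wh, chunk_size=1024):
    """Fill the full n x n distance matrix one block of rows at a time.

    chunk_size (hypothetical parameter) bounds the size of the
    temporary arrays created by broadcasting.
    """
    wh = np.asarray(wh, dtype=np.float64)
    n = wh.shape[0]
    dist = np.empty((n, n), dtype=np.float64)
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        dist[start:stop] = jaccard_distance_block(wh[start:stop], wh)
    return dist


if __name__ == "__main__":
    boxes = np.array([[2, 2], [2, 2], [4, 4], [1, 4]])
    print(pairwise_jaccard_distance(boxes, chunk_size=2))
```

The speedup comes from replacing the per-pair Python loop with broadcasting over whole blocks of rows. Note that for the full training set the output matrix itself is very large, so in practice it may be necessary to process or store the blocks separately instead of holding everything in memory.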