motiwari / BanditPAM

BanditPAM C++ implementation and Python package
MIT License
647 stars 38 forks source link

Modify naive algorithm to PAM. Fixes #1 #88

Closed mailology closed 3 years ago

mailology commented 3 years ago

I have modified the naive algorithm to PAM. The logic of swap_naive is changed so that the best and second best distance of each datapoint are cached and used in the computing the cost. When we swap medoid m_k with non-medoid x_i, the cost for datapoint x_j is computed as

  1. if x_j is assigned to medoid m_k: cost = min(dist(x_i, x_j), second best distance of j)
  2. if x_j is not assigned to medoid m_k: cost = min(dist(x_i, x_j), best distance of j)

Hence, the complexity of PAM is O(kn^2) . The algorithms are tested on the MNIST-1k data as follows:

Original naive algorithm

Screen Shot 2021-07-24 at 12 07 09 PM

PAM algorithm

Screen Shot 2021-07-24 at 12 09 13 PM
mailology commented 3 years ago

I have fixed the following:

PAM algorithm

Screen Shot 2021-07-30 at 3 07 11 PM

naive algorithm

Screen Shot 2021-07-30 at 4 08 37 PM