okaneco / kmeans-colors

k-means clustering library and binary to find dominant colors in images
Apache License 2.0
132 stars 9 forks source link

Implement Hamerly's algorithm for Lab and Srgb #18

Closed okaneco closed 4 years ago

okaneco commented 4 years ago

Reformat README.md. Add logic for adaptive switching between Hamerly and naive in the binary.

k=6 seems to be the crossover point for colors when it's advantageous to use Hamerly over naive in the average case. This is not the case in the lanterns example: there are many similar colors, so distance calculations end up being performed and the overhead of the cached distance results likely slows down calculation. The flowers example has clear winnings probably attributable to cleaner segmentation of colors. Seconds can come off the calculation because of early convergence in the best case images skipping distance calculations for many points. The binary uses the optimization for k>5, the performance lost in some cases should be recovered by the next version of palette with optimization of converting between floats/ints and Rgb/Lab.

Implementation note: Square roots were necessary for the bounds calculation to result in the same calculation of colors as the naive version. The algorithms should be deterministic and result in the same colors. Tests should be added to verify this stays the case automatically instead of human-verifying.

Reference: Hamerly, G., & Drake, J. (2017). Chapter 2 Accelerating Lloyd's Algorithm for k-Means Clustering. Hamerly, G. (2010). Making k-means even faster. In: SIAM international conference on data mining.

Closes #1

Update: Second paragraph was true when kmeans initialization was not kmeans++.