rmsalinas / fbow

FBOW (Fast Bag of Words) is an extremely optimized version of the DBoW2/DBoW3 libraries.

cluster center initialization #22

Open jbfuehrer opened 5 years ago

jbfuehrer commented 5 years ago

Hey,

Is there a reason why the method for choosing the initial cluster centers changed from the k-means++ (KMPP) algorithm used in DBoW2 to the version now used in fbow?

I noticed that, especially with smaller vocabularies, the exact same feature is sometimes chosen more than once as an initial cluster center. One of the duplicate clusters then always stays empty, because during the linear search every feature is assigned to whichever of the identical centers is found first. The result is unused, meaningless words in the vocabulary.
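To make the failure mode concrete, here is a small C++ sketch. It is not fbow code: `Desc`, `hamming`, and `clusterSizes` are hypothetical names, 32-bit `std::bitset` descriptors stand in for real 256-bit ORB descriptors, and it assumes nearest-center assignment by a linear search that keeps the first center on ties, as described above.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

// Hypothetical 32-bit binary descriptor; real ORB descriptors are 256 bits.
using Desc = std::bitset<32>;

// Hamming distance between two binary descriptors.
inline std::size_t hamming(const Desc& a, const Desc& b) {
    return (a ^ b).count();
}

// Assign each feature to its nearest center by linear search and count how
// many features land in each cluster. Assumes `centers` is non-empty.
// The strict '<' keeps the FIRST center on ties, so an exact duplicate of an
// earlier center can never win a feature and its cluster stays empty.
std::vector<std::size_t> clusterSizes(const std::vector<Desc>& features,
                                      const std::vector<Desc>& centers) {
    std::vector<std::size_t> sizes(centers.size(), 0);
    for (const Desc& f : features) {
        std::size_t best = 0;
        std::size_t bestDist = hamming(f, centers[0]);
        for (std::size_t c = 1; c < centers.size(); ++c) {
            std::size_t d = hamming(f, centers[c]);
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        ++sizes[best];
    }
    return sizes;
}
```

For example, with centers `{0x0, 0x0, 0xFFFF0000}` and features `{0x0, 0x1, 0xFFFF0000, 0xFFFF0001}`, the cluster sizes come out as `{2, 0, 2}`: the duplicate second center never receives a feature.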

I have ported the DBoW2 KMPP implementation to fbow and can open a PR. I just wanted to make sure I'm not missing any domain knowledge before doing so.
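For reference, here is a minimal C++ sketch of textbook k-means++ seeding on binary descriptors. It is not the DBoW2 port mentioned above, and DBoW2's own code may differ in details such as how the distances are weighted; `Desc`, `hamming`, and `kmppSeeds` are hypothetical names, and 32-bit `std::bitset` descriptors stand in for real ones. The first center is drawn uniformly, and each further center is drawn with probability proportional to D(x)², the squared distance to the nearest center chosen so far.

```cpp
#include <bitset>
#include <cstddef>
#include <random>
#include <vector>

using Desc = std::bitset<32>;  // hypothetical stand-in for a binary descriptor

inline double hamming(const Desc& a, const Desc& b) {
    return static_cast<double>((a ^ b).count());
}

// Textbook k-means++ seeding. Returns indices into `feats`. A feature identical
// to an existing seed has D(x) = 0 and therefore weight 0, so it cannot be
// drawn again. Returns fewer than k seeds if fewer distinct descriptors exist.
std::vector<std::size_t> kmppSeeds(const std::vector<Desc>& feats, std::size_t k,
                                   std::mt19937& rng) {
    std::vector<std::size_t> seeds;
    if (feats.empty() || k == 0) return seeds;
    std::uniform_int_distribution<std::size_t> pickFirst(0, feats.size() - 1);
    seeds.push_back(pickFirst(rng));

    // minD2[i] = squared distance from feature i to its nearest seed so far.
    std::vector<double> minD2(feats.size());
    for (std::size_t i = 0; i < feats.size(); ++i) {
        double d = hamming(feats[i], feats[seeds[0]]);
        minD2[i] = d * d;
    }
    while (seeds.size() < k) {
        double total = 0.0;
        for (double w : minD2) total += w;
        if (total == 0.0) break;  // every feature coincides with some seed
        // discrete_distribution draws index i with probability minD2[i] / total.
        std::discrete_distribution<std::size_t> pick(minD2.begin(), minD2.end());
        std::size_t next = pick(rng);
        seeds.push_back(next);
        for (std::size_t i = 0; i < feats.size(); ++i) {
            double d = hamming(feats[i], feats[next]);
            if (d * d < minD2[i]) minD2[i] = d * d;
        }
    }
    return seeds;
}
```

Because an already chosen descriptor gets zero weight, the duplicate-center situation described above cannot arise with this seeding; if the input holds fewer than `k` distinct descriptors, the sketch simply stops early instead of repeating a center.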

Greets

dukeNashor commented 5 years ago

Same thoughts here. The new algorithm for choosing the initial cluster centers doesn't make sense to me either.

S-o-T commented 5 years ago

@rmsalinas Can you please comment on this? Are there any plans to fix the issue? @jbfuehrer Could you please at least commit your implementation to your fork? It would be much appreciated.

jbfuehrer commented 5 years ago

@S-o-T Done, and I've also created a PR now.

S-o-T commented 5 years ago

@jbfuehrer Thank you.