Open jbfuehrer opened 5 years ago
Same thoughts here. The new algorithm for choosing the initial cluster centers doesn't make sense to me either.
@rmsalinas Could you please comment on this? Are there any plans to fix the issue? @jbfuehrer Could you at least push your implementation to your fork? It would be much appreciated.
@S-o-T done, also created a PR now.
@jbfuehrer Thank you.
Hey,
is there a reason why the mechanism for determining the initial cluster centers changed from the k-means++ (KMPP) algorithm used in DBoW2 to the version now used in fbow?
I noticed that, especially with smaller vocabularies, the exact same feature is sometimes chosen multiple times as an initial cluster center. One of the resulting clusters then always stays empty, because the linear search assigns every feature to whichever of the identical centers it finds first. This generates unused, meaningless words.
I ported the DBoW2 KMPP implementation over to fbow and can open a PR. I just wanted to make sure I'm not missing any domain knowledge before doing so.
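For readers unfamiliar with the technique, here is a minimal sketch of k-means++ seeding. It is not the DBoW2 or fbow code: `Feature`, `sqDist` and `kmppSeed` are hypothetical names, and float descriptors with squared Euclidean distance are an assumption made for illustration. The point it shows is that each new center is drawn with probability proportional to its squared distance to the nearest center already chosen, so a feature identical to an existing center has weight zero and cannot be picked again.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Hypothetical descriptor type for this sketch (not fbow's own type).
using Feature = std::vector<float>;

// Squared Euclidean distance between two descriptors of equal length.
double sqDist(const Feature& a, const Feature& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
        s += d * d;
    }
    return s;
}

// Returns the indices of up to k initial centers chosen by k-means++ seeding.
// Fewer than k are returned if the data has fewer than k distinct features.
std::vector<std::size_t> kmppSeed(const std::vector<Feature>& feats,
                                  std::size_t k, std::mt19937& rng) {
    std::vector<std::size_t> centers;
    if (feats.empty() || k == 0) return centers;

    // First center: uniformly at random.
    std::uniform_int_distribution<std::size_t> pick(0, feats.size() - 1);
    centers.push_back(pick(rng));

    // minDist[i] = squared distance from feature i to its nearest chosen center.
    std::vector<double> minDist(feats.size(), std::numeric_limits<double>::max());
    while (centers.size() < k) {
        const Feature& last = feats[centers.back()];
        double total = 0.0;
        for (std::size_t i = 0; i < feats.size(); ++i) {
            minDist[i] = std::min(minDist[i], sqDist(feats[i], last));
            total += minDist[i];
        }
        if (total <= 0.0) break;  // every feature coincides with some center

        // Sample an index with probability minDist[i] / total.
        std::uniform_real_distribution<double> cut(0.0, total);
        const double r = cut(rng);
        double acc = 0.0;
        std::size_t chosen = feats.size();
        std::size_t lastPositive = feats.size();
        for (std::size_t i = 0; i < feats.size(); ++i) {
            if (minDist[i] <= 0.0) continue;  // zero weight: never selected
            lastPositive = i;
            acc += minDist[i];
            if (acc > r) { chosen = i; break; }
        }
        if (chosen == feats.size()) chosen = lastPositive;  // rounding fallback
        centers.push_back(chosen);
    }
    return centers;
}
```

With this kind of seeding, duplicate initial centers can only be avoided, not repaired after the fact: if the data holds fewer distinct features than requested centers, the sketch simply returns fewer centers rather than repeating one.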
Greets