nlpub / chinese-whispers

An implementation of Chinese Whispers in Python.
https://chinese-whispers.readthedocs.io
MIT License

Is there any way to specify the number of labels and the number of members under each label #38

Closed. jiangweiatgithub closed this issue 2 years ago.

jiangweiatgithub commented 2 years ago

Hi! I have been testing the Python code at https://github.com/nlpub/chinese-whispers-python. Is there any way to specify the total number of labels and the number of members under each label, at least roughly? Say I have 1000 members and want to cluster them into 200 groups, allowing 1 to 10 members in each group, where 1 means that the member forms a group by itself.

Thanks, Wei

dustalov commented 2 years ago

Hi,

the original version of the Chinese Whispers algorithm does not put any constraints on the number of clusters or on the number of elements in each cluster. Some other algorithms, like k-spanning tree or spectral clustering, require specifying the number of clusters.

It is possible, however, to satisfy both your requirements by modifying our code. In particular, during initialization you can assign not |V| labels (one per node), but N labels, and during label scoring you can penalize each label by the number of elements that already carry it.
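To make the two modifications concrete, here is a minimal standalone sketch. It is not the code of this library. The function name `constrained_chinese_whispers`, its parameters, the round-robin initialization, the division by `1 + size` as the penalty, and the hard cap `max_size` are all assumptions chosen for illustration. The graph is passed as a plain symmetric dict of dicts so that the example has no dependencies.

```python
import random
from collections import Counter, defaultdict


def constrained_chinese_whispers(adjacency, n_labels, max_size, iterations=20, seed=0):
    """Hypothetical variant of Chinese Whispers (not the library's API).

    adjacency: {node: {neighbor: weight}}, symmetric, every node present as a key.
    Nodes start with one of n_labels labels instead of one label per node,
    and a label's score is penalized by the number of nodes already carrying it.
    """
    nodes = sorted(adjacency)
    if n_labels * max_size < len(nodes):
        raise ValueError("n_labels * max_size must be at least the number of nodes")

    rng = random.Random(seed)
    # Initialization: N labels assigned round-robin, not |V| unique labels.
    labels = {node: i % n_labels for i, node in enumerate(nodes)}
    sizes = Counter(labels.values())

    for _ in range(iterations):
        order = list(nodes)
        rng.shuffle(order)
        changed = False
        for node in order:
            current = labels[node]
            sizes[current] -= 1  # the node does not count towards its own label
            # Sum the edge weights towards each label seen among the neighbors.
            scores = defaultdict(float)
            for neighbor, weight in adjacency[node].items():
                scores[labels[neighbor]] += weight
            # Staying put is always allowed and serves as the baseline.
            best = current
            best_score = scores[current] / (1 + sizes[current])
            for label in sorted(scores):
                if sizes[label] >= max_size:
                    continue  # hard cap: a full label cannot accept more nodes
                penalized = scores[label] / (1 + sizes[label])  # size penalty
                if penalized > best_score:
                    best, best_score = label, penalized
            labels[node] = best
            sizes[best] += 1
            changed = changed or best != current
        if not changed:
            break  # no node changed its label during this pass
    return labels


if __name__ == "__main__":
    # Toy data: two triangles joined by one weak edge.
    edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
             (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 0.1)]
    adjacency = defaultdict(dict)
    for u, v, w in edges:
        adjacency[u][v] = w
        adjacency[v][u] = w
    clusters = defaultdict(list)
    for node, label in constrained_chinese_whispers(adjacency, n_labels=3, max_size=4).items():
        clusters[label].append(node)
    print(dict(clusters))
```

Note that `n_labels` is only an upper bound in this sketch: a label can lose all of its members, so the final number of clusters may be smaller. The cap also limits movement, since a node cannot join a label that is already full, so `n_labels * max_size` should leave some slack above the number of nodes.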

Hope it helps.

jiangweiatgithub commented 2 years ago

That sounds interesting and promising, but can you pinpoint the relevant parts of the code? I cannot really find them.

TIA!

dustalov commented 2 years ago