gregversteeg / corex_topic

Hierarchical unsupervised and semi-supervised topic models for sparse count data with CorEx
Apache License 2.0

Allow anchoring parameter to be set more flexibly #16

Open ryanjgallagher opened 6 years ago

ryanjgallagher commented 6 years ago

Currently, the anchoring parameter must be the same for every anchored word. Since the theory allows it, we should let a user optionally pass a list of anchor strengths, where each entry of the list can be an integer (anchor all words in that topic with the same parameter), a list (anchor each word in that topic with its own parameter), or a mix of both across topics (one shared parameter for some topics, per-word parameters for others). A user should still be allowed to pass a single integer as the anchoring parameter if they do not want to specify a value for each topic.

Example:

anchors = [['dog', 'cat'], 'apple']
anchor_strengths = [[2, 3], 4]
topic_model.fit(X, words=words, anchors=anchors, anchor_strength=anchor_strengths)

This would anchor "dog" to Topic 1 with anchor_strength=2, "cat" to Topic 1 with anchor_strength=3, and "apple" to Topic 2 with anchor_strength=4.
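
To make the proposed semantics concrete, here is a minimal sketch of how the mixed input could be expanded into one strength per anchored word. The helper name `expand_anchor_strengths` is hypothetical and not part of the library, and the list form of `anchor_strength` is only the proposal from this issue, not current behavior.

```python
from numbers import Number


def expand_anchor_strengths(anchors, anchor_strength):
    """Return one list of (word, strength) pairs per anchored topic.

    Hypothetical helper illustrating the proposal; not part of corex_topic.
    """
    # Treat a bare word (or word index) as a single-word topic: 'apple' -> ['apple'].
    topics = [[a] if isinstance(a, (str, int)) else list(a) for a in anchors]

    # A single number applies to every anchored word in every topic.
    if isinstance(anchor_strength, Number):
        strengths = [anchor_strength] * len(topics)
    else:
        strengths = list(anchor_strength)
        if len(strengths) != len(topics):
            raise ValueError("need one anchor strength entry per anchored topic")

    expanded = []
    for words, strength in zip(topics, strengths):
        if isinstance(strength, Number):
            # One number for the topic: share it across that topic's words.
            per_word = [strength] * len(words)
        else:
            per_word = list(strength)
            if len(per_word) != len(words):
                raise ValueError("need one anchor strength per anchored word")
        expanded.append(list(zip(words, per_word)))
    return expanded


anchors = [['dog', 'cat'], 'apple']
anchor_strengths = [[2, 3], 4]
print(expand_anchor_strengths(anchors, anchor_strengths))
# prints [[('dog', 2), ('cat', 3)], [('apple', 4)]]
```

Passing a single integer still works under this sketch: `expand_anchor_strengths(anchors, 5)` gives every anchored word a strength of 5. Mismatched lengths raise a `ValueError` rather than being silently truncated, which seems like the safer default.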

Opening this as an issue because I keep forgetting to get around to it.