Open owlas opened 4 years ago
Is a
expected to be a list? I'm wondering if there is something missing in the preprocessing step for turning anchors
of words into indices.
Thanks for pointing this out. I think anchors has to be a list of lists: like anchors = [['cat', 'dog'], ['apple']]. Then "a" will be a list, and the slice will always produce an object that has axis=1! However! I note that the examples on the readme do not always match this intuition. I'll ask @ryanjgallagher to check if there is an issue with the readme.
topic_model.fit(X, words=words, anchors=[['dog','cat'], 'apple'], anchor_strength=2) Should be [['dog','cat'], ['apple']]?
topic_model.fit(X, words=words, anchors=['protest', 'protest', 'protest', 'riot', 'riot', 'riot'], anchor_strength=2) Should be [['protest'], ['protest']... ?
topic_model.fit(X, words=words, anchors=[['bernese', 'mountain', 'dog'], ['mountain', 'rocky', 'colorado'], anchor_strength=2) This one is missing a bracket I think.
The anchor input for the topic model should be a list. Within that list, the entries can be either strings, ints, or lists, and you should be able to do any combination of those for anchoring. Individual strings or ints (indicating you want to anchor only one word to a topic) are converted to lists with a single entry, and strings are converted to their corresponding column index.
So @gregversteeg isn't quite right, even if you pass just a string or int in the anchors list, it should be preprocessed properly. (He's right that the last example is missing a bracket though, my bad).
@owlas Can you provide a simple reproducible example? I can run the examples in the README without any issues using version 1.0.5, so I'm not sure what error you may be getting.
Can report the same for 1.0.6. Although I cannot reproduce the error consistently. For some sets of anchors it works, for some it throws an error (all of those should be valid according to the docs though)
@d-lowl Would you be able to provide a minimal reproducible example that does fail?
Yeah, I made it work for my case (it was partially a problem in my code), but I still think that there are some edge cases. I will play around with it tomorrow and try to come up with one.
Hi! I am not sure if this repo is still maintained but I ran into the same issue.
I found that single item anchor lists are transferred (back) into single items here: https://github.com/gregversteeg/corex_topic/blob/beea64bc41e62dffc5fb87deb506a3e253be0a6c/corextopic/corextopic.py#L367C27-L367C27
This would contradict @ryanjgallagher comment:
Individual strings or ints (indicating you want to anchor only one word to a topic) are converted to lists with a single entry
Removing the edge case for single item lists solved the problem for me:
if len(new_anchor_list) == 0:
continue
# if len(new_anchor_list) == 1:
# processed_anchors.append(new_anchor_list[0])
else:
processed_anchors.append(new_anchor_list)
When using the fit method with anchors I get an index error from this line:
https://github.com/gregversteeg/corex_topic/blob/83991482523a8be2b2a8b9c864b273de96c2389a/corextopic/corextopic.py#L185
The error is understandable because if
X
is a 2d array, thenX[:,i]
is a 1d slice and thereforeX[:,i].mean(axis=1)
is undefined because there is no dimension1
.I've installed version
corextopic==1.0.5
from pypi.I can reproduce this for any arguments passed to
anchors