
gregversteeg / corex_topic

Hierarchical unsupervised and semi-supervised topic models for sparse count data with CorEx

Apache License 2.0


"Anchor word not in word column labels provided to CorEx:" #23

Closed: srujana-tak closed this issue 5 years ago

srujana-tak commented 5 years ago

How can I skip anchor words that are not in the data and still get results for the anchor words that are available?

ryanjgallagher commented 5 years ago

Hi Srujana,

Would you be able to clarify your question? Are you anchoring words to your topic model? And which words do you want to see results for?

srujana-tak commented 5 years ago

I am using anchor words like this:

    topic_model.fit(X, words=words, anchors=[['dog', 'cat', 'animal'], ['home', 'interior', 'furniture'], ['beauty', 'cosmetic']], anchor_strength=3)

If I fit this model (with the same parameters) on new data (X) that doesn't contain the word 'cosmetic', I get the error 'Anchor word not in word column labels provided to CorEx:'. I want the same topics for the new dataset, but it is hard to change the anchor words every time I fit the model.

Must the anchor words be present in the data we provide?

ryanjgallagher commented 5 years ago

Yes, right now the code is structured such that the anchor words must be present in the data that is provided. This is to help alert the user that the words they are trying to anchor cannot be anchored.

@gregversteeg, do you think this should raise a warning instead? The error is thrown in the code that preprocesses the anchors.
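On a version that still raises the error, one workaround is to filter the anchor list against the vocabulary before calling fit. This is a minimal sketch, assuming `words` is the same list of column labels passed to `fit`; `filter_anchors` is a hypothetical helper, not part of corextopic. Note that a group left empty is dropped, which shifts the position (and so the topic number) of any later anchored group.

```python
def filter_anchors(anchors, words):
    """Hypothetical helper: drop anchor words missing from the vocabulary.

    Returns (filtered_groups, missing_words). Groups that end up empty are
    removed, so later groups move up one position.
    """
    vocab = set(words)
    filtered, missing = [], []
    for group in anchors:
        # Treat a bare string as a one-word group
        group = [group] if isinstance(group, str) else list(group)
        kept = [w for w in group if w in vocab]
        missing.extend(w for w in group if w not in vocab)
        if kept:
            filtered.append(kept)
    return filtered, missing


anchors = [['dog', 'cat', 'animal'], ['home', 'interior', 'furniture'], ['beauty', 'cosmetic']]
# Example vocabulary for new data that lacks 'cosmetic'
words = ['dog', 'cat', 'animal', 'home', 'interior', 'furniture', 'beauty', 'sofa']
filtered, missing = filter_anchors(anchors, words)
print(filtered)  # [['dog', 'cat', 'animal'], ['home', 'interior', 'furniture'], ['beauty']]
print(missing)   # ['cosmetic']
```

The filtered list can then be passed as before, e.g. topic_model.fit(X, words=words, anchors=filtered, anchor_strength=3).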

srujana-tak commented 5 years ago

Yes, a warning would do the job and still produce results for the words that can be anchored. Thanks!!

ryanjgallagher commented 5 years ago

Thanks for your patience @srujana-tak. I've made the update so that CorEx throws a warning instead of an error if the anchor is not in the vocabulary. Let me know if you have any further issues.

I've also updated CorEx on pip, so if you installed it via pip then you should be able to update it using pip install corextopic --upgrade
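For illustration only, warn-and-skip anchor handling might look like the sketch below. It is not CorEx's actual preprocessing code: `anchor_indices` is a hypothetical function, and its warning text is borrowed from the original error message. Empty groups are kept so each group stays aligned with its topic position. If a library reports skipped anchors through Python's `warnings` module (an assumption for CorEx), the standard `warnings` filters can record those warnings or escalate them back into errors.

```python
import warnings


def anchor_indices(anchors, words):
    """Hypothetical sketch: map anchor words to column indices, warning
    about (and skipping) words not in the vocabulary instead of raising."""
    index = {w: i for i, w in enumerate(words)}
    result = []
    for group in anchors:
        group = [group] if isinstance(group, str) else group
        cols = []
        for w in group:
            if w in index:
                cols.append(index[w])
            else:
                # Emitted as a UserWarning (the default category)
                warnings.warn(f"Anchor word not in word column labels provided to CorEx: {w}")
        result.append(cols)  # keep empty groups to preserve topic positions
    return result


words = ['dog', 'cat', 'beauty']
# Lenient mode: record which anchors were skipped
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cols = anchor_indices([['dog', 'cat'], ['beauty', 'cosmetic']], words)
print(cols)  # [[0, 1], [2]]
print([str(w.message) for w in caught])

# Strict mode: escalate the warning to an exception, like the old behaviour
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        anchor_indices([['cosmetic']], words)
    except UserWarning as err:
        print("strict mode:", err)
```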

srujana-tak commented 5 years ago

Thank you!