gregversteeg / corex_topic

Hierarchical unsupervised and semi-supervised topic models for sparse count data with CorEx
Apache License 2.0
626 stars 119 forks

Does corex have a predict function? #54

Open RMZ3 opened 2 years ago

RMZ3 commented 2 years ago

Hello,

I have trained a CorEx model on a set of documents, but I now have new documents I want to infer topics for using the prior model. Is there a way to do this using CorEx?

gregversteeg commented 2 years ago

Yes, there is a predict function. You have to use the Python API, though; I don't think there's a way to do it from the command-line interface.

RMZ3 commented 2 years ago

Thank you. Is there any example code on how to use the predict function?

GiarteDataTeam commented 1 year ago

You can find some samples here

This may also help:

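    # doc must be an iterable of raw text strings, e.g. a list of new documents
    doc_word = vector.transform(doc)
    # get_feature_names_out() needs scikit-learn >= 1.0; older releases use get_feature_names()
    words = list(vector.get_feature_names_out())
    # Final preprocessing step: drop every all-digit token from the vocabulary.
    # The remaining columns must match those the model was trained on.
    not_digit_inds = [ind for ind, word in enumerate(words) if not word.isdigit()]
    doc_word = doc_word[:, not_digit_inds]
    topics = corex_model.predict(doc_word)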

where vector is the CountVectorizer defined while training and developing the model:

    vector = CountVectorizer(stop_words='english', max_features=max_words, lowercase=True,
                             ngram_range=ngram, binary=True)
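To tie the two snippets together, here is a minimal sketch of the whole flow under a few assumptions: the toy corpus, the helper names non_digit_columns and predict_topics, and the variable names are all made up for illustration and are not part of CorEx. The key point it tries to show is that new documents go through transform() on the already-fitted vectorizer (not fit_transform()), followed by the same digit filter used at training time, so the columns line up with what the model saw.

```python
from sklearn.feature_extraction.text import CountVectorizer


def non_digit_columns(vectorizer):
    """Column indices of vocabulary terms that are not purely digits."""
    return sorted(idx for term, idx in vectorizer.vocabulary_.items()
                  if not term.isdigit())


def predict_topics(model, vectorizer, new_docs):
    """Vectorize new_docs with the fitted vectorizer, then call model.predict.

    `model` is assumed to be a trained CorEx topic model; any object with a
    predict(matrix) method will work with this helper.
    """
    # transform(), not fit_transform(): keep the training vocabulary fixed
    doc_word = vectorizer.transform(new_docs)
    # Same column filter as at training time
    doc_word = doc_word[:, non_digit_columns(vectorizer)]
    return model.predict(doc_word)


# --- Training-time preprocessing (toy corpus, purely illustrative) ---
train_docs = [
    "cats chase mice in 2020",
    "dogs chase cats",
    "mice eat cheese 42 times",
]
vectorizer = CountVectorizer(stop_words='english', binary=True)
train_matrix = vectorizer.fit_transform(train_docs)

keep = non_digit_columns(vectorizer)
train_matrix = train_matrix[:, keep]
index_to_term = {idx: term for term, idx in vectorizer.vocabulary_.items()}
train_words = [index_to_term[i] for i in keep]

# With corextopic installed, training and inference would look roughly like:
#   from corextopic import corextopic as ct
#   corex_model = ct.Corex(n_hidden=2)
#   corex_model.fit(train_matrix, words=train_words)
#   topics = predict_topics(corex_model, vectorizer, ["a new document about cats"])
```

Keeping the fitted vectorizer around (for example by pickling it next to the model) is what makes this work later; refitting a new CountVectorizer on the new documents would produce a different vocabulary and column order.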