vi3k6i5 / GuidedLDA

semi supervised guided topic model with custom guidedLDA
Mozilla Public License 2.0
497 stars 108 forks

KeyError for words not in vocabulary #54

Open appledora opened 4 years ago

appledora commented 4 years ago

I am using a list of lists for the seed words, and some of those words may not be present in the dataset. For these I get a one-line error: `KeyError: 'পেশী'`. I tried using try/except in the following manner:

`for t_id, st in enumerate(seed_topic_list):
    for word in st:
        try:
            seed_topics[word2id_[word]] = t_id
        except KeyError:
            print("not in vocabulary")
            seed_topics[word2id_[word]] = 0`

It is still not working. So, should I just train my CountVectorizer with the seed words?

sonamgupta1105 commented 3 years ago

@vi3k6i5 I am having a similar KeyError. Any suggestions on how to fix it?
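
For reference, one likely reason the try/except in the original post still fails is that the `except` branch repeats the same `word2id_[word]` lookup that raised the error, so the `KeyError` is raised a second time. Below is a minimal sketch of one way around it: look the word up with `dict.get` and skip seed words that are missing from the vocabulary. The helper name `build_seed_topics` and the toy vocabulary are assumptions made for illustration; they are not part of GuidedLDA.

```python
def build_seed_topics(seed_topic_list, word2id):
    """Map vocabulary ids of seed words to topic ids, skipping unknown words.

    Hypothetical helper for illustration; not part of GuidedLDA.
    """
    seed_topics = {}
    missing = []
    for t_id, seed_words in enumerate(seed_topic_list):
        for word in seed_words:
            # dict.get returns None for unknown words instead of raising KeyError.
            word_id = word2id.get(word)
            if word_id is None:
                missing.append(word)
                continue
            seed_topics[word_id] = t_id
    return seed_topics, missing


# Toy vocabulary standing in for the vectorizer's word-to-id mapping (assumed data).
vocab = ["game", "team", "market", "price"]
word2id = {word: idx for idx, word in enumerate(vocab)}

# The first topic contains one seed word that is not in the vocabulary.
seed_topic_list = [["game", "team", "পেশী"], ["market", "price"]]

seed_topics, missing = build_seed_topics(seed_topic_list, word2id)
print(seed_topics)  # ids of in-vocabulary seed words mapped to their topic ids
print(missing)      # seed words that were skipped
```

Collecting the skipped words in a list, rather than only printing a message, makes it easier to see which seed words never occur in the corpus. The resulting `seed_topics` dict would then be used wherever the original loop's `seed_topics` was used.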