mbwolff / Word-Vector-Text-Modulator

A contribution to 2016 NaNoGenMo

gensimWord2Vec - not collecting words? #2

Closed GenTxt closed 7 years ago

GenTxt commented 7 years ago

Sorry to bother you again. I've collected a few more French texts from Project Gutenberg and I'm trying to generate another model. I receive these errors:

...
2016-12-11 12:34:28,503 : INFO : collecting all words and their counts
2016-12-11 12:34:28,503 : INFO : collected 0 word types from a corpus of 0 raw words and 0 sentences
etc.
RuntimeError: you must first build vocabulary before training the model

The UTF-8 text files are in the 'processed_texts' folder (sourcedir = 'processed_texts') inside the parent Word-Vector-Inflector folder. The Stack Overflow posts I checked only explain: "default min_count in gensim's Word2Vec is set to 5. If there is no word in your vocab with frequency greater than 4, your vocab will be empty and hence the error."

I don't think that's the problem, since most words should occur at the minimum count or more often. Should the folder be in a different location?
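
The log reports 0 raw words and 0 sentences, which suggests that no text reached the model at all, rather than words being dropped by min_count. One way to check this is a small standalone script, sketched below, that resolves the folder path and counts what a simple reader would find there. This is not the repository's own reader: the function names `iter_sentences` and `corpus_stats`, the `.txt` extension filter, and the one-sentence-per-line whitespace tokenization are all assumptions made for illustration.

```python
import os


def iter_sentences(sourcedir):
    """Yield one token list per non-empty line of every .txt file in sourcedir.

    Hypothetical helper: the .txt filter and whitespace splitting are
    assumptions, not the behaviour of the original script.
    """
    for name in sorted(os.listdir(sourcedir)):
        if not name.endswith(".txt"):
            continue  # skip files that this sketch does not treat as text
        path = os.path.join(sourcedir, name)
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                tokens = line.split()
                if tokens:
                    yield tokens


def corpus_stats(sourcedir):
    """Return (sentence_count, raw_word_count) for the directory."""
    sentences = 0
    words = 0
    for tokens in iter_sentences(sourcedir):
        sentences += 1
        words += len(tokens)
    return sentences, words


if __name__ == "__main__":
    sourcedir = "processed_texts"  # same relative name as in the post
    # A relative path is resolved against the current working directory,
    # so print both to see where the script is actually looking.
    print("working directory:", os.getcwd())
    print("resolved path:", os.path.abspath(sourcedir))
    if os.path.isdir(sourcedir):
        print("sentences, raw words:", corpus_stats(sourcedir))
    else:
        print("directory not found at the resolved path")
```

If this prints zero sentences, or reports that the directory is not found, the cause is more likely the folder location or the file names than the min_count setting.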

Appreciate any suggestions.

Thanks