Sorry to bother you again. I've collected a few more French texts from Project Gutenberg and I'm trying to generate another model. I receive these errors:
...
2016-12-11 12:34:28,503 : INFO : collecting all words and their counts
2016-12-11 12:34:28,503 : INFO : collected 0 word types from a corpus of 0 raw words and 0 sentences
etc.
RuntimeError: you must first build vocabulary before training the model
The UTF-8 text files are in the 'processed_texts' folder (sourcedir = 'processed_texts') inside the parent Word-Vector-Inflector folder. The Stack Overflow posts I checked only explain: "default min_count in gensim's Word2Vec is set to 5. If there is no word in your vocab with frequency greater than 4, your vocab will be empty and hence the error."
I don't think that's the problem, since most words should occur at least that often, and the log reports 0 raw words and 0 sentences rather than a vocabulary that was filtered down to nothing.
Should the folder be in a different location?
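In case it helps narrow this down, here is a minimal sketch of a check that could be run from the same working directory as the training script. It uses only the standard library and does not touch gensim; the helper name `count_corpus` is hypothetical, and it assumes the texts are plain `.txt` files sitting directly in the folder, which may not match how the script actually reads them.

```python
import os


def count_corpus(sourcedir):
    """Return (n_files, n_lines, n_tokens) for the .txt files in sourcedir.

    Hypothetical diagnostic helper: it only confirms that the folder
    resolves from the current working directory and contains readable text.
    """
    sourcedir = os.path.abspath(sourcedir)
    if not os.path.isdir(sourcedir):
        # A relative path that resolves somewhere unexpected ends up here.
        raise FileNotFoundError("not a directory: " + sourcedir)

    n_files = n_lines = n_tokens = 0
    for name in sorted(os.listdir(sourcedir)):
        # Assumption: the corpus files use a .txt extension.
        if not name.endswith(".txt"):
            continue
        n_files += 1
        with open(os.path.join(sourcedir, name), encoding="utf-8") as handle:
            for line in handle:
                tokens = line.split()  # naive whitespace tokenisation
                if tokens:  # skip blank lines
                    n_lines += 1
                    n_tokens += len(tokens)
    return n_files, n_lines, n_tokens
```

Calling `count_corpus('processed_texts')` after printing `os.getcwd()` should show whether the relative path points where I expect. If it raises or returns zeros, the folder location (or the file extension) is the likely culprit rather than min_count.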
Appreciate any suggestions.
Thanks