Open zhq2009 opened 8 years ago
What language are you trying? Can you paste the command you are running?
Hello,
We are trying the English Wikipedia. The command we are running is sudo sh prepare.sh en_US /mnt/data/. prepare.sh runs everything, including downloading files and compiling programs, so we are wondering whether we could get the executable programs directly. We are also experiencing compatibility problems, and the generated corpus file is empty.
Thank you very much
Hello,
We ran the commands in prepare.sh manually and generated the corpus file successfully. We are now training the model on the corpus file, and this is the message we got from the command:
...
Requirement already satisfied (use --upgrade to upgrade): requests in /usr/lib/python2.7/dist-packages (from smart-open>=1.2.1->gensim)
Cleaning up...
pid 13182's current affinity mask: ff
pid 13182's new affinity mask: ff
The program has stayed at this point for several hours, with the CPU at full usage.
We are wondering whether the program is running correctly and whether we should keep waiting until we get the results.
Thank you very much
ZH, depending on the corpus size, the number of dimensions, and the method (skip-gram or CBOW), it can take a long time. With the settings used for the shared models it usually took around 4,5 hours. My advice is to let it run for a few hours (at least 6).
Be aware that if you installed gensim manually, it might not be using all the cores. The script provided in this repo installs it such that it uses as many cores as possible.
The first stage of word2vec (gathering the vocabulary) only uses a single core, though; the batches of matrix factorization are then done in parallel using as many cores as possible.
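To see how many cores are available and whether a manually installed gensim has its compiled training routines, a minimal sketch along these lines may help. It assumes gensim exposes FAST_VERSION in gensim.models.word2vec (in older releases a value of -1 indicates the slow pure-Python fallback); describe_parallelism is a hypothetical helper name, not something this repo provides.

```python
import multiprocessing


def describe_parallelism():
    """Report the core count and, if gensim is importable, its FAST_VERSION flag."""
    info = {"cores": multiprocessing.cpu_count(), "fast_version": None}
    try:
        # Assumption: older gensim releases set FAST_VERSION to -1 when the
        # compiled (Cython) routines are missing and training falls back to
        # slow pure-Python code.
        from gensim.models.word2vec import FAST_VERSION
        info["fast_version"] = FAST_VERSION
    except ImportError:
        # gensim is not installed for this interpreter; leave the flag as None.
        pass
    return info


if __name__ == "__main__":
    print(describe_parallelism())
```

If fast_version comes back as -1, reinstalling gensim with a working C compiler (or via the script in this repo) is probably the first thing to try.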
Hello,
We used the command "wiki2vec.sh corpus output/model.w2c 50 500 10" to generate the model file. After the program had run for 20 hours, we got the error message "IOError: [Errno 2] No such file or directory: '/home/_/_/wiki2vec/wiki2vec-master/results/model.w2c.syn1neg.npy'".
Could you please give us some suggestions about how to solve the problem?
Thank you very much.
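One thing worth ruling out before another long run, assuming the error is raised when the trained model is written to disk, is that the output directory does not exist. The sketch below is not part of wiki2vec; ensure_output_dir is a hypothetical helper and the path in the example is illustrative only.

```python
import os
import tempfile


def ensure_output_dir(model_path):
    """Create the parent directory of model_path if needed and check it is writable."""
    out_dir = os.path.dirname(os.path.abspath(model_path))
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    if not os.access(out_dir, os.W_OK):
        raise IOError("Output directory is not writable: " + out_dir)
    return out_dir


if __name__ == "__main__":
    # Illustrative path inside a temporary directory; substitute the model
    # path that is passed to the training script.
    base = tempfile.mkdtemp()
    print(ensure_output_dir(os.path.join(base, "results", "model.w2c")))
```

gensim writes large arrays such as syn1neg to separate .npy files next to the model file, so the directory has to exist and be writable at the moment the model is saved, not only when training starts.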
Hi, @zhq2009 was this issue ever resolved?
Hello,
Yes, the problem was solved.
Thank you very much.
On Sat, Dec 31, 2016 at 4:14 AM, Rishab Gargeya notifications@github.com wrote:
Hi, @zhq2009 https://github.com/zhq2009 was this issue ever resolved?
Hi, I'm having the same problem when I try to generate the Corpus file - the file keeps coming up empty. I'm running the following command:
sudo sh prepare.sh en_US ~/data
Do you know why this might be?
Thank you!
Hi, I am also facing the same issue.
When I ran the following snippet:
from gensim.models import Word2Vec
model = Word2Vec.load("path/to/word2vec/en.model")
model.similarity('woman', 'man')
I got the following error:
array.shape = shape
ValueError: cannot reshape array of size 108 into shape (1151090,1000)
Next, when I run "sudo sh prepare.sh en_US ~/data", the corpus file is empty. Could the two be related? If not, how can I solve these two issues?
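The reshape error says the array on disk holds 108 values where the model expects 1151090 x 1000, which would be consistent with a truncated or incomplete array file. A rough size check is sketched below, assuming the vectors are stored as 32-bit floats in a .npy file next to the model; expected_npy_bytes and looks_truncated are hypothetical helpers.

```python
import os


def expected_npy_bytes(shape, bytes_per_value=4):
    """Approximate payload size of an array with the given shape (header excluded)."""
    total = bytes_per_value
    for dim in shape:
        total *= dim
    return total


def looks_truncated(path, shape, bytes_per_value=4):
    """Return True if the file is smaller than the array payload it should hold."""
    return os.path.getsize(path) < expected_npy_bytes(shape, bytes_per_value)


if __name__ == "__main__":
    # Shape taken from the error message: 1151090 words x 1000 dimensions.
    print(expected_npy_bytes((1151090, 1000)))  # 4604360000 bytes, roughly 4.6 GB
```

If the .npy files that accompany en.model are much smaller than that, downloading or extracting the model again is probably the first thing to try.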
Hello,
We are using prepare.sh to generate the corpus file, but the file it generates is empty. Could you please give us some suggestions on how to solve the problem?
Thank you very much
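Before starting a long training run, it may be worth confirming that the corpus file actually has content. A minimal sketch, using only the standard library; corpus_summary is a hypothetical helper and the path in the comment is illustrative.

```python
import os


def corpus_summary(path, sample_lines=3):
    """Return the file size in bytes and the first few lines of a corpus file."""
    size = os.path.getsize(path)
    head = []
    with open(path, "rb") as handle:
        for line in handle:
            # Decode leniently so a stray byte does not abort the check.
            head.append(line.decode("utf-8", "replace").rstrip("\n"))
            if len(head) >= sample_lines:
                break
    return size, head


# Example (illustrative path, adjust to where prepare.sh wrote the corpus):
# size, head = corpus_summary("/mnt/data/corpus")
```

A size of 0 means the corpus step produced nothing, in which case running the commands in prepare.sh one at a time, as described earlier in this thread, should show which step fails.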