idio / wiki2vec

Generating Vectors for DBpedia Entities via Word2Vec and Wikipedia Dumps. Questions? https://gitter.im/idio-opensource/Lobby

Problems to run #2

Closed ansiiso closed 9 years ago

ansiiso commented 9 years ago
  1. The variable $BASE_WDIR is not defined: https://github.com/idio/wiki2vec/blob/master/prepare.sh#L32
  2. I got a "MemoryError" while trying to build the model and found this post: https://github.com/piskvorky/gensim/issues/293. Have you solved the issue? How much memory does it require to run?
dav009 commented 9 years ago
  1. Are you running prepare.sh on Ubuntu 14.04?
  2. Nope. You can get around it in two ways: (i) get an instance with more RAM, or (ii) increase min_count.
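
To illustrate why raising min_count helps with memory, here is a minimal sketch in plain Python. It does not call gensim; the function name `prune_vocab` and the toy corpus are hypothetical and only mimic the idea that tokens seen fewer than min_count times are dropped, so the model ends up with fewer vectors to hold in RAM.

```python
from collections import Counter


def prune_vocab(tokens, min_count):
    """Return {token: count} for tokens occurring at least min_count times.

    Hypothetical helper: it mirrors the idea behind a min_count threshold,
    not gensim's actual implementation.
    """
    counts = Counter(tokens)
    return {tok: n for tok, n in counts.items() if n >= min_count}


# Toy corpus (made up for illustration): "a" x4, "b" x3, "c" x2, "d" x1.
corpus = "a a a a b b b c c d".split()

for threshold in (1, 3):
    vocab = prune_vocab(corpus, threshold)
    # Each surviving token would get its own vector in the trained model.
    print(f"min_count={threshold}: {len(vocab)} tokens kept -> {sorted(vocab)}")
```

On a real Wikipedia dump the long tail of rare tokens is large, so a higher threshold can shrink the vocabulary considerably, at the cost of losing vectors for rare words and entities.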

A few more questions:

ansiiso commented 9 years ago
  1. Yes, I am running the script on Ubuntu 14.04.
  2. I am processing the English wiki on a 2xlarge EC2 instance: 8 vCPUs, 30 GB RAM.
dav009 commented 9 years ago
ansiiso commented 9 years ago
dav009 commented 9 years ago

If you are limited to the 30 GB instance, you could try building a model with the parameters 100 300 10, that is, 300 dimensions and a min count of 100.
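
As a rough way to reason about whether a given setting fits in RAM, the sketch below estimates the size of the model's weight matrices. It assumes the rule of thumb from the gensim word2vec tutorial (about three matrices of vocabulary size × vector size, stored as 4-byte floats); that figure, the function names, and the vocabulary sizes are assumptions for illustration, not measurements from this thread, and the estimate ignores corpus reading and vocabulary-building overhead.

```python
def estimate_word2vec_ram_bytes(vocab_size, vector_size,
                                n_matrices=3, bytes_per_float=4):
    """Back-of-envelope size of the weight matrices, in bytes.

    Assumption: n_matrices=3 and 4-byte floats, per the gensim tutorial's
    rule of thumb. Real usage will be higher.
    """
    return vocab_size * vector_size * n_matrices * bytes_per_float


def to_gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / 2**30


# Hypothetical vocabulary sizes; the real size depends on the dump and min_count.
for vocab_size in (1_000_000, 5_000_000, 10_000_000):
    size = estimate_word2vec_ram_bytes(vocab_size, vector_size=300)
    print(f"vocab={vocab_size:>10,} dims=300 -> ~{to_gib(size):.1f} GiB")
```

The estimate scales linearly in both vocabulary size and dimensions, which is why raising min_count (smaller vocabulary) or lowering the vector size are the two levers available on a fixed-size instance.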

dav009 commented 9 years ago

@ansiiso check the README for instructions on how to get the English model via torrent. It is around 8 GB and there are few seeds, so it will likely take a while to download.

dav009 commented 9 years ago

@ansiiso closing this issue, as the memory limitation is due to the gensim implementation and the AWS instance size.