idio / wiki2vec

Generating Vectors for DBpedia Entities via Word2Vec and Wikipedia Dumps. Questions? https://gitter.im/idio-opensource/Lobby

DeepLearning4J unable to load `en.model` #37

Closed: atTC2 closed this issue 6 years ago

atTC2 commented 6 years ago

I'm trying to load en.model with deeplearning4j's Word2Vec implementation.

The following code is used:

return WordVectorSerializer.readWord2VecModel(new File("/home/tom/FYP/en_1000_no_stem/en.model"));

but unfortunately this exception is thrown:

java.lang.RuntimeException: Unable to guess input file format. Please use corresponding loader directly
    at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readWord2VecModel(WordVectorSerializer.java:2480)
    at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readWord2VecModel(WordVectorSerializer.java:2266)
    at xyz.tomclarke.fyp.nlp.word2vec.Word2VecProcessor.loadPreTrainedData(Word2VecProcessor.java:36)
    at xyz.tomclarke.fyp.nlp.word2vec.TestWord2Vec.testLoadWiki2Vec(TestWord2Vec.java:172)

Running your Python 'quick start' example works fine, so I'm unsure where the problem lies: with my loading code or with DL4J (in which case I apologise for raising the issue here).

Has this issue been seen before? Is this the correct way to load your model with DL4J's implementation? Thank you for any help.
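
For reference, here is the load attempt as a self-contained class, modelled on the stack trace above (the package, class and method names come from my trace; the imports and wrapper are just filled in for completeness):

    import java.io.File;

    import org.deeplearning4j.models.embeddings.loader.WordVectorSerializer;
    import org.deeplearning4j.models.word2vec.Word2Vec;

    public class Word2VecProcessor {

        // Attempt to load the gensim-produced en.model directly; as shown above,
        // readWord2VecModel cannot guess this file's format and throws.
        public static Word2Vec loadPreTrainedData() {
            return WordVectorSerializer.readWord2VecModel(
                    new File("/home/tom/FYP/en_1000_no_stem/en.model"));
        }
    }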

dav009 commented 6 years ago

The model was created to be loaded via gensim.

Nonetheless, you can simply dump the model using https://github.com/idio/wiki2vec/blob/master/resources/gensim/convert_model.py:

python convert_model.py /something/your/en.model
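
Assuming the converted output is plain word2vec text (the en.model.txt filename below is only a guess for whatever convert_model.py actually writes, and the example entity token is just illustrative), DL4J should then be able to read it with the same serializer call. A minimal sketch:

    import java.io.File;

    import org.deeplearning4j.models.embeddings.loader.WordVectorSerializer;
    import org.deeplearning4j.models.embeddings.wordvectors.WordVectors;

    public class LoadConvertedModel {

        public static void main(String[] args) {
            // readWord2VecModel can detect the plain-text word2vec format of the
            // converted file; the path here is only a placeholder.
            WordVectors vectors = WordVectorSerializer.readWord2VecModel(
                    new File("/something/your/en.model.txt"));

            // wiki2vec stores entities as DBPEDIA_ID/... tokens.
            System.out.println(vectors.wordsNearest("DBPEDIA_ID/Barack_Obama", 10));
        }
    }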

atTC2 commented 6 years ago

Thank you - it is now happy to load the model!

Now I just need enough RAM to load it. Do you have any recommendations on how much it needs?

dav009 commented 6 years ago

No idea about memory usage in DeepLearning4J.

atTC2 commented 6 years ago

No worries. If I get it to work I'll add the details to this thread in case it helps others (I can confirm it needs more than 15 GiB).

Thanks for the help!
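
For anyone else who hits heap limits here: the whole vector matrix lives on the JVM heap, so the limit set with -Xmx has to exceed the model's in-memory size (based on the >15 GiB figure above, something like -Xmx20g is a reasonable starting guess, not a confirmed requirement). A minimal check of what the JVM was actually given:

    public class HeapCheck {

        public static void main(String[] args) {
            // Maximum heap configured for this JVM (set with -Xmx, e.g. -Xmx20g).
            long maxBytes = Runtime.getRuntime().maxMemory();
            System.out.printf("Max heap: %.1f GiB%n",
                    maxBytes / (1024.0 * 1024.0 * 1024.0));
        }
    }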