idio / wiki2vec

Generating Vectors for DBpedia Entities via Word2Vec and Wikipedia Dumps. Questions? https://gitter.im/idio-opensource/Lobby

English pretrained embeddings of size 300 #40

Closed vhoulbreque closed 5 years ago

vhoulbreque commented 6 years ago

Would it be possible for you to share pretrained embeddings of size 300 in English?

I'm trying to train some models using these embeddings, but given the size of my dataset, 1000 dimensions is far too many, and my machine cannot handle it.

vhoulbreque commented 6 years ago

I reduced the number of dimensions by running a PCA on the pretrained vectors of size 1000.

But if anyone has a link to pretrained embeddings of size 300, in English, feel free to share them! I'm still interested and I'm sure others will be too.

vhoulbreque commented 5 years ago

The PCA worked effectively in my case. I fitted it on the 3000 most common words of the English vocabulary. It's strange that I had to resort to this: my machine has 126 GB of RAM and 24 cores, and all 24 cores were running at 100%.
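For anyone else who lands here, a minimal sketch of the approach described above: fit a PCA on a subset of common-word vectors, then project the full vocabulary down to 300 dimensions. This is an illustration, not the exact script used; it stands in random arrays for the real 1000-dimensional wiki2vec embeddings, which you would load yourself (e.g. with gensim).

```python
# Sketch of PCA-based dimensionality reduction (1000 -> 300 dims).
# The arrays below are random stand-ins for real wiki2vec vectors.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_common, dim_in, dim_out = 3000, 1000, 300

# Stand-in for the 1000-d vectors of the 3000 most common words.
common_word_vectors = rng.standard_normal((n_common, dim_in))

# Fit PCA on the common-word subset only; this keeps memory use low
# compared to fitting on the entire vocabulary at once.
pca = PCA(n_components=dim_out)
pca.fit(common_word_vectors)

# Project any other vectors (here, a stand-in full vocabulary) into
# the reduced 300-dimensional space learned from the common words.
full_vocab_vectors = rng.standard_normal((10000, dim_in))
reduced = pca.transform(full_vocab_vectors)
print(reduced.shape)  # (10000, 300)
```

Fitting on a common-word subset assumes those words span the directions of highest variance in the full embedding space, which is usually a reasonable approximation.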