mlberkeley / slang


Vocabulary expansion #4

Open ghost opened 7 years ago

ghost commented 7 years ago

Project GloVe vectors (https://nlp.stanford.edu/projects/glove/) onto word2vec vectors (models/w2v_100d.pickle):

  1. PCA/SVD on GloVe vectors (300d -> 100d)
  2. Learn a linear map W such that glove = W * word2vec
  3. Get 100d projected GloVe vectors
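
The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses random stand-in arrays instead of the real GloVe file and `models/w2v_100d.pickle`, it assumes the shared words are already row-aligned, and it reads step 3 as mapping the reduced GloVe vectors into word2vec space through the pseudo-inverse of W. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data (assumption): real code would load GloVe and the word2vec pickle.
# Rows are words; the first n_shared GloVe rows are assumed to be the words
# that also appear in the word2vec vocabulary, in the same order.
n_glove, n_shared = 1000, 200
glove_300d = rng.normal(size=(n_glove, 300))
w2v_100d = rng.normal(size=(n_shared, 100))

# Step 1: PCA via SVD, reducing GloVe from 300d to 100d.
centered = glove_300d - glove_300d.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
glove_100d = centered @ vt[:100].T            # shape (n_glove, 100)

# Step 2: least-squares fit of W on the shared words only.
# With row vectors this solves glove ~= word2vec @ W, i.e. the transpose
# of the W in "glove = W * word2vec" written with column vectors.
W, *_ = np.linalg.lstsq(w2v_100d, glove_100d[:n_shared], rcond=None)

# Step 3 (one possible reading): move every reduced GloVe vector into
# word2vec space using the pseudo-inverse of W.
projected = glove_100d @ np.linalg.pinv(W)    # shape (n_glove, 100)

print(W.shape, projected.shape)
```

In this sketch W is fitted on embedding dimensions using only the shared words as training pairs, so its size depends on the vector dimensionality rather than on either vocabulary size.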

rpandya922 commented 7 years ago

The GloVe dictionary I'm using has 400K words and word2vec has about 40K, so W would have to be 400K by 40K, which (using 64-bit floats) would take over 128 gigabytes to store. So unless I'm missing something, I don't know how feasible this really is.
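
The storage figure for a 400K by 40K matrix can be checked with a quick calculation. The vocabulary sizes below are the rounded numbers from the comment, so the result is an approximation.

```python
# Rounded vocabulary sizes taken from the comment above.
glove_vocab = 400_000
w2v_vocab = 40_000
bytes_per_float64 = 8

total_bytes = glove_vocab * w2v_vocab * bytes_per_float64

print(total_bytes / 1e9)     # size in decimal gigabytes (GB)
print(total_bytes / 2**30)   # size in binary gibibytes (GiB)
```

With these rounded sizes the matrix comes to 128 GB in decimal units, or roughly 119 GiB.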