The tokens returned by spaCy still need to be converted into distributed representations before a DL model can easily use them. For the baseline, do we encode the question with the Universal Sentence Encoder and use a state-of-the-art word embedding for the context? Which word embedding is still the most popular? (fastText, GloVe, ...?) Also, some researchers use Byte Pair Encoding (BPE) to encode information at the sub-word level.
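For reference, here is a minimal sketch of what that baseline encoding could look like, assuming spaCy's en_core_web_md model (which ships pretrained 300-d word vectors) for the context and the TF-Hub Universal Sentence Encoder for the question. Both choices are illustrative, not decided.

```python
# Sketch of the proposed baseline encoding (assumed models, not final choices):
# - context tokens -> pretrained word vectors via spaCy's en_core_web_md
# - question      -> 512-d sentence embedding via the Universal Sentence Encoder
import numpy as np
import spacy
import tensorflow_hub as hub

nlp = spacy.load("en_core_web_md")  # tokenizer + pretrained 300-d word vectors
use = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

def encode_context(text: str) -> np.ndarray:
    """One pretrained word vector per spaCy token, shape (n_tokens, 300)."""
    return np.stack([tok.vector for tok in nlp(text)])

def encode_question(question: str) -> np.ndarray:
    """A single 512-d sentence embedding for the whole question."""
    return use([question]).numpy()[0]

context_matrix = encode_context("The Normans gave their name to Normandy.")
question_vec = encode_question("Who gave their name to Normandy?")
print(context_matrix.shape, question_vec.shape)  # (n_tokens, 300) and (512,)
```

Swapping in fastText or GloVe vectors, or a BPE-based sub-word encoder, would only change `encode_context`; the question side stays the same.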