kensho-technologies / bubs

Keras Implementation of Flair's Contextualized Embeddings
Apache License 2.0

Pooled Contextual Embedding #16

Closed Saichethan closed 3 years ago

Saichethan commented 4 years ago

Hello, how can I get PooledContextualEmbeddings, as described in http://alanakbik.github.io/papers/naacl2019_embeddings.pdf?

ydovzhenko commented 4 years ago

You would need to create a data structure that keeps track of every unique word encountered in the text and its previous embeddings. This would live outside of the Keras/TF model in this repository. You could either:

  1. modify the code for the InputEncoder class in https://github.com/kensho-technologies/bubs/blob/master/bubs/helpers.py to add memory
  2. implement a separate class outside of bubs.

If you go the former route, we are accepting pull requests :-)
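For the latter route, a minimal sketch of such an external memory might look like the following. Everything here (the class name `PooledEmbeddingMemory`, mean pooling as the pooling choice, and the toy vectors) is an assumption for illustration; the contextual embedding vectors themselves would come from the bubs model, which is not shown.

```python
from collections import defaultdict

import numpy as np


class PooledEmbeddingMemory:
    """Toy memory of every contextual embedding seen per unique word.

    Illustrative sketch only, not part of bubs: in practice the vectors
    added here would be the per-word embeddings produced by the model.
    """

    def __init__(self):
        # word -> list of contextual embedding vectors seen so far
        self._memory = defaultdict(list)

    def add(self, word, embedding):
        """Record one more contextual embedding for this word."""
        self._memory[word].append(np.asarray(embedding, dtype=np.float32))

    def pooled(self, word):
        """Mean-pool all embeddings recorded so far for this word."""
        return np.mean(self._memory[word], axis=0)


memory = PooledEmbeddingMemory()
memory.add("bank", np.array([1.0, 0.0]))  # e.g. from a finance sentence
memory.add("bank", np.array([0.0, 1.0]))  # e.g. from a river sentence
print(memory.pooled("bank"))  # mean of the two vectors: [0.5 0.5]
```

A dict from word to a growing list of vectors is the simplest choice; for large corpora you might instead keep a running sum and count per word so memory stays constant per unique word.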

Please refer to the beginning of Section 2 of the paper http://alanakbik.github.io/papers/naacl2019_embeddings.pdf. The embed() function they mention is equivalent to the example in https://github.com/kensho-technologies/bubs/blob/master/README.md: it accepts text and outputs an embedding for each word. The second part, the memory, is not implemented in bubs.

It requires an embed() function that produces a contextualized embedding for a given word in a sentence context (see Akbik et al. (2018)). It also requires a memory that records, for each unique word, all contextual embeddings seen so far, and a pool() operation that pools those embedding vectors into a single vector.
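The pool() step and the final combination can be sketched with plain NumPy. Note the details here are assumptions stated for illustration: the paper offers mean, max, and min as pooling choices, and (as I read it) concatenates the current contextual embedding with the pooled memory, but the function names and the decision to include the current embedding in the pool are mine, not an API from bubs or Flair.

```python
import numpy as np


def pool(vectors, op="mean"):
    """Pool a list of embedding vectors into one vector (mean, max, or min)."""
    stacked = np.stack(vectors)
    ops = {"mean": np.mean, "max": np.max, "min": np.min}
    return ops[op](stacked, axis=0)


def pooled_contextual_embedding(current, previous, op="mean"):
    """Concatenate the current contextual embedding with the pooled memory.

    Sketch of the paper's idea: the memory is assumed to already include
    the current occurrence, hence `previous + [current]` below.
    """
    return np.concatenate([current, pool(previous + [current], op)])


current = np.array([1.0, 2.0])                       # embedding in this sentence
history = [np.array([3.0, 0.0]), np.array([2.0, 4.0])]  # earlier occurrences
print(pooled_contextual_embedding(current, history))  # [1. 2. 2. 2.]
```

The output is twice the width of a single contextual embedding, since the pooled vector is concatenated onto the current one; a downstream sequence tagger would consume this wider vector.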