kensho-technologies / bubs

Keras Implementation of Flair's Contextualized Embeddings
Apache License 2.0

Pooled Contextual Embedding #16

Closed Saichethan closed 3 years ago

Saichethan commented 4 years ago

Hello, how can I get PooledContextualEmbeddings, as described in http://alanakbik.github.io/papers/naacl2019_embeddings.pdf?

ydovzhenko commented 4 years ago

You would need to create a data structure that keeps track of every unique word encountered in the text and its previous embeddings. This would live outside of the Keras/TF model in this repository. You could either:

  1. modify the code for the InputEncoder class in https://github.com/kensho-technologies/bubs/blob/master/bubs/helpers.py to add memory
  2. implement a separate class outside of bubs.

If you go the former route, we are accepting pull requests :-)
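For the latter route, a minimal sketch of such an external memory might look like the following. Everything here (the class name `PooledEmbeddingMemory`, mean pooling as the pooling choice, and the toy vectors) is an assumption for illustration; the contextual embedding vectors themselves would come from the bubs model, which is not shown.

```python
from collections import defaultdict

import numpy as np


class PooledEmbeddingMemory:
    """Toy memory of every contextual embedding seen per unique word.

    Illustrative sketch only, not part of bubs: in practice the vectors
    added here would be the per-word embeddings produced by the model.
    """

    def __init__(self):
        # word -> list of contextual embedding vectors seen so far
        self._memory = defaultdict(list)

    def add(self, word, embedding):
        """Record one more contextual embedding for this word."""
        self._memory[word].append(np.asarray(embedding, dtype=np.float32))

    def pooled(self, word):
        """Mean-pool all embeddings recorded so far for this word."""
        return np.mean(self._memory[word], axis=0)


memory = PooledEmbeddingMemory()
memory.add("bank", np.array([1.0, 0.0]))  # e.g. from a finance sentence
memory.add("bank", np.array([0.0, 1.0]))  # e.g. from a river sentence
print(memory.pooled("bank"))  # mean of the two vectors: [0.5 0.5]
```

A dict from word to a growing list of vectors is the simplest choice; for large corpora you might instead keep a running sum and count per word so memory stays constant per unique word.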

Please refer to the beginning of Section 2 of the paper http://alanakbik.github.io/papers/naacl2019_embeddings.pdf. The embed() function they mention is equivalent to the example in https://github.com/kensho-technologies/bubs/blob/master/README.md: it accepts text and outputs an embedding for each word. The second part, the memory, is not implemented in bubs.

It requires an embed() function that produces a contextualized embedding for a given word in a sentence context (see Akbik et al. (2018)). It also requires a memory that records, for each unique word, all contextual embeddings seen so far, and a pool() operation that pools those embedding vectors into a single vector.
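The pool() step and the final combination can be sketched with plain NumPy. Note the details here are assumptions stated for illustration: the paper offers mean, max, and min as pooling choices, and (as I read it) concatenates the current contextual embedding with the pooled memory, but the function names and the decision to include the current embedding in the pool are mine, not an API from bubs or Flair.

```python
import numpy as np


def pool(vectors, op="mean"):
    """Pool a list of embedding vectors into one vector (mean, max, or min)."""
    stacked = np.stack(vectors)
    ops = {"mean": np.mean, "max": np.max, "min": np.min}
    return ops[op](stacked, axis=0)


def pooled_contextual_embedding(current, previous, op="mean"):
    """Concatenate the current contextual embedding with the pooled memory.

    Sketch of the paper's idea: the memory is assumed to already include
    the current occurrence, hence `previous + [current]` below.
    """
    return np.concatenate([current, pool(previous + [current], op)])


current = np.array([1.0, 2.0])                       # embedding in this sentence
history = [np.array([3.0, 0.0]), np.array([2.0, 4.0])]  # earlier occurrences
print(pooled_contextual_embedding(current, history))  # [1. 2. 2. 2.]
```

The output is twice the width of a single contextual embedding, since the pooled vector is concatenated onto the current one; a downstream sequence tagger would consume this wider vector.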