lucidrains / routing-transformer

Fully featured implementation of Routing Transformer
MIT License
282 stars, 29 forks

Compound words #31

Closed: wingedsheep closed this issue 2 years ago

wingedsheep commented 2 years ago

I was wondering whether there is a good way to train the routing transformer (or x-transformers) on a 3D tensor input of shape (batch, sequence length, tokens per group), as is done in the Compound Word Transformer. Instead of single tokens, groups of tokens are fed into the model, and each group is encoded into a single embedding.
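To make the shapes concrete, here is a minimal NumPy sketch of a compound-word embedding. It assumes each group has a fixed number of fields (for example pitch, duration and velocity), each with its own vocabulary: every field is looked up in its own embedding table, the results are concatenated, and a linear projection maps them to the model dimension. The field sizes, dimensions and the `compound_embed` name are illustrative assumptions, not code from either repository; in PyTorch the same idea would be one `nn.Embedding` per field followed by an `nn.Linear`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-field vocabulary sizes and embedding widths.
vocab_sizes = [16, 8, 4]   # e.g. pitch, duration, velocity
field_dims = [32, 16, 16]
model_dim = 64

# One embedding table per field, plus a projection to the model dimension.
tables = [rng.normal(size=(v, d)) for v, d in zip(vocab_sizes, field_dims)]
proj = rng.normal(size=(sum(field_dims), model_dim)) / np.sqrt(sum(field_dims))

def compound_embed(tokens):
    """Map integer tokens of shape (batch, seq, n_fields) to (batch, seq, model_dim)."""
    # Look up each field in its own table: a list of (batch, seq, field_dim) arrays.
    parts = [table[tokens[..., i]] for i, table in enumerate(tables)]
    # Concatenate along the feature axis, then project to one embedding per group.
    return np.concatenate(parts, axis=-1) @ proj

# A random batch of 2 sequences, each with 10 groups of 3 field ids.
tokens = np.stack(
    [rng.integers(0, v, size=(2, 10)) for v in vocab_sizes], axis=-1
)
x = compound_embed(tokens)
print(tokens.shape, x.shape)  # (2, 10, 3) (2, 10, 64)
```

After this step the backbone only ever sees a regular (batch, seq, dim) tensor, so the group structure is invisible to the attention layers.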

I combined elements of x-transformers and the Compound Word Transformer into a custom implementation. It works, but it feels a bit messy.

Now I would like to port this approach to the routing transformer. What do you think would be a clean way to implement it?
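One possible way to keep this clean, sketched here under assumptions, is to treat the backbone as a black box that maps (batch, seq, dim) embeddings to (batch, seq, dim) hidden states and to put all compound-word logic at the edges: a compound embedding in front and one output head per field behind, trained with the sum of per-field cross-entropies. The `CompoundWrapper` class and the identity backbone below are hypothetical stand-ins written in NumPy; whether a given routing transformer class accepts embeddings instead of token ids (and what it returns) should be checked against the library itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_softmax(z):
    # Numerically stable log-softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

class CompoundWrapper:
    """Hypothetical wrapper: compound embedding -> backbone -> one head per field."""

    def __init__(self, vocab_sizes, dim, backbone):
        # One embedding table per field; summing them is a simpler alternative
        # to concatenate-and-project, used here to keep the sketch short.
        self.embeds = [rng.normal(size=(v, dim)) * 0.02 for v in vocab_sizes]
        # One linear output head per field.
        self.heads = [rng.normal(size=(dim, v)) * 0.02 for v in vocab_sizes]
        self.backbone = backbone  # any callable (batch, seq, dim) -> (batch, seq, dim)

    def forward(self, tokens):
        # tokens: (batch, seq, n_fields) integer ids
        x = sum(e[tokens[..., i]] for i, e in enumerate(self.embeds))
        h = self.backbone(x)
        return [h @ w for w in self.heads]  # list of (batch, seq, vocab_i)

    def loss(self, tokens):
        # Next-group prediction: inputs are groups [0, n-1), targets are groups [1, n).
        logits = self.forward(tokens[:, :-1])
        targets = tokens[:, 1:]
        total = 0.0
        for i, lg in enumerate(logits):
            lp = log_softmax(lg)
            # Log-probability assigned to the correct id of field i.
            picked = np.take_along_axis(lp, targets[..., i:i + 1], axis=-1)
            total += -picked.mean()
        return total

vocab_sizes = [16, 8, 4]
model = CompoundWrapper(vocab_sizes, dim=32, backbone=lambda x: x)  # identity stand-in
tokens = np.stack([rng.integers(0, v, size=(2, 10)) for v in vocab_sizes], axis=-1)
logits = model.forward(tokens)
print([lg.shape for lg in logits])  # [(2, 10, 16), (2, 10, 8), (2, 10, 4)]
print(model.loss(tokens))
```

With the small random initialisation the loss starts close to ln 16 + ln 8 + ln 4 = ln 512, i.e. near-uniform predictions for every field. Swapping the identity lambda for a real backbone is the only change needed to reuse the same wrapper across model families.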