lucidrains / routing-transformer

Fully featured implementation of Routing Transformer
MIT License

One-hot encoded input? #7

Closed matthew-jurewicz closed 4 years ago

matthew-jurewicz commented 4 years ago

I'm looking through the code, and I'm not seeing the token IDs being converted to one-hot encoded vectors. Is the input to the language model with the autoregressive wrapper just the token IDs?

lucidrains commented 4 years ago

@matthew-jurewicz Hi Matthew! Yup, you just need to pass the token ids and make sure you instantiate the language model with num_tokens set to cover the maximum of the ids! The token embeddings are fetched from the embedding table here https://github.com/lucidrains/routing-transformer/blob/master/routing_transformer/routing_transformer.py#L523
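To see why no explicit one-hot step is needed: an embedding lookup on integer token ids is mathematically equivalent to multiplying a one-hot vector by the embedding table, so the lookup is just the more efficient route. A minimal pure-Python sketch (all names and sizes here are illustrative, not from the library):

```python
# Illustrative vocab size and embedding dimension (not the library's defaults).
num_tokens, dim = 5, 3

# A stand-in embedding table, like nn.Embedding.weight: row i is token i's vector.
emb_table = [[float(i * dim + j) for j in range(dim)] for i in range(num_tokens)]

token_ids = [3, 0, 1]  # raw integer ids, as passed to the model

# Direct lookup: what an embedding layer does internally.
looked_up = [emb_table[t] for t in token_ids]

# Equivalent one-hot route: one_hot(t) @ emb_table.
def one_hot(t, n):
    return [1.0 if k == t else 0.0 for k in range(n)]

via_one_hot = [
    [sum(one_hot(t, num_tokens)[k] * emb_table[k][j] for k in range(num_tokens))
     for j in range(dim)]
    for t in token_ids
]

assert looked_up == via_one_hot  # identical results, no one-hot tensor required
```

The one-hot matmul just selects a row of the table, which is why frameworks implement embeddings as an indexed lookup over integer ids.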

matthew-jurewicz commented 4 years ago

Excellent! I'm a big fan of your work!

lucidrains commented 4 years ago

This was an implementation of someone else's research https://openreview.net/forum?id=B1gjs6EtDr Hope you find it useful!

matthew-jurewicz commented 4 years ago

What I mean is, as far as I know, no one else has written the code for a fully functional sparse transformer, much less this.