Closed: patdflynn closed this issue 3 years ago
Hi,
By decoder only, do you mean an autoregressive transformer? Like, for instance, for image generation or text generation?
In that case, any TransformerEncoder used with a TriangularCausalMask would do.
Cheers, Angelos
For this project I'm comparing different models for time series forecasting, but that's good to know. I figured an encoder with a linear layer at the end might be sufficient.
By the way, this library is amazing! It's a bit of a learning curve for someone new to transformers, but I can see myself using it for the foreseeable future.
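The encoder-plus-linear-head idea above can be sketched in plain PyTorch (not specific to fast-transformers); all names and sizes here are illustrative assumptions:

```python
# Causal encoder + linear head for one-step-ahead time series forecasting.
import torch
import torch.nn as nn

class CausalForecaster(nn.Module):
    def __init__(self, n_features=1, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_features)  # predict the next value

    def forward(self, x):                 # x: (batch, seq_len, n_features)
        L = x.shape[1]
        # Upper-triangular -inf mask: position t attends only to <= t.
        mask = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
        h = self.encoder(self.embed(x), mask=mask)
        return self.head(h)               # forecast at every position

model = CausalForecaster()
y = model(torch.randn(8, 50, 1))
print(y.shape)
```

Training it to predict `x[:, 1:]` from `x[:, :-1]` gives the usual autoregressive forecasting setup; at inference you feed the history and read the prediction at the last position.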
Thanks for the kind words! I am closing the issue since I assume it is solved.
Feel free to reopen it or open another one in case you encounter problems.
Best, Angelos
I've reviewed the docs and I'm still a bit unclear on how to build a decoder-only self-attention transformer. Would you mind providing an example?