Hi, yes, the Transformer currently performs slower in this implementation with this dataset. LSTM runs in O(T·N²), while the Transformer runs in O(T²·N), where T is the time dimension and N the input vector dimension. In our dataset, T > N, which explains why the LSTM is faster than the Transformer.
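For a quick sanity check of that scaling argument, a minimal timing sketch like the one below can make the difference visible. This is my own illustration, not the repository's benchmark; the module configurations and the shapes `T=512`, `N=64` are assumptions chosen so that T > N:

```python
# Minimal sketch: compare one forward pass of nn.LSTM vs nn.TransformerEncoder
# for sequence length T and feature dimension N, to illustrate the
# O(T*N^2) vs O(T^2*N) scaling discussed above. Shapes are illustrative.
import time

import torch
import torch.nn as nn

T, N = 512, 64            # T > N, as in our dataset (assumed values)
x = torch.randn(T, 1, N)  # (seq_len, batch, features)

lstm = nn.LSTM(input_size=N, hidden_size=N)
encoder_layer = nn.TransformerEncoderLayer(d_model=N, nhead=4)
transformer = nn.TransformerEncoder(encoder_layer, num_layers=1)

def timeit(module, inp, reps=20):
    """Average forward-pass time over `reps` runs, after one warm-up call."""
    with torch.no_grad():
        module(inp)  # warm-up
        start = time.perf_counter()
        for _ in range(reps):
            module(inp)
    return (time.perf_counter() - start) / reps

print(f"LSTM:        {timeit(lstm, x):.4f} s/forward")
print(f"Transformer: {timeit(transformer, x):.4f} s/forward")
```

Sweeping T upward with N fixed (and vice versa) should show the Transformer's cost growing faster in T and the LSTM's growing faster in N.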
I think it is worthwhile to evaluate the performance of the Transformer. I have indications that it currently performs slower than the LSTM.
This post covers Python, performance, and GPUs. It lays out the current status and describes future work. It might be worth evaluating the performance boost from these techniques.
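If GPU acceleration is among the techniques to try, a hedged sketch of how the timing above could be adapted for CUDA (assuming a CUDA-capable device is available; the explicit synchronization is needed because CUDA kernels launch asynchronously):

```python
# Sketch: timing a module on GPU (assumes torch.cuda.is_available()).
# torch.cuda.synchronize() ensures all queued kernels have finished
# before the timer is read, otherwise timings are meaningless.
import time

import torch

def timeit_cuda(module, inp, reps=20):
    module, inp = module.cuda(), inp.cuda()
    with torch.no_grad():
        module(inp)                  # warm-up (also triggers kernel compilation)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(reps):
            module(inp)
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / reps
```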