Hi, yes, the Transformer currently performs slower in this implementation with this dataset. LSTM runs in O(T·N²), while the Transformer runs in O(T²·N), where T is the time dimension and N the input vector dimension. In our dataset, T > N, which explains why the LSTM is faster than the Transformer.
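For a quick sanity check of that scaling argument, a minimal timing sketch like the one below can make the difference visible. This is my own illustration, not the repository's benchmark; the module configurations and the shapes `T=512`, `N=64` are assumptions chosen so that T > N:

```python
# Minimal sketch: compare one forward pass of nn.LSTM vs nn.TransformerEncoder
# for sequence length T and feature dimension N, to illustrate the
# O(T*N^2) vs O(T^2*N) scaling discussed above. Shapes are illustrative.
import time

import torch
import torch.nn as nn

T, N = 512, 64            # T > N, as in our dataset (assumed values)
x = torch.randn(T, 1, N)  # (seq_len, batch, features)

lstm = nn.LSTM(input_size=N, hidden_size=N)
encoder_layer = nn.TransformerEncoderLayer(d_model=N, nhead=4)
transformer = nn.TransformerEncoder(encoder_layer, num_layers=1)

def timeit(module, inp, reps=20):
    """Average forward-pass time over `reps` runs, after one warm-up call."""
    with torch.no_grad():
        module(inp)  # warm-up
        start = time.perf_counter()
        for _ in range(reps):
            module(inp)
    return (time.perf_counter() - start) / reps

print(f"LSTM:        {timeit(lstm, x):.4f} s/forward")
print(f"Transformer: {timeit(transformer, x):.4f} s/forward")
```

Sweeping T upward with N fixed (and vice versa) should show the Transformer's cost growing faster in T and the LSTM's growing faster in N.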
I think it is worthwhile to evaluate the performance of the Transformer. I have indications that it currently performs slower than the LSTM.
This post covers Python, performance, and GPUs. It lays out the current status and describes future work. It might be worth evaluating the performance boost from these techniques.
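If GPU acceleration is among the techniques to try, a hedged sketch of how the timing above could be adapted for CUDA (assuming a CUDA-capable device is available; the explicit synchronization is needed because CUDA kernels launch asynchronously):

```python
# Sketch: timing a module on GPU (assumes torch.cuda.is_available()).
# torch.cuda.synchronize() ensures all queued kernels have finished
# before the timer is read, otherwise timings are meaningless.
import time

import torch

def timeit_cuda(module, inp, reps=20):
    module, inp = module.cuda(), inp.cuda()
    with torch.no_grad():
        module(inp)                  # warm-up (also triggers kernel compilation)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(reps):
            module(inp)
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / reps
```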