Hi,
Your attention mechanism is quite slow. Because the linear projections (aw and bw) are recomputed on every call even though they do not change, the running time is almost quadratic.
I have implemented a faster version of attention that precomputes as much as possible, and I would like to push it as soon as I have finished testing.
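To illustrate the idea, here is a minimal sketch. It is not the implementation mentioned above. It assumes additive (Bahdanau-style) attention, where aw projects the encoder states and bw projects the decoder query. The helper names (`matvec`, `attend`, `decode_precomputed`) and the scoring vector `v` are hypothetical. The encoder projection is computed once, before the decoding loop, instead of on every step:

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(proj_enc, enc_states, query, bw, v):
    """One decoder step of additive attention.

    proj_enc holds aw @ h for every encoder state h, computed beforehand.
    Only the query projection (bw @ query) is computed here.
    """
    q = matvec(bw, query)
    scores = [sum(vi * math.tanh(p + qi) for vi, p, qi in zip(v, pe, q))
              for pe in proj_enc]
    weights = softmax(scores)
    dim = len(enc_states[0])
    # Context vector: attention-weighted sum of the encoder states.
    context = [sum(w * h[d] for w, h in zip(weights, enc_states))
               for d in range(dim)]
    return context, weights

def attend_naive(enc_states, query, aw, bw, v):
    """Same result, but re-projects every encoder state on each call."""
    proj_enc = [matvec(aw, h) for h in enc_states]
    return attend(proj_enc, enc_states, query, bw, v)

def decode_precomputed(enc_states, queries, aw, bw, v):
    """Run attention for several decoder steps, projecting the encoder once."""
    proj_enc = [matvec(aw, h) for h in enc_states]  # hoisted out of the loop
    return [attend(proj_enc, enc_states, q, bw, v) for q in queries]
```

With n encoder states and m decoder steps, the naive version repeats the n encoder projections m times, while `decode_precomputed` performs them only once. Both return the same context vectors and weights.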
Regards.