Closed · mads-oestergaard closed this 2 years ago
I guess you need to compare the generation time of the recurrent model with the time it takes to generate autoregressively with the non-recurrent model (so 5431 ms vs 2000 × 162 ms).
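For reference, stepping the recurrent model token by token looks something like this (a minimal sketch assuming the RecurrentEncoderBuilder API; the hyperparameters are placeholders, not the ones from your test):

```python
import time

import torch
from fast_transformers.builders import RecurrentEncoderBuilder

# Placeholder hyperparameters; d_model = n_heads * value_dimensions = 128.
recurrent_model = RecurrentEncoderBuilder.from_kwargs(
    n_layers=4,
    n_heads=8,
    query_dimensions=16,
    value_dimensions=16,
    feed_forward_dimensions=512,
    attention_type="causal-linear",
).get().eval()

x_in = torch.randn(2, 2000, 128)  # (batch, sequence, d_model)

t0 = time.time()
state = None
with torch.no_grad():
    for i in range(x_in.shape[1]):
        # One timestep of shape (batch, d_model); the running attention
        # state replaces re-attending over the whole prefix.
        y, state = recurrent_model(x_in[:, i, :], state=state)
print("Elapsed:", (time.time() - t0) * 1000, "ms")
```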
Yeah okay. In that case the recurrent model becomes slightly faster than the non-recurrent model:
```python
import time
import torch
from fast_transformers.masking import TriangularCausalMask

# `model` is the batch (non-recurrent) transformer built earlier in the thread.
x_in = torch.randn(2, 2000, 128)
mask = TriangularCausalMask(1, device=x_in.device)

t0 = time.time()
with torch.no_grad():
    for i in range(x_in.shape[1]):
        # Feed a single timestep of shape (2, 1, 128) per forward pass.
        x_mask = model(x_in[:, i, :].unsqueeze(1), attn_mask=mask)
elapsed = time.time() - t0
print("Elapsed:", elapsed * 1000, "ms")
# >> Elapsed: 6432.35 ms
```
so 6.432 s vs 5.431 s, which makes the recurrent model roughly 15.6% faster (on my CPU). Is that speedup within what you would expect?
Hi,
I'm playing around with fast-transformers for use in the audio domain, and wanted to train a regular model with causal-linear attention and then evaluate it as a recurrent model. However, this quick inference speed test struck me as odd:
Inference with the recurrent model is a lot slower than I would have expected from reading the paper. Am I using the library as intended?
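For reference, the setup I have in mind is along these lines (a sketch with placeholder hyperparameters; I'm assuming the batch and recurrent builders produce matching parameter layouts, so the trained weights transfer via the state dict):

```python
import torch
from fast_transformers.builders import (
    RecurrentEncoderBuilder,
    TransformerEncoderBuilder,
)

# Shared hyperparameters (placeholders, not the ones from my test).
params = dict(
    n_layers=4,
    n_heads=8,
    query_dimensions=16,
    value_dimensions=16,
    feed_forward_dimensions=512,
    attention_type="causal-linear",
)

# Batch model used for training on full sequences.
model = TransformerEncoderBuilder.from_kwargs(**params).get()

# Recurrent counterpart for token-by-token inference. Assumption: the two
# builders produce matching parameter names, so the trained weights load
# directly into the recurrent model.
recurrent_model = RecurrentEncoderBuilder.from_kwargs(**params).get()
recurrent_model.load_state_dict(model.state_dict())
recurrent_model.eval()
```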