konstantinosKokos closed this issue 3 years ago.
Hi Konstantinos,
Really sorry for the ridiculously late reply. I agree that there should be a simple way to specify different parameters for the cross attention.
I assume that you already know that you can build the transformer manually (without using the builder) and in that case you can have arbitrary combinations of dimensions, attentions, etc.
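For concreteness, here is a minimal sketch of that manual route. It assumes the public `AttentionLayer` / `FullAttention` / `TransformerDecoderLayer` / `TransformerDecoder` modules and their default arguments; treat it as a sketch rather than a definitive recipe:

```python
import torch
from torch.nn import LayerNorm
from fast_transformers.attention import AttentionLayer, FullAttention
from fast_transformers.transformers import TransformerDecoder, TransformerDecoderLayer

d_model = 256
decoder = TransformerDecoder(
    [
        TransformerDecoderLayer(
            # self attention over the decoder stream
            AttentionLayer(FullAttention(), d_model, n_heads=4),
            # cross attention with a different head count and value size
            AttentionLayer(FullAttention(), d_model, n_heads=9, d_values=32),
            d_model,
        )
        for _ in range(4)
    ],
    norm_layer=LayerNorm(d_model),
)

x = torch.randn(4, 20, d_model)       # decoder input
memory = torch.randn(4, 55, d_model)  # note: the memory must still match d_model here
# for autoregressive decoding, pass a causal x_mask (e.g. TriangularCausalMask)
y = decoder(x, memory)
```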
I will tag this as an enhancement but I do think it is lower in priority than others (e.g. making binary releases so people don't have to wait for compilation).
Best, Angelos
This is finally implemented. It is not in the PyPI release yet but I will close this issue. Feel free to reopen it if needed.
An example use is as follows:
```python
import torch
from fast_transformers.builders import TransformerDecoderBuilder

t = TransformerDecoderBuilder.from_kwargs(
    n_layers=4,
    n_heads=4,                   # heads for the decoder self attention
    cross_n_heads=9,             # cross attention can use a different head count
    cross_model_dimensions=128,  # matches the last dimension of the memory m below
    cross_value_dimensions=32    # and a different per-head value size
).get()

m = torch.randn(4, 55, 128)   # memory: (batch, source length, 128)
x = torch.randn(4, 20, 4*64)  # decoder input: (batch, target length, 4*64 = 256)
y = t(x, m)
```
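For reference, with the builder's default 64-dimensional queries the decoder here operates at 4*64 = 256 dimensions while the memory m has 128; the returned y has the same shape as x, i.e. (4, 20, 256).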
The current implementation assumes that the decoder dimension and the memory dimension match. This is not a hard restriction but rather a design choice: there could be cases where the decoder operates on a much smaller vocabulary than the encoder, requiring a significantly smaller embedding dimensionality. Would it be possible to separate the two?
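In the meantime (the change above is not in a PyPI release yet), one version-agnostic workaround is to project the memory to the decoder dimension outside the library. This is only a sketch under assumed builder defaults; the projection layer is not part of fast-transformers:

```python
import torch
import torch.nn as nn
from fast_transformers.builders import TransformerDecoderBuilder

# Narrow decoder: d_model = 2 heads * 64-dim queries (builder default) = 128.
decoder = TransformerDecoderBuilder.from_kwargs(n_layers=2, n_heads=2).get()

mem_dim, dec_dim = 512, 2 * 64
memory_proj = nn.Linear(mem_dim, dec_dim)  # hypothetical adapter, not library API

memory = torch.randn(4, 55, mem_dim)       # wide encoder output
x = torch.randn(4, 20, dec_dim)            # narrow decoder stream
y = decoder(x, memory_proj(memory))        # cross attention now sees dec_dim memory
```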