idiap / fast-transformers

PyTorch library for fast transformer implementations

allow distinct memory and decoder dimensionalities #59

Closed konstantinosKokos closed 3 years ago

konstantinosKokos commented 3 years ago

The current implementation assumes that the decoder and memory dimensionalities match, which is a design choice rather than a hard restriction: there are cases where the decoder operates on a much smaller vocabulary than the encoder and therefore needs a significantly smaller embedding dimensionality. Would it be possible to separate the two?

angeloskath commented 3 years ago

Hi Konstantinos,

Really sorry for the ridiculously late reply. I agree that there should be a simple way to add different parameters for the cross attention.

I assume you already know that you can build the transformer manually (without using the builder); in that case you can have arbitrary combinations of dimensions, attentions, etc.
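
For reference, a rough sketch of that manual route (not from the thread itself): it assumes the cross-attention module is called as module(queries, keys, values, attn_mask, query_lengths, key_lengths), mirroring AttentionLayer, and that TransformerDecoderLayer takes (self_attention, cross_attention, d_model); the CrossAttentionLayer class below is purely illustrative and not part of the library.

import torch
from torch.nn import Linear, Module
from fast_transformers.attention import AttentionLayer, FullAttention
from fast_transformers.transformers import TransformerDecoder, TransformerDecoderLayer

class CrossAttentionLayer(Module):
    # Illustrative cross attention: queries come from the decoder stream
    # (d_model) while keys/values are projected from a memory of a different
    # dimensionality (d_memory).
    def __init__(self, attention, d_model, d_memory, n_heads):
        super().__init__()
        d_head = d_model // n_heads
        self.inner_attention = attention
        self.query_projection = Linear(d_model, d_head * n_heads)
        self.key_projection = Linear(d_memory, d_head * n_heads)
        self.value_projection = Linear(d_memory, d_head * n_heads)
        self.out_projection = Linear(d_head * n_heads, d_model)
        self.n_heads = n_heads

    def forward(self, queries, keys, values, attn_mask, query_lengths, key_lengths):
        N, L, _ = queries.shape
        _, S, _ = keys.shape
        H = self.n_heads
        queries = self.query_projection(queries).view(N, L, H, -1)
        keys = self.key_projection(keys).view(N, S, H, -1)
        values = self.value_projection(values).view(N, S, H, -1)
        out = self.inner_attention(queries, keys, values, attn_mask,
                                   query_lengths, key_lengths)
        return self.out_projection(out.contiguous().view(N, L, -1))

d_model, d_memory, n_heads = 256, 128, 4
decoder = TransformerDecoder([
    TransformerDecoderLayer(
        AttentionLayer(FullAttention(), d_model, n_heads),                 # self attention
        CrossAttentionLayer(FullAttention(), d_model, d_memory, n_heads),  # cross attention
        d_model
    )
    for _ in range(4)
])

x = torch.randn(4, 20, d_model)   # decoder stream
m = torch.randn(4, 55, d_memory)  # memory with a different dimensionality
y = decoder(x, m)                 # -> (4, 20, d_model)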

I will tag this as an enhancement but I do think it is lower in priority than others (e.g. making binary releases so people don't have to wait for compilation).

Best, Angelos

angeloskath commented 3 years ago

This is finally implemented. It is not in the PyPI release yet, but I will close this issue. Feel free to reopen it if needed.

An example use is as follows:

import torch
from fast_transformers.builders import TransformerDecoderBuilder

# The cross_* arguments let the cross attention differ from the decoder's own
# self attention: here it uses 9 heads and attends over a 128-dimensional memory.
t = TransformerDecoderBuilder.from_kwargs(
    n_layers=4,
    n_heads=4,
    cross_n_heads=9,
    cross_model_dimensions=128,
    cross_value_dimensions=32
).get()

m = torch.randn(4, 55, 128)   # memory: batch 4, length 55, dim 128 (cross_model_dimensions)
x = torch.randn(4, 20, 4*64)  # decoder input: batch 4, length 20, dim 4 * 64 = 256
y = t(x, m)                   # output keeps the decoder's dimensionality
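
For autoregressive decoding one would typically also pass a causal mask. A small sketch, under the assumption that the decoder's forward accepts x_mask / x_length_mask / memory_mask / memory_length_mask keyword arguments and that fast_transformers.masking provides TriangularCausalMask and LengthMask (the keyword names are best-effort, not taken from this thread):

from fast_transformers.masking import TriangularCausalMask, LengthMask

# Causal mask over the 20 decoder positions, plus a length mask marking only
# the first 40 of the 55 memory positions as valid for each of the 4 samples.
causal = TriangularCausalMask(20)
mem_lengths = LengthMask(torch.full((4,), 40, dtype=torch.int64), max_len=55)

y = t(x, m, x_mask=causal, memory_length_mask=mem_lengths)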