Hi,
Thanks for liking our work!
We will add a transformer decoder in the coming weeks.
The attention accepts different sources for the queries and keys in case you want to try something preliminary. A simple transformer decoder should not be that hard to implement. However, since we are still at version 0.1, many things could change to streamline the creation and use of the various transformers; for instance, the builders may require some refactoring when we also add decoders.
Thanks again for your patience!
Angelos
P.S.: I will keep the issue open until we add them
@angeloskath
Cool, thanks for the reply. I am also curious about using the transformer as an RNN for discrete sampling, for example top-p sampling during language generation.
I apologize if this is a dumb question :)
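To be concrete, the sampling step I have in mind is ordinary nucleus sampling on the decoder's next-token logits, which is independent of the attention implementation. A minimal PyTorch sketch (top_p_sample is just an illustrative name):

```python
import torch

def top_p_sample(logits, p=0.9):
    """Sample one token id per row from the smallest set of tokens
    whose cumulative probability exceeds p (nucleus sampling)."""
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, dim=-1, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Zero out tokens once the mass *before* them already exceeds p,
    # which always keeps at least the most probable token.
    sorted_probs[cumulative - sorted_probs > p] = 0.0
    sorted_probs /= sorted_probs.sum(dim=-1, keepdim=True)
    choice = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx.gather(-1, choice)

# e.g. next_ids = top_p_sample(logits)  # logits: (batch_size, vocab_size)
```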
Hm, this is actually a cool idea. Because the memory requirements are significantly smaller, you can run beam search much more easily; for instance, thousands of different paths instead of 1 or 10.
That was my hope as well. I'll try to experiment. Please let me know if you have any suggested starting points.
@anthonyfuller7 Have you made any progress with this? I want to try an encoder-decoder, but I'm not sure where to start.
Yes, I am using an encoder-decoder setup. I just edited the transformers.py file so that I can pass in a tensor 'q' along with the tensor 'x'. In this case, my 'q' tensor holds the queries, and 'x' holds the keys and values. Something like this:
self.attention(q, x, x)
I believe I had to add a query mask as well, but I don't remember off the top of my head.
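For anyone following along, a self-contained version of that edit might look roughly like this (a sketch only; I'm going from memory on the v0.1 attention/masking API, so the exact class and keyword names may differ):

```python
import torch
from fast_transformers.attention import AttentionLayer, FullAttention
from fast_transformers.masking import FullMask, LengthMask

class CrossAttention(torch.nn.Module):
    """Queries come from the decoder side (q); keys and values
    come from the encoder output (x)."""
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.attention = AttentionLayer(FullAttention(), d_model, n_heads)

    def forward(self, q, x):
        N, L, _ = q.shape   # decoder-side sequence
        _, S, _ = x.shape   # encoder output
        # Cross-attention is unmasked: every query may see every key.
        attn_mask = FullMask(L, S, device=q.device)
        # With no padding, the length masks simply cover the full sequences.
        q_lengths = LengthMask(torch.full((N,), L, dtype=torch.int64, device=q.device))
        k_lengths = LengthMask(torch.full((N,), S, dtype=torch.int64, device=x.device))
        return self.attention(q, x, x, attn_mask=attn_mask,
                              query_lengths=q_lengths, key_lengths=k_lengths)

# e.g. CrossAttention(512, 8)(torch.rand(2, 10, 512), torch.rand(2, 20, 512))
```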
@anthonyfuller7 Any link to a public decoder implementation you can share?
I tried to build a simple encoder-decoder, but something is wrong with the masking. I would be grateful for any hint: https://colab.research.google.com/drive/10zY6dX2iDm1Mz4vOk6bSzN8CYhF5Ev2l#scrollTo=cflC2xVxKb5M&line=18&uniqifier=1
Your decoder implementation is wrong: you never perform self-attention on the decoder's input, only cross-attention on the encoder output.
@AndriyMulyar Could you point it out more specifically?
Here I pass the target and the encoder output:
self.transformer_decoder(trg, src, attn_mask=decoder_mask, query_lengths=decoder_len_mask, key_lengths=encoder_len_mask)
In TransformerDecoder I pass x and kv to the decoder layer, and in the decoder layer I pass x and kv to the attention:
self.attention(
x, kv, kv,
attn_mask=attn_mask,
query_lengths=query_lengths,
key_lengths=key_lengths
)
That should be fine, shouldn't it?
A Transformer decoder layer has two attention layers: one does self-attention and the other does cross-attention over the encoder output. Your implementation only includes the cross-attention component; you need to add self-attention on the decoder inputs. Look at the TransformerDecoderLayer here for an example: https://pytorch.org/docs/master/_modules/torch/nn/modules/transformer.html#Transformer
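Schematically, each decoder layer needs to do something like the following (a hand-rolled sketch with illustrative names, not this library's implementation; norms and dropout are simplified):

```python
import torch
from torch import nn
from fast_transformers.attention import AttentionLayer, FullAttention
from fast_transformers.masking import TriangularCausalMask, FullMask, LengthMask

def full_lengths(n, length, device):
    # LengthMask for a batch of sequences with no padding.
    return LengthMask(torch.full((n,), length, dtype=torch.int64, device=device))

class DecoderLayerSketch(nn.Module):
    def __init__(self, d_model, n_heads, d_ff):
        super().__init__()
        self.self_attention = AttentionLayer(FullAttention(), d_model, n_heads)
        self.cross_attention = AttentionLayer(FullAttention(), d_model, n_heads)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x, memory):
        N, L, _ = x.shape       # target sequence
        _, S, _ = memory.shape  # encoder output
        # 1) Causal self-attention on the target: position i sees only <= i.
        x = self.norm1(x + self.self_attention(
            x, x, x, attn_mask=TriangularCausalMask(L, device=x.device),
            query_lengths=full_lengths(N, L, x.device),
            key_lengths=full_lengths(N, L, x.device)))
        # 2) Cross-attention over the encoder output (what you already have).
        x = self.norm2(x + self.cross_attention(
            x, memory, memory, attn_mask=FullMask(L, S, device=x.device),
            query_lengths=full_lengths(N, L, x.device),
            key_lengths=full_lengths(N, S, x.device)))
        # 3) Position-wise feed-forward.
        return self.norm3(x + self.ff(x))
```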
@AndriyMulyar You are right, thanks. Now I get an error at inference about length masking, even though I'm not using length masking. https://colab.research.google.com/drive/10zY6dX2iDm1Mz4vOk6bSzN8CYhF5Ev2l?authuser=1#scrollTo=80zstJEcG3kU&line=1&uniqifier=1
Hey guys, sorry for being quite late in developing this (a bit of vacation was involved).
I have started the implementation in the 'transformer-decoder' branch. The TransformerDecoderLayer and TransformerDecoder are ready for use in training (although not heavily tested yet). I am in the process of creating a RecurrentTransformerDecoder which can be used for efficient inference.
After the above, I need to edit the builders and the docs, and we should be done.
Thanks for being patient with me.
Cheers, Angelos
@angeloskath Will you add the ability to plot encoder-decoder attention weights? My toy model seems to work, but another one for a TTS task does not synthesize anything meaningful. I think it is because the decoder does not use the encoder output, but I can't figure out how to plot attention weights with this library.
Hi all,
I have pushed the decoder and the new builders for building transformer decoders, both recurrent and non-recurrent. The code is not thoroughly tested, and I have done a complete overhaul of the builders, so some things may be backwards incompatible (or may become so in the coming days as I check all the parameter names and such).
Documentation and further tests are still due (as is the PyPI package).
I will be keeping the issue open until the docs are done and the decoders are included in a release.
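In the meantime, usage should look roughly like the sketch below (untested, and as said above the parameter names may still change):

```python
import torch
from fast_transformers.builders import TransformerDecoderBuilder
from fast_transformers.masking import TriangularCausalMask

# Untested sketch: keyword names extrapolated from the encoder builder.
decoder = TransformerDecoderBuilder.from_kwargs(
    self_attention_type="full",   # attention over the target (masked causally below)
    cross_attention_type="full",  # attention over the encoder output
    n_layers=4,
    n_heads=8,
    query_dimensions=64,          # model width = n_heads * query_dimensions = 512
    value_dimensions=64,
    feed_forward_dimensions=2048,
).get()

x = torch.rand(2, 10, 512)       # target-side features
memory = torch.rand(2, 20, 512)  # encoder output
y = decoder(x, memory, x_mask=TriangularCausalMask(10))
```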
Cheers, Angelos
Thank you for the quick turnaround in getting this merged!
The decoders are now available and the docs are updated, so I will close this issue.
I will focus next on issue #20 but some examples of machine translation should also be coming in the near future.
Thanks everybody for your patience.
Angelos
Thanks for all the work!
Is there any way to use this library for a task that would typically require an encoder-decoder architecture, like machine translation?
I see the BERT example in the docs, but no mention of a decoder anywhere.
Thanks again :)