idiap / fast-transformers

PyTorch library for fast transformer implementations

Encoder-decoder setup? #3

Closed: ghost closed this issue 4 years ago

ghost commented 4 years ago

Thanks for all the work!

Is there any way to use this library for a task that typically requires an encoder-decoder architecture, like machine translation?

I see the BERT example in the docs, but no mention of a decoder anywhere.

Thanks again :)

angeloskath commented 4 years ago

Hi,

Thanks for liking our work!

We will add a transformer decoder in the coming weeks.

The attention layer accepts different sources for the queries and the keys/values, in case you want to try something preliminary. A simple transformer decoder should not be that hard to implement. However, since we are still at version 0.1, many things could change to streamline the creation and use of various transformers; for instance, the builders may require some refactoring once decoders are added as well.
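To sketch the idea (illustrative only, not the final API): the queries come from the decoder side, while the keys and values come from the encoder output.

    # Illustrative cross-attention call with separate query and key/value
    # sources; `attention`, `queries` and `encoder_output` are placeholders.
    decoded = attention(
        queries,         # (N, L, D): decoder-side representations
        encoder_output,  # (N, S, D): keys come from the encoder output
        encoder_output,  # (N, S, D): values come from the encoder output
    )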

Thanks again for your patience!

Angelos

P.S.: I will keep the issue open until we add them.

ghost commented 4 years ago

@angeloskath

Cool, thanks for the reply. I am also curious about using the transformer as an RNN for discrete sampling, for example top-p sampling during language generation.

I apologize if this is a dumb question :)

angeloskath commented 4 years ago

Hm, this is actually a cool idea. Because the memory requirements are significantly smaller, you can run beam search much more easily: thousands of different paths instead of 1 or 10, for instance.

ghost commented 4 years ago

> Hm, this is actually a cool idea. Because the memory requirements are significantly smaller, you can run beam search much more easily: thousands of different paths instead of 1 or 10, for instance.

That was my hope as well. I'll try to experiment. Please let me know if you have some suggested starting points.
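For reference, the sampling step itself is independent of the attention implementation. A minimal top-p (nucleus) sampling sketch over one step's logits, in plain PyTorch, might look like this:

    import torch

    def top_p_sample(logits, p=0.9):
        # Sample one token id from the smallest set of tokens whose
        # cumulative probability exceeds p (nucleus sampling).
        probs = torch.softmax(logits, dim=-1)
        sorted_probs, sorted_idx = torch.sort(probs, descending=True)
        cumulative = torch.cumsum(sorted_probs, dim=-1)
        # Mark everything after the token that crosses the threshold p.
        remove = cumulative > p
        remove[..., 1:] = remove[..., :-1].clone()
        remove[..., 0] = False
        sorted_probs[remove] = 0.0
        sorted_probs /= sorted_probs.sum(dim=-1, keepdim=True)
        next_sorted = torch.multinomial(sorted_probs, num_samples=1)
        return sorted_idx.gather(-1, next_sorted)

A recurrent formulation would then run this once per generated token.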

hadaev8 commented 4 years ago

@anthonyfuller7 Have you made any progress with this? I want to try an encoder-decoder, but I'm not sure where to start.

ghost commented 4 years ago

> @anthonyfuller7 Have you made any progress with this? I want to try an encoder-decoder, but I'm not sure where to start.

Yes, I am using an encoder-decoder setup. I just edited the transformers.py file so that I can pass in a tensor q along with the tensor x. In this case, q holds the queries, and x provides the keys and values. Something like this:

    self.attention(q, x, x)

I believe I had to add a query mask as well, but I don't remember off the top of my head.

AndriyMulyar commented 4 years ago

@anthonyfuller7 Any link to a public decoder implementation you can share?

hadaev8 commented 4 years ago

I tried to build a simple encoder-decoder, but something is wrong with the masking. I would be grateful for any hint: https://colab.research.google.com/drive/10zY6dX2iDm1Mz4vOk6bSzN8CYhF5Ev2l#scrollTo=cflC2xVxKb5M&line=18&uniqifier=1

AndriyMulyar commented 4 years ago

> I tried to build a simple encoder-decoder, but something is wrong with the masking. I would be grateful for any hint: https://colab.research.google.com/drive/10zY6dX2iDm1Mz4vOk6bSzN8CYhF5Ev2l#scrollTo=cflC2xVxKb5M&line=18&uniqifier=1

Your decoder implementation is wrong: you never perform self-attention on the decoder's input (only cross-attention on the encoder output).

hadaev8 commented 4 years ago

@AndriyMulyar Could you point it out more specifically?

Here I pass the target and the encoder output: self.transformer_decoder(trg, src, attn_mask=decoder_mask, query_lengths=decoder_len_mask, key_lengths=encoder_len_mask)

In TransformerDecoder I pass x and kv to the decoder layer, and in the decoder layer I pass x and kv to the attention:

    self.attention(
        x, kv, kv,
        attn_mask=attn_mask,
        query_lengths=query_lengths,
        key_lengths=key_lengths
    )

That should be fine, shouldn't it?

AndriyMulyar commented 4 years ago

> @AndriyMulyar Could you point it out more specifically?
>
> Here I pass the target and the encoder output: self.transformer_decoder(trg, src, attn_mask=decoder_mask, query_lengths=decoder_len_mask, key_lengths=encoder_len_mask)
>
> In TransformerDecoder I pass x and kv to the decoder layer, and in the decoder layer I pass x and kv to the attention:
>
>     self.attention(
>         x, kv, kv,
>         attn_mask=attn_mask,
>         query_lengths=query_lengths,
>         key_lengths=key_lengths
>     )
>
> That should be fine, shouldn't it?

A Transformer decoder layer has two attention blocks: one is self-attention and the other is cross-attention over the encoder output. Your implementation only has the cross-attention component; you need to include self-attention on the decoder inputs. Look at the TransformerDecoderLayer here for an example: https://pytorch.org/docs/master/_modules/torch/nn/modules/transformer.html#Transformer
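To make the structure concrete, here is a minimal sketch (hypothetical class; the feed-forward block and dropout are omitted for brevity, and the attention signature follows the call shown above):

    import torch.nn as nn

    class DecoderLayer(nn.Module):
        # Sketch only: masked self-attention on the target, then
        # cross-attention on the encoder output. Adapt the mask and
        # length arguments to the library's actual mask objects.
        def __init__(self, self_attention, cross_attention, d_model):
            super().__init__()
            self.self_attention = self_attention
            self.cross_attention = cross_attention
            self.norm1 = nn.LayerNorm(d_model)
            self.norm2 = nn.LayerNorm(d_model)

        def forward(self, x, memory, causal_mask, memory_mask,
                    x_lengths, memory_lengths):
            # 1) Self-attention: position i may only look at positions <= i.
            x = self.norm1(x + self.self_attention(
                x, x, x, attn_mask=causal_mask,
                query_lengths=x_lengths, key_lengths=x_lengths))
            # 2) Cross-attention: the target attends to the encoder output.
            x = self.norm2(x + self.cross_attention(
                x, memory, memory, attn_mask=memory_mask,
                query_lengths=x_lengths, key_lengths=memory_lengths))
            return x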

hadaev8 commented 4 years ago

@AndriyMulyar You are right, thanks. Now I get an error at inference about length masking, even though I'm not using length masking. https://colab.research.google.com/drive/10zY6dX2iDm1Mz4vOk6bSzN8CYhF5Ev2l?authuser=1#scrollTo=80zstJEcG3kU&line=1&uniqifier=1

angeloskath commented 4 years ago

Hey guys, sorry for being quite late in developing this (a bit of vacation was involved).

I have started the implementation in the branch 'transformer-decoder'. The TransformerDecoderLayer and TransformerDecoder are ready for use in training (although not heavily tested yet). I am in the process of creating a RecurrentTransformerDecoder which can be used for efficient inference.
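Roughly, the intended usage for inference is something like the following (every name below is tentative; the signatures may still change):

    import torch

    # Tentative sketch: the recurrent decoder consumes one target token per
    # step and carries an explicit state, so generating T tokens avoids
    # re-encoding the whole prefix at every step.
    state = None
    token = torch.full((batch_size,), BOS_ID, dtype=torch.long)
    generated = []
    for _ in range(max_length):
        x_t = embedding(token)                 # (batch, d_model)
        h_t, state = recurrent_decoder(x_t, memory, state=state)
        logits = to_vocab(h_t)                 # (batch, vocab_size)
        token = logits.argmax(dim=-1)          # or sample, e.g. with top-p
        generated.append(token)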

After that, I need to edit the builders and the docs, and we should be done.

Thanks for being patient with me.

Cheers, Angelos

hadaev8 commented 4 years ago

@angeloskath Will you add the ability to plot encoder-decoder attention weights? My toy model seems to work, but another one for a TTS task does not synthesize anything meaningful. I think it is because the decoder does not use the encoder output, but I can't figure out how to plot the attention weights in this library.
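One generic workaround might be to capture the projected queries and keys with a forward pre-hook and recompute the softmax map outside the model (sketch below; all names are placeholders, and this only applies to softmax attention, since linear attention never materializes the weight matrix):

    import torch

    # Placeholder sketch: grab the projected queries/keys that a softmax
    # cross-attention module receives, then rebuild the attention map for
    # plotting. Assumes the hooked module takes queries and keys of shape
    # (N, L, H, E) as its first two positional arguments.
    captured = {}

    def grab_qk(module, inputs):
        captured["q"] = inputs[0].detach()
        captured["k"] = inputs[1].detach()

    handle = cross_attention_module.register_forward_pre_hook(grab_qk)
    _ = model(src, trg)  # one forward pass fills `captured`
    handle.remove()

    q, k = captured["q"], captured["k"]             # (N, L, H, E), (N, S, H, E)
    scores = torch.einsum("nlhe,nshe->nhls", q, k)  # (N, H, L, S)
    weights = torch.softmax(scores / q.shape[-1] ** 0.5, dim=-1)
    # weights[0, h] is an (L, S) map that can be plotted with imshow.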

angeloskath commented 4 years ago

Hi all,

I have pushed the decoder and the new builders for building transformer decoders, both recurrent and non-recurrent. The code is not thoroughly tested yet, and I have done a complete overhaul of the builders, so some things may be backwards incompatible (now, or in the coming days as I check all the parameter names and such).
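Roughly, building a decoder now looks like the following (keep in mind that, as said above, the parameter names may still change):

    from fast_transformers.builders import TransformerDecoderBuilder

    # Sketch of the new builder; the exact argument names may still change.
    decoder = TransformerDecoderBuilder.from_kwargs(
        n_layers=4,
        n_heads=8,
        query_dimensions=64,
        value_dimensions=64,
        feed_forward_dimensions=1024,
        self_attention_type="causal-linear",  # masked self-attention on the target
        cross_attention_type="linear",        # attention over the encoder output
    ).get()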

Documentation and further tests are still due (as is the PyPI package).

I will be keeping the issue open until the docs are done and the decoders are included in a release.

Cheers, Angelos

AndriyMulyar commented 4 years ago

Thank you for the quick turnaround time in getting this merged!


angeloskath commented 4 years ago

The decoders are now available and the docs are updated so I will close this issue.

Next I will focus on issue #20, but some machine translation examples should also be coming in the near future.

Thanks everybody for your patience.

Angelos