idiap / fast-transformers

Pytorch library for fast transformer implementations

n_heads parameter of RecurrentTransformerEncoderLayer is not currently used #7

Closed TariqAHassan closed 4 years ago

TariqAHassan commented 4 years ago

Hello

The n_heads parameter of RecurrentTransformerEncoderLayer is not used. I don't think this currently has any negative impact on the builders, because n_heads looks to be passed correctly into RecurrentAttentionLayer.

Notes

See:

https://github.com/idiap/fast-transformers/blob/02552ccb45b5dc6488af0b3cf2db376ab58c1101/fast_transformers/recurrent/transformers.py#L42
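For reference, a minimal sketch of the situation being reported; the module paths and call signatures below are assumptions based on my reading of the linked file, not verbatim project documentation:

```python
import torch

from fast_transformers.recurrent.attention import (
    RecurrentAttentionLayer,
    RecurrentFullAttention,
)
from fast_transformers.recurrent.transformers import RecurrentTransformerEncoderLayer

d_model, n_heads = 512, 8

# The number of heads is consumed here, inside the attention layer ...
attention = RecurrentAttentionLayer(RecurrentFullAttention(), d_model, n_heads)

# ... while the encoder layer's own n_heads argument (at the linked commit)
# is accepted but never read.
layer = RecurrentTransformerEncoderLayer(attention, d_model, n_heads)

x = torch.randn(2, d_model)       # a single time step for a batch of 2
y, state = layer(x, state=None)   # recurrent layers carry state across steps
```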

angeloskath commented 4 years ago

Hi,

Thanks for pointing it out. The same is true for the TransformerEncoderLayer. Neither layer actually needs the number of heads, because they receive the attention module as a parameter, so n_heads should be removed from the constructor arguments.

As you mentioned, it does not currently affect anything, since it is simply ignored.

Angelos

phongnhhn92 commented 4 years ago

Hi, does this mean that you don't use multi-head attention in your paper and this repo?

angeloskath commented 4 years ago

@phongnhhn92 hi, no, that is not the case; it is just that the number of heads is defined elsewhere in the code.
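In other words, when you go through the builders, n_heads is picked up there and forwarded to the attention layers; the encoder layers never need it themselves. A rough sketch, with keyword names taken from the project README (treat the exact values as placeholders):

```python
from fast_transformers.builders import RecurrentEncoderBuilder

# n_heads is consumed by the builder, which hands it to the attention layers;
# the RecurrentTransformerEncoderLayer itself never uses it.
model = RecurrentEncoderBuilder.from_kwargs(
    n_layers=4,
    n_heads=8,
    query_dimensions=64,
    value_dimensions=64,
    feed_forward_dimensions=1024,
    attention_type="full",
).get()
```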

phongnhhn92 commented 4 years ago

Thanks for your reply!

angeloskath commented 4 years ago

Fixed in 8acb570.

Thanks for your patience. Also, note that unless you are using the builders, the change is backwards incompatible.
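Concretely, code that instantiates the layer directly and passes n_heads will break after the fix, while builder-based code is unaffected. A hedged sketch, with argument names assumed from the linked file:

```python
from fast_transformers.recurrent.attention import (
    RecurrentAttentionLayer,
    RecurrentFullAttention,
)
from fast_transformers.recurrent.transformers import RecurrentTransformerEncoderLayer

d_model, n_heads = 512, 8
attention = RecurrentAttentionLayer(RecurrentFullAttention(), d_model, n_heads)

# Before 8acb570 (constructor as in the linked file):
# layer = RecurrentTransformerEncoderLayer(attention, d_model, n_heads)

# After 8acb570 the unused n_heads argument is dropped:
layer = RecurrentTransformerEncoderLayer(attention, d_model)
```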