Closed: TariqAHassan closed this issue 4 years ago.
Hi,
Thanks for pointing it out. The same is true for the TransformerEncoderLayer. Neither layer actually needs the number of heads, because they receive the attention module as a parameter, so it should be removed from the constructor arguments.
As you mentioned, it does not currently affect anything, since it is simply ignored.
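In code, the intended shape is roughly the following. This is only a sketch, assuming the current import paths and the constructor after the argument is removed, so double-check against your installed version:

```python
# Minimal sketch (import paths and the post-change constructor are assumed):
# the attention layer owns n_heads, so the encoder layer never needs it.
import torch

from fast_transformers.attention import AttentionLayer, FullAttention
from fast_transformers.transformers import TransformerEncoder, TransformerEncoderLayer

d_model, n_heads = 512, 8
layer = TransformerEncoderLayer(
    AttentionLayer(FullAttention(), d_model, n_heads),  # heads are configured here
    d_model,                                            # no n_heads needed by the layer itself
)
encoder = TransformerEncoder([layer])

x = torch.rand(2, 10, d_model)  # (batch, sequence length, model dimension)
y = encoder(x)
```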
Angelos
Hi, does this mean that you don't use multi-head attention in your paper and this repo?
@phongnhhn92 Hi, no, that is not the case; it is just that the number of heads is defined elsewhere in the code.
Thanks for your reply!
Fixed in 8acb570.
Thanks for your patience. Also, note that unless you are using the builders, the change is backwards incompatible.
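If you are using the builders, nothing needs to change on your side, since n_heads is configured on the builder and forwarded to the attention layer internally. A rough sketch of that usage (the keyword names follow the documented builder interface, but treat them as assumptions for your installed version):

```python
# Sketch of builder-based construction (assumed keyword names); the builder
# forwards n_heads to the attention layer, so user code is unaffected by the
# constructor change.
from fast_transformers.builders import RecurrentEncoderBuilder

model = RecurrentEncoderBuilder.from_kwargs(
    n_layers=4,
    n_heads=8,                    # heads are configured here
    query_dimensions=64,
    value_dimensions=64,
    feed_forward_dimensions=1024,
    attention_type="linear",      # recurrent linear attention
).get()
```

If you construct the layers by hand instead, only the encoder-layer constructor call needs updating; the attention layer keeps taking n_heads as before.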
Hello
The `n_heads` parameter of `RecurrentTransformerEncoderLayer` is not used. I don't think this is currently negatively impacting the builders because it looks to be correctly passed into `RecurrentAttentionLayer`.

Notes
See:
https://github.com/idiap/fast-transformers/blob/02552ccb45b5dc6488af0b3cf2db376ab58c1101/fast_transformers/recurrent/transformers.py#L42
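For anyone reading along, here is a rough sketch of the recurrent path being discussed (module paths, constructor signatures, and the forward return values are assumptions based on the repository around the linked commit, so verify against your version): the head count is consumed by `RecurrentAttentionLayer`, which is why the extra argument on the encoder layer was redundant.

```python
# Rough sketch (assumed module paths and signatures): n_heads is an argument
# of RecurrentAttentionLayer, so the encoder layer never reads it.
import torch

from fast_transformers.recurrent.attention import (
    RecurrentAttentionLayer,
    RecurrentFullAttention,
)
from fast_transformers.recurrent.transformers import RecurrentTransformerEncoderLayer

d_model, n_heads = 512, 8
layer = RecurrentTransformerEncoderLayer(
    RecurrentAttentionLayer(RecurrentFullAttention(), d_model, n_heads),
    d_model,
)

x = torch.rand(2, d_model)   # one time step for a batch of two sequences
y, state = layer(x)          # first step: no previous state
y, state = layer(x, state)   # later steps reuse the returned state
```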