ahmad-PH closed this 3 years ago
Thanks for this suggestion; it is indeed a bit clearer. However, my quick experiments suggest that this implementation uses slightly more memory and is slightly slower. Therefore I did not merge your request, but I have added a comment with your implementation for clarity. Thanks!
No problem :) I'm glad it was useful. I hadn't thought of checking speed and memory myself (sorry!), so it's a good thing you did :). I will take that into account in future suggestions.
It should be easier to understand now. I struggled to understand this part specifically because the matrix multiplication and summation from the article are compressed into a single matrix multiplication. Now they are two separate steps. I also tested the new MultiHeadAttention class with the following snippet to make sure nothing functional changed:
Which prints True every time. (`MultiHeadAttentionRefactored` is the name I gave to the modified class.)
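For readers following along, here is a minimal NumPy sketch of why a single matrix multiplication can stand in for "multiply, then sum". It assumes the compressed step is the usual multi-head output projection (concatenate the heads, then multiply by `W_O`). The names `fused`, `two_step`, `heads` and `w_o` are hypothetical and are not taken from the actual class. Splitting `W_O` into one row block per head and summing the per-head products gives the same result as the single fused product:

```python
import numpy as np

def fused(heads, w_o):
    # One matmul: concatenate per-head outputs along the feature axis, then project.
    return np.concatenate(heads, axis=-1) @ w_o

def two_step(heads, w_o):
    # Step 1: multiply each head by its own row block of W_O.
    d_head = heads[0].shape[-1]
    parts = [h @ w_o[i * d_head:(i + 1) * d_head] for i, h in enumerate(heads)]
    # Step 2: sum the per-head results.
    return np.sum(parts, axis=0)

rng = np.random.default_rng(0)
n_heads, seq_len, d_head, d_model = 4, 5, 8, 32
heads = [rng.standard_normal((seq_len, d_head)) for _ in range(n_heads)]
w_o = rng.standard_normal((n_heads * d_head, d_model))

print(np.allclose(fused(heads, w_o), two_step(heads, w_o)))  # True
```

The two-step form keeps one intermediate array per head before summing. That plausibly accounts for some of the extra memory and time mentioned above, although this sketch does not measure either.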