Closed: dmmiller612 closed this 5 years ago
Thank you, @dmmiller612. Nice work!
@dmmiller612 Hello. I added 'Masked' multi-head attention using torch.triu and edited your greedy decoder code in my new branch (Transformer)! Do you agree with the changes? A minimal sketch of the masking idea follows below.
Please see my diff commits:
1) 'Masked' multi-head attention in your greedy decoder: https://github.com/graykode/nlp-tutorial/commit/005d34bfa3cafb822599a526b85b732e2846213d
2) 'Masked' multi-head attention in the original Transformer: https://github.com/graykode/nlp-tutorial/commit/5b4fb5ebca712f72e747674333dafc10182367e5
Thanks
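For reference, here is a minimal sketch of how a look-ahead ("subsequent") mask can be built with torch.triu. This is not the exact code from the linked commits; the function name and tensor shapes are my own illustration:

```python
import torch

def get_attn_subsequent_mask(seq):
    # seq: [batch_size, tgt_len]
    batch_size, tgt_len = seq.size()
    # The upper triangle above the diagonal marks future positions to hide.
    mask = torch.triu(torch.ones(tgt_len, tgt_len, dtype=torch.uint8), diagonal=1)
    # Expand to [batch_size, tgt_len, tgt_len]; 1 = masked, 0 = visible.
    return mask.unsqueeze(0).expand(batch_size, -1, -1)

# Inside scaled dot-product attention, scores at masked positions are
# pushed toward -inf before the softmax so the model cannot attend to them:
#   scores.masked_fill_(mask.bool(), -1e9)
```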
In this PR, I added a greedy decoder function that generates the decoder input at inference time. This is important for translating sentences, since the target input is not known beforehand. In the paper, the authors ran beam search with beam size k = 4; the greedy approach is the special case k = 1.
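A minimal sketch of greedy decoding under those assumptions (this is not the exact PR code; the `model` call signature and the `start_symbol` / `end_symbol` ids are hypothetical placeholders):

```python
import torch

def greedy_decoder(model, enc_input, start_symbol, end_symbol, max_len=30):
    # Start the decoder input with only the start symbol.
    dec_input = torch.tensor([[start_symbol]], dtype=torch.long)
    for _ in range(max_len):
        # Assumed signature: model(enc_input, dec_input) -> [1, cur_len, vocab_size]
        logits = model(enc_input, dec_input)
        # Pick the single best next token at each step (k = 1).
        next_token = logits[0, -1].argmax().item()
        dec_input = torch.cat(
            [dec_input, torch.tensor([[next_token]], dtype=torch.long)], dim=1)
        if next_token == end_symbol:  # stop once the end symbol is produced
            break
    return dec_input
```

Beam search would instead keep the k highest-scoring partial sequences at every step; greedy decoding is simply that procedure with k fixed to 1.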