Hi, first of all, thank you for sharing all this code with us; it has been a big help in my thesis. I have only one question.
I have read through the code of the "attention" model many times, but I couldn't find the decoder block from the Transformer architecture described in the original paper. Was omitting it intentional?
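Just to be clear about what I'm looking for, here is a minimal sketch of the decoder block as I understand it from "Attention Is All You Need" (masked self-attention, then encoder-decoder cross-attention, then a feed-forward sublayer, each with a residual connection and layer norm). This is PyTorch, and the name `DecoderBlock` and all the hyperparameters are just illustrative, not taken from your repo:

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One Transformer decoder block (post-norm layout from the paper)."""

    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(
            d_model, n_heads, dropout=dropout, batch_first=True
        )
        self.cross_attn = nn.MultiheadAttention(
            d_model, n_heads, dropout=dropout, batch_first=True
        )
        # Position-wise feed-forward network.
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, tgt, memory):
        # Causal mask: position i may only attend to positions <= i.
        T = tgt.size(1)
        causal = torch.triu(
            torch.ones(T, T, dtype=torch.bool, device=tgt.device), diagonal=1
        )
        x, _ = self.self_attn(tgt, tgt, tgt, attn_mask=causal)
        tgt = self.norm1(tgt + self.dropout(x))
        # Cross-attention: queries from the decoder, keys/values from the
        # encoder output ("memory").
        x, _ = self.cross_attn(tgt, memory, memory)
        tgt = self.norm2(tgt + self.dropout(x))
        x = self.ff(tgt)
        return self.norm3(tgt + self.dropout(x))
```

I couldn't locate anything like the cross-attention step above in the "attention" model, which is what prompted the question.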