Closed sshleifer closed 4 years ago
I think I answered 1 and 2: https://github.com/marian-nmt/marian/blob/master/src/models/transformer.h#L770 suggests that the decoder has the same data flow as the encoder, apart from cross-attention.
That is correct.
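For anyone landing here with the same question, here is a minimal sketch of that data flow in PyTorch. It is an illustration, not Marian's exact implementation: layer-norm placement, masking, and the layer's internals are assumptions (Marian supports both pre- and post-norm variants, for instance).

```python
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    """Sketch of one transformer decoder layer: identical to an encoder
    layer except for the extra cross-attention block in the middle."""

    def __init__(self, d_model: int, n_heads: int, d_ff: int):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads)
        # Queries come from the decoder, keys/values from the encoder output.
        # This is the block where weights such as "context_Wq" would live
        # (as its query projection).
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x, encoder_out, self_mask=None):
        # Same flow as an encoder layer: (masked) self-attention + residual...
        h, _ = self.self_attn(x, x, x, attn_mask=self_mask)
        x = self.norm1(x + h)
        # ...plus the one extra step the encoder does not have:
        # attend over the encoder output.
        h, _ = self.cross_attn(x, encoder_out, encoder_out)
        x = self.norm2(x + h)
        # Feed-forward + residual, again identical to the encoder.
        return self.norm3(x + self.ffn(x))
```

So each decoder layer is encoder-like (self-attention + FFN), with one cross-attention block inserted between the two.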
I am new to C++ and trying to port some of the trained translation models to Python. I ran into a few questions: 1) Where in the code are parameters like `context_Wq` used? 2) Where is the forward pass of the model when `decoder` is called? 3) Is the decoder of the seq2seq model also BERT-like?