sshleifer opened 4 years ago
Q1: The embedding of the sentence-start (BOS, or `<s>`) context is hard-coded to 0; it is not copied from the embedding matrix. I always felt that was a bug, but anecdotally it makes no accuracy difference.
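A minimal sketch of that rule, assuming a NumPy-style embedding matrix (the helper name `embed_targets` is hypothetical, not Marian's actual implementation):

```python
import numpy as np

def embed_targets(token_ids, embedding_matrix):
    """Look up target-side embeddings, but force the first (BOS) position
    to a zero vector rather than reading it from the embedding matrix.
    Hypothetical illustration of the behaviour described above."""
    emb = embedding_matrix[token_ids]  # (seq_len, d_model); fancy indexing copies
    emb[0] = 0.0                       # BOS context hard-coded to 0
    return emb

# Toy example: vocab of 5, model dimension of 4.
rng = np.random.default_rng(0)
E = rng.normal(size=(5, 4))
out = embed_targets(np.array([1, 2, 3]), E)
assert np.allclose(out[0], 0.0)  # first decoder step sees a zero embedding
assert np.allclose(out[1], E[2])  # later positions use the matrix as usual
```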
Q2: Each beam hypothesis that ends in EOS (or `</s>`) ceases to be expanded. Once all hypotheses for a sentence end in EOS, translation of that sentence is complete.
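The termination rule can be sketched as a single search step, assuming hypotheses are token-id lists and `expand_fn` proposes continuations (both names are hypothetical, not Marian's code):

```python
def step_beams(hypotheses, eos_id, expand_fn):
    """One search step: hypotheses already ending in EOS are kept frozen;
    only unfinished ones are expanded. Returns the new hypotheses and a
    flag that is True once every hypothesis ends in EOS."""
    next_hyps = []
    for hyp in hypotheses:
        if hyp[-1] == eos_id:
            next_hyps.append(hyp)            # finished: cease expanding
        else:
            next_hyps.extend(expand_fn(hyp))  # unfinished: grow the beam
    done = all(h[-1] == eos_id for h in next_hyps)
    return next_hyps, done

# Usage: one hypothesis already finished, one forced to predict EOS next.
EOS = 0
hyps, done = step_beams([[2, 3], [0]], EOS, lambda h: [h + [EOS]])
assert hyps == [[2, 3, 0], [0]]
assert done  # every hypothesis now ends in EOS, so the sentence is complete
```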
While running `marian-decoder`, I was inspecting intermediate values of the output tensor `transformer.h` and noticed that on the first step through the decoder, some sort of token is passed that has a zero word embedding.

Q1) What token is used as a prefix? Are there tricks to make its embedding 0?
Q2) How does the decoder know to terminate a translation? In my python port of the opus-nmt models, the decoder never predicts `</s>`.
Additional Clues
My python port of the opus-nmt models works nicely when English is the source language, and just generates a dummy token when it is done translating. For fr-en, it generates nonsense at the beginning of the generation, whereas `marian-decoder` generates no nonsense at all :)

Thanks in advance!