Closed by nicolabertoldi 5 years ago
@teslacool
I looked at the code more deeply, and I think I was wrong in my second and third claims. In practice, during inference the source and target LM decoders are not used at all; instead, a standard Transformer architecture is used.
Is my new claim about inference correct?
Yes, during inference `src_tokens_lm=None` and `prev_output_tokens_lm=None`, so this is the same as the standard architecture.
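To make this concrete, here is a minimal sketch (not the repository's actual code) of how a forward pass can fall back to a plain Transformer when the LM inputs are `None`. Only `src_tokens_lm` and `prev_output_tokens_lm` are taken from this thread; every other name is an illustrative assumption.

```python
# A minimal sketch, NOT the repository's actual code: it only illustrates how
# a forward pass can skip the LM decoders and behave like a plain Transformer
# when the LM inputs are None. `src_tokens_lm` and `prev_output_tokens_lm`
# come from this thread; all other names are illustrative assumptions.
import torch.nn as nn


class LMEnhancedTransformer(nn.Module):
    def __init__(self, encoder, decoder, src_lm_decoder, tgt_lm_decoder):
        super().__init__()
        self.encoder = encoder                # standard Transformer encoder
        self.decoder = decoder                # standard Transformer decoder
        self.src_lm_decoder = src_lm_decoder  # pre-trained source-side LM layers
        self.tgt_lm_decoder = tgt_lm_decoder  # pre-trained target-side LM layers

    def forward(self, src_tokens, prev_output_tokens,
                src_tokens_lm=None, prev_output_tokens_lm=None):
        # Training: the LM inputs are given, so the lower LM layers are run.
        # Inference: both LM inputs are None, so only the standard
        # encoder/decoder path below is executed.
        src_lm_out = (self.src_lm_decoder(src_tokens_lm)
                      if src_tokens_lm is not None else None)
        tgt_lm_out = (self.tgt_lm_decoder(prev_output_tokens_lm)
                      if prev_output_tokens_lm is not None else None)

        # How src_lm_out / tgt_lm_out would be fused into the encoder and
        # decoder is model-specific and deliberately left out of this sketch.
        encoder_out = self.encoder(src_tokens)
        return self.decoder(prev_output_tokens, encoder_out)
```

With both LM inputs left at `None`, only `self.encoder` and `self.decoder` run, which is why inference is equivalent to a standard Transformer.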
@teslacool
Thanks.
I need a few clarifications.
Please confirm and/or comment on the following claims about your software:
1. During training of the transformer_lmnmt architecture, the parameters of the source and target LM decoders (i.e. the lower layers of the entire architecture) are not trained (see the sketch after this comment).
2. During inference with the transformer_lmnmt architecture, the source and target LM decoders are active, in the sense that the input tokens go through these layers before traversing the Transformer encoder and decoder.
3. The forward step of inference is essentially the same as the forward step of training.
If any of the above is wrong, please explain the right process to me.
If I am right on all counts, I have a further question. Have you ever tried to run inference without the source and target LM layers, i.e. using a standard Transformer? What results did you get? If you did not try, what is your feeling about such an experiment?
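For reference on claim 1, keeping pre-trained LM decoders frozen while the NMT parameters are trained is usually done along these lines. This is only a hedged sketch; `src_lm_decoder` and `tgt_lm_decoder` are assumed attribute names for illustration, not the repository's own.

```python
# Hedged sketch of claim 1: freezing the pre-trained LM decoders so that only
# the NMT encoder/decoder parameters receive gradient updates. The attribute
# names `src_lm_decoder` and `tgt_lm_decoder` are assumptions for illustration.
import torch


def freeze_lm_decoders(model: torch.nn.Module) -> None:
    """Disable gradients for the source/target LM sub-modules only."""
    for name, param in model.named_parameters():
        if name.startswith(("src_lm_decoder.", "tgt_lm_decoder.")):
            param.requires_grad = False


# The optimizer is then built only over the parameters that remain trainable, e.g.:
# optimizer = torch.optim.Adam(p for p in model.parameters() if p.requires_grad)
```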