Closed JosephPai closed 4 years ago
Hi, thanks for the question. The forward pass on the target input during pre-training can be thought of as a trick to improve the pre-trained model's performance on the target domain. The insight is to roughly adapt the BN layers to the target domain: the running mean and variance in BN are updated by every input during the forward pass, even when there is no backpropagation. That said, I found that regardless of the pre-trained model's performance, MMT can always achieve competitive performance.
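To illustrate the mechanism: in training mode, a BatchNorm layer updates its running statistics as a side effect of the forward pass alone, so feeding target-domain batches shifts those statistics toward the target distribution even though no gradients flow. A minimal toy sketch (not the repo's code; the function name and numbers are illustrative):

```python
# Toy sketch of how BN running statistics drift toward the target
# domain from forward passes only, with no backpropagation involved.
# This mirrors the exponential-moving-average update BN uses.

def bn_update_running_stats(batch, running_mean, running_var, momentum=0.1):
    """Update BN running mean/var from one batch (forward pass only)."""
    n = len(batch)
    batch_mean = sum(batch) / n
    batch_var = sum((x - batch_mean) ** 2 for x in batch) / n
    new_mean = (1 - momentum) * running_mean + momentum * batch_mean
    new_var = (1 - momentum) * running_var + momentum * batch_var
    return new_mean, new_var

# Start from source-domain statistics (mean 0, var 1); feed two
# target-domain batches whose values center around 6.
mean, var = 0.0, 1.0
for batch in [[5.0, 6.0, 7.0], [5.5, 6.5, 7.5]]:
    mean, var = bn_update_running_stats(batch, mean, var)

print(mean)  # the running mean has drifted toward the target mean
```

In PyTorch this happens implicitly whenever `model(target_input)` is called with the model in `train()` mode, which is why the `target_feature` output itself never needs to appear in the loss.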
I see. Thanks for your reply!
Hi, thanks for releasing the code. I have a question about the pre-training stage. In PreTrainer, I cannot understand why there is a forward pass for the target input, since I did not find target_feature being used anywhere in training.
Thanks.