Closed goodluck110706112 closed 3 years ago
Teacher forcing is used during training. Why do you think it is not?
Thank you for your reply! Yes, teacher forcing is used during training. But I think fairseq's teacher forcing ratio always equals one, which means the model always uses the ground truth as prev_output_tokens. Can I set the teacher forcing ratio as a hyperparameter, e.g. a ratio of 0.5 or 0.7?
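To illustrate what "ratio always equals one" means here: in fairseq-style training, the decoder input `prev_output_tokens` is simply the gold target sequence shifted right, with the EOS token rotated to the front, so the decoder is always conditioned on ground truth. A minimal sketch of that construction (the function name is hypothetical, not a fairseq API):

```python
def make_prev_output_tokens(target, eos):
    """Build the decoder input for full teacher forcing (ratio = 1.0):
    the gold target shifted right by one, with EOS moved to the front.
    `target` is a list of token ids ending in `eos`."""
    return [eos] + target[:-1]

# Example: target "4 5 6 <eos>" with eos id 2 becomes "<eos> 4 5 6".
print(make_prev_output_tokens([4, 5, 6, 2], eos=2))
```

Since the decoder never sees its own predictions during training, there is no knob to lower the ratio without changing the training loop itself.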
Did you already ask that question at the fairseq project directly? fairseq-image-captioning doesn't add anything special to make that possible.
I didn't ask this question at fairseq before. Yes, this is a question about fairseq, not fairseq-image-captioning; I'm sorry for asking it in the wrong place. But if we try different teacher forcing ratios, we might get a better score in image captioning.
Yes, that would be interesting to see. Let us know if you have any updates.
Thank you for your high-quality code! Is there any way to control teacher forcing? By teacher forcing I mean that in the decoding phase, we feed either the ground truth or the model's own prediction as prev_output_tokens, chosen according to a probability. Thank you!