vaibhavvarshney0 opened this issue 3 years ago
Hi, sorry I overlooked this. I have just uploaded the missing files in the latest commit, 14a5df3.
Thanks. Also, for distil to conduct Word-KD, do we have to train the respective teacher model separately and provide it?
Yes, we need to train a single-task teacher model separately before we perform Word-KD.
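For context on why the teacher has to exist first: in the usual formulation of word-level knowledge distillation, the frozen teacher's per-token output distribution is used as a soft target for the student at every target position. Below is a minimal sketch of that loss in plain Python. It is not this repo's implementation; the function names `log_softmax` and `word_kd_loss`, the list-of-logits input format, and the `temperature` parameter are all hypothetical and chosen only for illustration.

```python
import math
from typing import List


def log_softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax over one position's vocabulary logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def word_kd_loss(teacher_logits: List[List[float]],
                 student_logits: List[List[float]],
                 temperature: float = 1.0) -> float:
    """Hypothetical word-level KD loss: the cross-entropy between the teacher's
    soft distribution and the student's, averaged over target positions.

    teacher_logits / student_logits: one list of vocabulary logits per target
    token (assumed input format, not taken from the repo).
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student sequence lengths must match")
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        # Soft targets from the (separately trained, frozen) teacher.
        p_teacher = [math.exp(v) for v in log_softmax(t_row, temperature)]
        # Student log-probabilities; log-softmax avoids log(0) on underflow.
        log_q_student = log_softmax(s_row, temperature)
        total += -sum(p * lq for p, lq in zip(p_teacher, log_q_student))
    return total / len(teacher_logits)


if __name__ == "__main__":
    teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
    student = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.5]]
    print(word_kd_loss(teacher, student))
```

The loss is smallest when the student's distribution matches the teacher's at every position, so the teacher's logits have to come from a model that is already trained and held fixed during distillation. In common setups this term is interpolated with the ordinary negative log-likelihood on the gold tokens; the weighting used here, if any, would be specific to the repo.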
It seems some code files, such as train.sh and test.sh, are missing from this repo. Or am I missing something?