Open yyht opened 3 years ago
hi, there was no problem when I tried to train tta with the bert-base config (for English). did you try to train tta for English with the bert-base config and get the same problem?
anyway, for a different language with a different vocabulary, you should modify one line in modeling.py. at line 161 in that file, 4 is the dummy token id of [MASK]. so if you change the vocabulary, you should replace this number with your own vocabulary's id for "[MASK]" so that dummy_ids is built correctly.
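for reference, something like the sketch below would look that id up from a BERT-style vocab.txt (one token per line) instead of hardcoding it; the helper name and file path are just placeholders, not code from this repo:

```python
# minimal sketch (not from the repo): find the row index of [MASK]
# in a BERT-style vocab file, where line number == token id.
def get_mask_token_id(vocab_path="vocab.txt", mask_token="[MASK]"):
    with open(vocab_path, encoding="utf-8") as f:
        for token_id, line in enumerate(f):
            if line.rstrip("\n") == mask_token:
                return token_id
    raise ValueError(f"{mask_token} not found in {vocab_path}")

# use the returned id wherever modeling.py hardcodes 4 for dummy_ids
print(get_mask_token_id())  # e.g. 103 for the original English BERT vocab
```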
lastly, in my experience, re-examining your data (pre)processing is often helpful.
if you have any further problems, please feel free to ask me again!
thanks :)
thanks for your help. I made some mistakes with the hyperparameters, and it now runs normally. since you have done some experiments with the bert-base config, I am wondering whether tta could achieve better results on English data, such as GLUE and sentence reranking for NMT and ASR?
unfortunately, it has not been tested on any specific tasks yet. due to the lack of computing resources in my lab, I had to use a much smaller batch size (less than 10, I think) for pre-training tta with the bert-base config. so I tried, but never completed, training tta with that config.
hi, I am trying to train bert-base using tta for Chinese, but the loss becomes NaN after about 1000 optimization steps. I am wondering if you could give me some advice.