joongbo / tta

Repository for the paper "Fast and Accurate Deep Bidirectional Language Representations for Unsupervised Learning"
Apache License 2.0

When training TTA with the bert-base config and sequence length 512, got NaN #5

Open yyht opened 3 years ago

yyht commented 3 years ago

hi, I am trying to train a bert-base model with TTA for Chinese, and I get NaN after about 1000 optimization steps. I am wondering if you could give me some advice.

joongbo commented 3 years ago

hi, there was no problem when I trained TTA with the bert-base config (for English). did you try to train TTA for English with the bert-base config, and did you get the same problem?

anyway, for a different language with a different vocabulary, you should modify one line in modeling.py. at line 161 of that file, 4 is the dummy token id of [MASK]. so if you change the vocabulary, you should replace this number with the id of "[MASK]" in your own vocabulary so that dummy_ids is built correctly.
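
To make this concrete, here is a minimal sketch of the idea, assuming the line in question fills a dummy input with the hard-coded [MASK] id 4; the names MASK_TOKEN_ID and make_dummy_ids are illustrative, not the repository's actual code:

```python
import tensorflow as tf

# Hard-coded [MASK] id for the released English vocabulary.
# For a custom (e.g. Chinese) vocab, set this to the line index of
# "[MASK]" in your vocab.txt.
MASK_TOKEN_ID = 4

def make_dummy_ids(input_ids):
    """Return a tensor shaped like input_ids, filled with the [MASK] id."""
    return tf.fill(tf.shape(input_ids), MASK_TOKEN_ID)

# Quick way to look up the correct id for your vocabulary:
with open("vocab.txt", encoding="utf-8") as f:
    vocab = [line.rstrip("\n") for line in f]
print(vocab.index("[MASK]"))  # use this value in place of 4
```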

lastly, in my experience, examining data (pre)processing again would be helpful.

if you have any further problems, please feel free to ask me again!

thanks :)

yyht commented 3 years ago

thanks for your help. I had made some mistakes with the hyperparameters; after fixing them, training runs normally. Since you have done some experiments with the bert-base config, I am wondering whether TTA can achieve better results on English benchmarks such as GLUE, or on sentence reranking for NMT and ASR?

joongbo commented 3 years ago

unfortunately, it has not been tested on any specific tasks yet. due to the lack of computing resources in my lab, I had to use a much smaller batch size (less than 10, I think) for pre-training TTA with the bert-base config. so I tried, but did not complete, training TTA with that config.