taeho-kil closed this issue 3 years ago
@xellows1305
Thank you for your interest in our project, and sorry for the late response.
When we pre-train the model, ht_pretrain.json is used as the config file.
The vocab size change comes from https://github.com/linjieli222/HERO/blob/f938515424b5f3249fc1d2e7f0373f64112a6529/model/model.py#L363
In pad_vocab(), we pad the word embeddings to be a multiple of 8 to fully utilize the tensor cores in our GPUs. Padding RoBERTa's vocab size of 50,265 up to the next multiple of 8 gives 50,272, which is the dimension you see in the checkpoint.
You can also refer to the function where the padding is implemented: https://github.com/linjieli222/HERO/blob/f938515424b5f3249fc1d2e7f0373f64112a6529/model/modeling_utils.py#L124-L135
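For illustration, here is a minimal sketch of the general idea (not the repo's exact code; the function name and usage below are illustrative) of padding an embedding table to the next multiple of 8:

```python
import torch
import torch.nn as nn

def pad_vocab_to_multiple_of_8(embedding: nn.Embedding) -> nn.Embedding:
    """Illustrative sketch: grow the vocab dimension to the next multiple of 8."""
    vocab_size, hidden_size = embedding.weight.shape
    pad = (8 - vocab_size % 8) % 8
    if pad == 0:
        return embedding
    padded = nn.Embedding(vocab_size + pad, hidden_size,
                          padding_idx=embedding.padding_idx)
    with torch.no_grad():
        # Copy the original rows; the extra rows stay randomly initialized.
        padded.weight[:vocab_size] = embedding.weight
    return padded

# RoBERTa's 50,265-token vocab becomes 50,272 after padding.
emb = nn.Embedding(50265, 768)
print(pad_vocab_to_multiple_of_8(emb).weight.shape)  # torch.Size([50272, 768])
```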
Thanks.
Closed due to inactivity.
In the pre-training configuration file "hero_pretrain.json", the vocab size of f_config is 50,265 (it probably comes from the RoBERTa model).
However, the pre-trained model "hero-tv-ht100.pt" has a vocab size of 50,272 in f_config (I checked the dimension of model.v_encoder.f_encoder.lm_head.decoder).
Which configuration file was used when the "hero-tv-ht100.pt" model was trained?
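For reference, a minimal sketch of how the dimension can be checked from the checkpoint (the state-dict key and nesting are assumed from the description above; adjust to the checkpoint's actual layout):

```python
import torch

# Load the checkpoint on CPU; file name taken from the question above.
ckpt = torch.load("hero-tv-ht100.pt", map_location="cpu")
state_dict = ckpt.get("model", ckpt)  # some checkpoints nest weights under "model"

# Key assumed from the question; prints e.g. torch.Size([50272, hidden_size]).
weight = state_dict["v_encoder.f_encoder.lm_head.decoder.weight"]
print(weight.shape)
```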