1. Is ZEN trained from an existing base BERT (e.g., the Google release) or trained from scratch? If from scratch, I guess the n-gram embeddings are randomly initialized; if from a base BERT, are the n-gram embeddings perhaps initialized as the average of the embeddings of the characters they contain (see the sketch after these questions)?
2. Regarding "We use the same parameter setting for the n-gram encoder as in BERT" in the paper: are the n-gram encoder's parameters shared with (identical to) those of the BERT character-encoder tower (perhaps its bottom six layers), or are they initialized and trained independently?
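To make the guess in question 1 concrete, here is a purely illustrative sketch; the names (`init_ngram_embedding`, `ngram_to_char_ids`, etc.) are hypothetical and not taken from the ZEN code:

```python
# Hypothetical sketch of "n-gram embedding = average of character embeddings";
# not from the ZEN repository, only to illustrate what I am asking about.
import torch

def init_ngram_embedding(ngram_to_char_ids, char_embedding, hidden_size):
    """Initialize each n-gram embedding as the mean of its characters' embeddings,
    taken from a (pre-trained) BERT character embedding table."""
    ngram_embedding = torch.nn.Embedding(len(ngram_to_char_ids), hidden_size)
    with torch.no_grad():
        for ngram_id, char_ids in ngram_to_char_ids.items():
            char_vecs = char_embedding.weight[torch.tensor(char_ids)]  # (n, hidden_size)
            ngram_embedding.weight[ngram_id] = char_vecs.mean(dim=0)   # average over the n characters
    return ngram_embedding
```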
There are two models in our paper: (R), with randomly initialized parameters, and (P), initialized from a pre-trained model, which is the Google-released Chinese BERT base model.
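A minimal sketch of the two initialization settings described above, written with standard HuggingFace Transformers calls rather than the exact ZEN training code:

```python
from transformers import BertConfig, BertModel

config = BertConfig.from_pretrained("bert-base-chinese")

# (R): character encoder with randomly initialized parameters
encoder_r = BertModel(config)

# (P): character encoder initialized from the Google-released Chinese BERT base model
encoder_p = BertModel.from_pretrained("bert-base-chinese")
```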
Sorry, I don't quite get your second question; could you elaborate on it? Thanks.