ymcui / Chinese-BERT-wwm

Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm series models)
https://ieeexplore.ieee.org/document/9599397
Apache License 2.0

Problem with masked LM loss when fine-tuning hfl/chinese-roberta-wwm-ext-large #120

Closed · hengchao0248 closed this issue 4 years ago

hengchao0248 commented 4 years ago

I'm using the hfl/chinese-roberta-wwm-ext-large model, and while fine-tuning on a downstream task I found that the mlm_loss was over 300 and kept rising. I also tested a few masked-sentence examples with the model and found that only hfl/chinese-roberta-wwm-ext-large shows this problem; the results are in the attached screenshots.

For testing I used TFBertForMaskedLM from transformers; the exact code is as follows:

import tensorflow as tf
from transformers import BertTokenizer, RobertaConfig, TFBertForMaskedLM


def check_mlm_model(model_name, input_text):
    # Load tokenizer, config and model from the same checkpoint name.
    tokenizer = BertTokenizer.from_pretrained(model_name)
    config = RobertaConfig.from_pretrained(model_name)
    model = TFBertForMaskedLM.from_pretrained(model_name, config=config, from_pt=True)

    # Encode the text and locate the [MASK] position (id 103 in this vocab).
    mask_input_ids = tokenizer.encode(input_text, add_special_tokens=True)
    mask_ind = mask_input_ids.index(tokenizer.mask_token_id)

    # Run the model and take the most probable token at the masked position.
    logits = model(tf.convert_to_tensor(mask_input_ids)[None, :])[0]
    probs = tf.nn.softmax(logits, -1)
    per_probs = probs[0, mask_ind, :]
    pred = tf.argmax(per_probs)
    prob = per_probs[pred]

    pred_token = tokenizer.convert_ids_to_tokens([int(pred)])[0]
    print(f"model_name: {model_name}, text: {input_text}, pred_token: {pred_token}, prob: {prob.numpy()}")


for input_text in ["今天[MASK]气不错", "我想[MASK]雪糕", "北京太热了,哈尔滨就不那么[MASK]"]:
    check_mlm_model("hfl/chinese-roberta-wwm-ext-large", input_text)

ymcui commented 4 years ago

Please have a look at #76 first. Also, if you did a second round of pre-training on the MLM objective, what were your hyperparameter settings? In particular the most important ones, the learning rate and batch size.
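For context: as far as I understand the repository README, the chinese-roberta-wwm-ext checkpoints use the BERT architecture and should be loaded with the BERT classes rather than the RoBERTa ones. A minimal sketch of that loading convention (an assumption on my part, not something confirmed in this thread; from_pt=True is only needed when no native TF weights are available):

from transformers import BertConfig, BertTokenizer, TFBertForMaskedLM

model_name = "hfl/chinese-roberta-wwm-ext-large"

# Despite the "roberta" in the checkpoint name, the weights follow the BERT
# architecture, so the BERT classes are used throughout.
tokenizer = BertTokenizer.from_pretrained(model_name)
config = BertConfig.from_pretrained(model_name)
model = TFBertForMaskedLM.from_pretrained(model_name, config=config, from_pt=True)
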

hengchao0248 commented 4 years ago

Thanks for the quick reply. My learning rate was 1e-5, batch size 64, and max_seq_len 125, but it's fine now. The mlm_loss reaching several hundred was because I had implemented the loss incorrectly.
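The thread doesn't say what exactly was wrong with the loss, but a common way to end up with an MLM loss of several hundred is summing the per-token cross-entropy (or averaging it over every position) instead of averaging it only over the masked tokens. A minimal sketch of a masked-average loss, assuming labels follow the transformers convention of -100 at positions to ignore (masked_lm_loss is just an illustrative name):

import tensorflow as tf

def masked_lm_loss(labels, logits):
    # labels: [batch, seq] int tensor, -100 at positions to ignore
    # logits: [batch, seq, vocab] raw model outputs
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction=tf.keras.losses.Reduction.NONE
    )
    mask = tf.cast(labels != -100, tf.float32)
    # Replace ignored labels with 0 so the loss call is valid; their
    # contribution is zeroed out by the mask below.
    safe_labels = tf.where(labels == -100, tf.zeros_like(labels), labels)
    per_token = loss_fn(safe_labels, logits)  # [batch, seq]
    return tf.reduce_sum(per_token * mask) / tf.maximum(tf.reduce_sum(mask), 1.0)
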

ymcui commented 4 years ago

OK, I'll close this issue then.