ymcui / Chinese-BERT-wwm

Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)
https://ieeexplore.ieee.org/document/9599397
Apache License 2.0

RoBERTa-wwm-ext-large does not converge when applied to an entirely new domain #227

Closed JerryYao80 closed 1 year ago

JerryYao80 commented 1 year ago

I am currently using your RoBERTa-wwm-ext-large as the base PTM and intend to adapt it to a vertical domain via fine-tuning. The model is not converging, so I would like to ask the following:

1. Word segmentation granularity: given this domain's vocabulary, words carrying a complete meaning can run to eight or more characters, so entity boundaries may overlap more heavily in the downstream NER step. Will such long word segments affect convergence?
2. For a 1.2 GB training corpus, what learning rate do you recommend? I am currently using 1e-3.
3. The batch size is currently 64. Is that too small, and will it slow down convergence?

Thanks for your guidance.

ymcui commented 1 year ago

1. The input text does not need Chinese word segmentation; just process it with BERT's original tokenizer.
2. The learning rate is far too large; it is usually on the order of 1e-5.
3. That batch size is fine for fine-tuning.
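
To make the advice concrete, here is a minimal sketch (mine, not from this repo) of domain-adaptive MLM fine-tuning with the Hugging Face transformers library, assuming the hub checkpoint hfl/chinese-roberta-wwm-ext-large; note that these checkpoints are meant to be loaded with the Bert* classes rather than Roberta*:

```python
import torch
from transformers import BertTokenizerFast, BertForMaskedLM

# 1. No external Chinese word segmentation: pass raw text straight to BERT's
#    original (character-level WordPiece) tokenizer.
tokenizer = BertTokenizerFast.from_pretrained("hfl/chinese-roberta-wwm-ext-large")
model = BertForMaskedLM.from_pretrained("hfl/chinese-roberta-wwm-ext-large")

inputs = tokenizer("这是一段未经分词的领域文本。", return_tensors="pt")

# 2. Learning rate on the 1e-5 scale, not 1e-3.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# 3. A batch size of 64 is fine for fine-tuning; a full training loop with
#    proper random masking (e.g. DataCollatorForLanguageModeling) is omitted.
outputs = model(**inputs, labels=inputs["input_ids"])
outputs.loss.backward()
optimizer.step()
```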

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 1 year ago

Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.

yyxx1997 commented 4 months ago

I ran into a similar problem. I fine-tuned the Chinese RoBERTa mentioned here on a classification task, using [CLS] + pooler + MLP without modifying the model itself, and found that the loss barely decreases and the accuracy does not improve, with lr 5e-5 and batch size 128. In principle that learning rate is already on the same order of magnitude as 1e-5.
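
For comparison, here is a minimal sketch (my own, not verified against the setup above) of the same [CLS]-pooler-classifier arrangement via BertForSequenceClassification, with a quick overfit-a-tiny-batch sanity check; the model name, labels, and hyperparameters are assumptions:

```python
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("hfl/chinese-roberta-wwm-ext")
model = BertForSequenceClassification.from_pretrained(
    "hfl/chinese-roberta-wwm-ext", num_labels=2  # assumed binary task
)

batch = tokenizer(["样例文本一", "样例文本二"], padding=True, return_tensors="pt")
labels = torch.tensor([0, 1])
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Sanity check: the loss should drop within a few steps on a tiny batch.
# If it stays flat, the usual culprits are frozen encoder parameters, a
# detached pooler output in a hand-rolled head, or mismatched labels.
model.train()
for step in range(5):
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(step, outputs.loss.item())
```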