shibing624 / text2vec

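text2vec, text to vector. A text vector representation tool that converts text into vector matrices. It implements text representation and text similarity models including Word2Vec, RankBM25, Sentence-BERT, and CoSENT, and works out of the box.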
https://pypi.org/project/text2vec/
Apache License 2.0
4.39k stars 392 forks

Loss plateaus at a high value and stops decreasing; unable to reproduce results #113

Closed Mike4Ellis closed 1 year ago

Mike4Ellis commented 1 year ago

I tried fine-tuning the LERT pretrained model on STS-B and found that the loss plateaus at around 2.8 and does not decrease, and the other two evaluation metrics are also poor. I also trained on domain data and hit the same problem of the loss not decreasing. What might be causing this? I saw a similar question where the cause was setting Shuffle=True on the dataset, but I did not do that and have not modified the source code.

Training results for LERT on STS-B:

global_step,train_loss,eval_spearman,eval_pearson
654,2.909956932067871,0.15062362655684902,0.15703048127442665
1308,2.898895025253296,-0.00733650298229431,-0.0017661651283225836
1962,2.882506847381592,0.015743396329087105,0.020234797826883942
2616,2.90830397605896,0.04478117032881518,0.04025315257329805
3270,2.873159170150757,0.03424968506503327,0.042355918738100566
3924,2.894749879837036,-0.018188861377460227,-0.026906756123500686
4578,2.891392469406128,-0.005837147933115063,-0.009507231708653224
5232,2.888775110244751,0.15088458930722992,0.07794487547437834
5886,2.8585562705993652,0.1578157941854556,0.09264837556760602
6540,2.8533475399017334,0.10395236860076923,0.09649726316964022
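For anyone triaging a similar report, a quick way to quantify the plateau is to parse the log above and compare the first and last checkpoints. This is only a minimal sketch: the `summarize` helper and the `LOG` constant are hypothetical names introduced here and are not part of text2vec. The rows are copied from the reported results.

```python
import csv
import io

# Rows copied verbatim from the training log reported in this issue.
LOG = """global_step,train_loss,eval_spearman,eval_pearson
654,2.909956932067871,0.15062362655684902,0.15703048127442665
1308,2.898895025253296,-0.00733650298229431,-0.0017661651283225836
1962,2.882506847381592,0.015743396329087105,0.020234797826883942
2616,2.90830397605896,0.04478117032881518,0.04025315257329805
3270,2.873159170150757,0.03424968506503327,0.042355918738100566
3924,2.894749879837036,-0.018188861377460227,-0.026906756123500686
4578,2.891392469406128,-0.005837147933115063,-0.009507231708653224
5232,2.888775110244751,0.15088458930722992,0.07794487547437834
5886,2.8585562705993652,0.1578157941854556,0.09264837556760602
6540,2.8533475399017334,0.10395236860076923,0.09649726316964022
"""


def summarize(log_text):
    """Hypothetical helper: summarize loss movement and the best eval score."""
    rows = list(csv.DictReader(io.StringIO(log_text.strip())))
    losses = [float(r["train_loss"]) for r in rows]
    spearman = [float(r["eval_spearman"]) for r in rows]
    return {
        "first_loss": losses[0],
        "last_loss": losses[-1],
        # Fraction of the initial loss removed between first and last checkpoint.
        "relative_drop": (losses[0] - losses[-1]) / losses[0],
        "best_spearman": max(spearman),
    }


summary = summarize(LOG)
print(summary)
```

On this log the loss falls by roughly 2% over ten checkpoints and the Spearman correlation never exceeds about 0.16, which is consistent with the model not learning at all rather than converging slowly.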

Mike4Ellis commented 1 year ago

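I have tried many things and still cannot resolve this. I would really appreciate any insights or guesses from the maintainer. Thanks!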

shibing624 commented 1 year ago

I didn't use LERT; I used macbert.

Mike4Ellis commented 1 year ago

It's the same with macBERT; it has little to do with the model.