Problems encountered when fine-tuning with training_sup_text_matching_model_en.py

shibing624 / text2vec

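text2vec, text to vector. A text vector representation toolkit that converts text into vector matrices. It implements text representation and text similarity models including Word2Vec, RankBM25, Sentence-BERT, and CoSENT, and works out of the box.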

https://pypi.org/project/text2vec/

Apache License 2.0

4.39k stars 392 forks

Problems encountered when fine-tuning with training_sup_text_matching_model_en.py #106

Closed Fino2020 closed 1 year ago

Fino2020 commented 1 year ago

Describe the Question

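I am trying to use my own dataset, whose labels are only 0 and 1, for a binary classification task. How should the final metrics be computed? Should scores above 0.5 count as 1 and scores below 0.5 count as 0? I ask because I could not find any code in your library related to metrics such as accuracy or precision.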

shibing624 commented 1 year ago

Yes. Scores above 0.5 count as 1, and scores below 0.5 count as 0.
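A minimal sketch of the thresholding described above, assuming the model's similarity scores and the gold 0/1 labels are already available as plain Python lists. The function name binary_metrics and the sample values are hypothetical and are not part of text2vec. Treating a score of exactly 0.5 as class 0 is also an assumption, since the reply does not cover that case.

```python
from typing import Dict, Sequence


def binary_metrics(
    scores: Sequence[float],
    labels: Sequence[int],
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Threshold similarity scores and compute basic binary metrics.

    Hypothetical helper for illustration; not a text2vec API.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    # Assumption: a score strictly above the threshold is predicted as 1,
    # everything else (including a score equal to the threshold) as 0.
    preds = [1 if s > threshold else 0 for s in scores]

    # Confusion-matrix counts.
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)

    total = len(labels)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Made-up scores and labels, only to show the call.
    example_scores = [0.91, 0.12, 0.67, 0.40, 0.55]
    example_labels = [1, 0, 0, 1, 1]
    print(binary_metrics(example_scores, example_labels))
```

If the classes are imbalanced, it may be worth choosing the threshold on a validation set instead of fixing it at 0.5. The same function can be called with a different threshold value to compare.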

Fino2020 commented 1 year ago

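Hello, sorry to bother you, but I have one more question. When I select bert-match for training, I get the error shown in the screenshot (not preserved here). The source code splits batch at line 266 of bertmatching_model.py, but when I print batch, the variable already holds a result that has been encoded by BERT. What is going on here? The training arguments are: --model_arch bert --do_train --do_predict --num_epochs 10 --model_name bert-base-uncased --output_dir ./outputs/STS-B-en-cosent --stsb_file data/CVE_CWE/CVE_CWE.tsv.gz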

shibing624 commented 1 year ago

That is the tokenizer output.

I don't understand your question. Is the training code failing?

Fino2020 commented 1 year ago

It seems so. There appears to be a problem in the code when using bert matching.

Fino2020 commented 1 year ago

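Hello, I modified the code following the SentenceBert code, changing inputs, labels = batch to inputs, target, labels = batch. It looks like one variable was left unassigned after tokenization. However, evaluation still seems to be wrong: the error says that the loss function does not support input labels of type Double, even though the same code runs without problems during training. The error is at line 350 of bertmatching_model.py.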

shibing624 commented 1 year ago

It is not clear to me how you modified it.