shibing624 / text2vec

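text2vec, text to vector. A text vector representation tool that converts text into vector matrices. It implements text representation and text similarity models such as Word2Vec, RankBM25, Sentence-BERT, and CoSENT, and works out of the box.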
https://pypi.org/project/text2vec/
Apache License 2.0
4.39k stars, 392 forks

CoSENT loss calculation question #135

Closed YingchaoX closed 10 months ago

YingchaoX commented 10 months ago

Describe the Question


Hello,

I have a question about the load_cosent_train_data() function in cosent_dataset.py:

if path.endswith('.jsonl'):
    data_list = load_jsonl(path)
    for entry in data_list:
        field1, field2 = get_field_names(entry)
        if not field1 or not field2:
            continue

        text_a, text_b, score = entry[field1], entry[field2], float(entry["label"])
        data.append((text_a, score))
        data.append((text_b, score))

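Why is each triple

text1, text2, label

split into two rows like this:

text1, score
text2, score

Given that split, how is the cosine similarity between the two texts computed when the training loss is calculated?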

Thanks!

shibing624 commented 10 months ago

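See the code in the loss calculation module: the two texts of each pair are recovered by taking the even-indexed and odd-indexed rows.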
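
To illustrate the even/odd indexing idea, here is a minimal pure-Python sketch. It is not the library's actual code: the function names `cosine` and `cosent_loss`, the plain-list inputs, and the `scale=20.0` default are assumptions made for illustration. It assumes the rows are interleaved as text_a, text_b, text_a, text_b, ... with each label repeated for both rows of a pair, as in the loader quoted above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosent_loss(embeddings, labels, scale=20.0):
    """CoSENT-style ranking loss over interleaved rows (illustrative sketch).

    embeddings: one vector per row, ordered a0, b0, a1, b1, ...
    labels:     one label per row, so each pair's label appears twice
    scale:      assumed temperature-like multiplier on the cosine values
    """
    # Each label is duplicated, so keep one per pair (the even rows).
    pair_labels = labels[::2]

    # Even rows are text_a, odd rows are text_b: zip them back into pairs.
    sims = [
        scale * cosine(a, b)
        for a, b in zip(embeddings[::2], embeddings[1::2])
    ]

    # Whenever pair i has a lower label than pair j, pair i should also have
    # the lower similarity, so sims[i] - sims[j] is penalized.
    # The leading 0.0 contributes the "1 +" inside log(1 + sum(exp(...))).
    terms = [0.0]
    for i, label_i in enumerate(pair_labels):
        for j, label_j in enumerate(pair_labels):
            if label_i < label_j:
                terms.append(sims[i] - sims[j])

    # Numerically stable log-sum-exp.
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


# Two pairs: the first pair is identical (cosine 1), the second is orthogonal (cosine 0).
rows = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
loss_when_labels_agree = cosent_loss(rows, [1.0, 1.0, 0.0, 0.0])
loss_when_labels_disagree = cosent_loss(rows, [0.0, 0.0, 1.0, 1.0])
```

In this sketch the loss is close to zero when the higher-labeled pair is also the more similar one, and large when the ranking is inverted. Flattening the pairs into single rows lets the encoder process every text in one batch, and the slicing restores the pairing afterwards.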

YingchaoX commented 10 months ago

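See the code in the loss calculation module: the two texts of each pair are recovered by taking the even-indexed and odd-indexed rows.

Oh, I see it now, got it!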