ymcui / Chinese-BERT-wwm

Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)
https://ieeexplore.ieee.org/document/9599397
Apache License 2.0

Computing the similarity of two sentences #205

Closed yfq512 closed 2 years ago

yfq512 commented 2 years ago

'''
import torch
from transformers import BertModel, BertTokenizer

model_name = "hfl/chinese-roberta-wwm-ext-large"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertModel.from_pretrained(model_name)

input_text1 = "今天天气不错,你觉得呢?"
input_text2 = "今天天气不错,你觉得呢?我喜欢吃饺子"
input_ids1 = tokenizer.encode(input_text1, add_special_tokens=True)
input_ids2 = tokenizer.encode(input_text2, add_special_tokens=True)
input_ids1 = torch.tensor([input_ids1])
input_ids2 = torch.tensor([input_ids2])
out1 = model(input_ids1)[0]
out2 = model(input_ids2)[0]
out1.shape  # torch.Size([1, 14, 1024])
out2.shape  # torch.Size([1, 20, 1024])
'''

Why are the output feature dimensions different? I want to compare the similarity of the two sentences; which features should I use?
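The shapes differ because BERT returns one 1024-dimensional vector per token, so the second dimension tracks sentence length (14 vs. 20 tokens). A common way to get a fixed-size sentence embedding, though not necessarily what the issue author ended up doing, is to take the [CLS] vector (`out[:, 0, :]`) or mean-pool over the token axis, then compare sentences with cosine similarity. A minimal sketch of the pooling math in NumPy, with random arrays standing in for the model's `last_hidden_state`:

```python
import numpy as np

def mean_pool(last_hidden_state, attention_mask):
    """Average the token vectors, ignoring padding positions.

    last_hidden_state: (seq_len, hidden_size) token embeddings
    attention_mask:    (seq_len,) with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, None].astype(np.float64)
    return (last_hidden_state * mask).sum(axis=0) / mask.sum()

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Dummy embeddings with the shapes from the issue (14 and 20 tokens, 1024 dims);
# in real use these would come from model(input_ids)[0][0].
rng = np.random.default_rng(0)
emb1 = rng.normal(size=(14, 1024))
emb2 = rng.normal(size=(20, 1024))

vec1 = mean_pool(emb1, np.ones(14))
vec2 = mean_pool(emb2, np.ones(20))
sim = cosine_similarity(vec1, vec2)  # a single scalar, regardless of sentence lengths
```

Either pooling choice reduces both sentences to a single 1024-dimensional vector, so the length mismatch disappears before the comparison.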

yfq512 commented 2 years ago

I have solved it.

pharrellyhy commented 2 years ago

@yfq512 Hi, in terms of sentence similarity, does it work better than tf-idf?
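For reference, the TF-IDF baseline mentioned here can be sketched in a few lines of pure Python. This is a simplified illustration (sklearn-style smoothed idf, character-level tokens); real Chinese text would normally be word-segmented first, e.g. with jieba:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF vectors over a shared vocabulary.

    Uses the smoothed idf = log((1 + n) / (1 + df)) + 1, as in sklearn's
    TfidfVectorizer defaults (before its extra normalization step).
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    return [[doc.count(t) * idf[t] for t in vocab] for doc in docs]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Character-level tokens purely for illustration.
docs = [list("今天天气不错"), list("今天天气不错我喜欢吃饺子")]
v1, v2 = tfidf_vectors(docs)
sim = cosine(v1, v2)
```

Unlike the BERT embeddings above, TF-IDF only measures lexical overlap, so paraphrases with no shared words score near zero; that difference is usually what decides which works better for a given task.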