ymcui / Chinese-BERT-wwm

Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)
https://ieeexplore.ieee.org/document/9599397
Apache License 2.0

Computing the similarity of two sentences #205

Closed yfq512 closed 2 years ago

yfq512 commented 2 years ago

'''
import torch
from transformers import BertModel, BertTokenizer

model_name = "hfl/chinese-roberta-wwm-ext-large"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertModel.from_pretrained(model_name)

input_text1 = "今天天气不错,你觉得呢?"
input_text2 = "今天天气不错,你觉得呢?我喜欢吃饺子"
input_ids1 = tokenizer.encode(input_text1, add_special_tokens=True)
input_ids2 = tokenizer.encode(input_text2, add_special_tokens=True)
input_ids1 = torch.tensor([input_ids1])
input_ids2 = torch.tensor([input_ids2])
out1 = model(input_ids1)[0]
out2 = model(input_ids2)[0]
out1.shape  # torch.Size([1, 14, 1024])
out2.shape  # torch.Size([1, 20, 1024])
'''

Why are the output feature dimensions different? I want to compare the similarity of the two sentences; which features should I use?
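The shapes differ because BERT returns one 1024-dimensional vector per token, so the second dimension tracks sentence length (14 vs. 20 tokens). A common way to get a fixed-size sentence embedding, though not necessarily what the issue author ended up doing, is to take the [CLS] vector (`out[:, 0, :]`) or mean-pool over the token axis, then compare sentences with cosine similarity. A minimal sketch of the pooling math in NumPy, with random arrays standing in for the model's `last_hidden_state`:

```python
import numpy as np

def mean_pool(last_hidden_state, attention_mask):
    """Average the token vectors, ignoring padding positions.

    last_hidden_state: (seq_len, hidden_size) token embeddings
    attention_mask:    (seq_len,) with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, None].astype(np.float64)
    return (last_hidden_state * mask).sum(axis=0) / mask.sum()

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Dummy embeddings with the shapes from the issue (14 and 20 tokens, 1024 dims);
# in real use these would come from model(input_ids)[0][0].
rng = np.random.default_rng(0)
emb1 = rng.normal(size=(14, 1024))
emb2 = rng.normal(size=(20, 1024))

vec1 = mean_pool(emb1, np.ones(14))
vec2 = mean_pool(emb2, np.ones(20))
sim = cosine_similarity(vec1, vec2)  # a single scalar, regardless of sentence lengths
```

Either pooling choice reduces both sentences to a single 1024-dimensional vector, so the length mismatch disappears before the comparison.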

yfq512 commented 2 years ago

I have solved it.

pharrellyhy commented 2 years ago

@yfq512 Hi, in terms of sentence similarity, does it work better than tf-idf?
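For reference, the TF-IDF baseline mentioned here can be sketched in a few lines of pure Python. This is a simplified illustration (sklearn-style smoothed idf, character-level tokens); real Chinese text would normally be word-segmented first, e.g. with jieba:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF vectors over a shared vocabulary.

    Uses the smoothed idf = log((1 + n) / (1 + df)) + 1, as in sklearn's
    TfidfVectorizer defaults (before its extra normalization step).
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    return [[doc.count(t) * idf[t] for t in vocab] for doc in docs]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Character-level tokens purely for illustration.
docs = [list("今天天气不错"), list("今天天气不错我喜欢吃饺子")]
v1, v2 = tfidf_vectors(docs)
sim = cosine(v1, v2)
```

Unlike the BERT embeddings above, TF-IDF only measures lexical overlap, so paraphrases with no shared words score near zero; that difference is usually what decides which works better for a given task.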