netease-youdao / BCEmbedding

Netease Youdao's open-source embedding and reranker models for RAG products.
Apache License 2.0
1.3k stars 85 forks

AttributeError: 'SequenceClassifierOutput' object has no attribute 'last_hidden_state' #28

Closed Anooyman closed 5 months ago

Anooyman commented 5 months ago
from transformers import AutoModel, AutoTokenizer

# list of sentences
sentences = ['sentence_0', 'sentence_1']

# init model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('maidalun1020/bce-embedding-base_v1')
model = AutoModel.from_pretrained('maidalun1020/bce-embedding-base_v1')

device = 'cpu'  # if no GPU, set "cpu"
model.to(device)

# get inputs
inputs = tokenizer(sentences, padding=True, truncation=True, max_length=512, return_tensors="pt")
inputs_on_device = {k: v.to(device) for k, v in inputs.items()}

# get embeddings
outputs = model(**inputs_on_device, return_dict=True)
embeddings = outputs.last_hidden_state[:, 0]  # cls pooler
embeddings = embeddings / embeddings.norm(dim=1, keepdim=True)  # normalize

I downloaded the model from HF to a local directory. When I run the embedding code, I get the following error:

AttributeError: 'SequenceClassifierOutput' object has no attribute 'last_hidden_state'

How can I fix this? Thanks!

I also see the following log messages. Do I need to worry about them?

Some weights of XLMRobertaForSequenceClassification were not initialized from the model checkpoint at bce-embedding-base_v1 and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']

You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
shenlei1020 commented 5 months ago

Which version of transformers are you using?

Anooyman commented 5 months ago

Hi @shenlei1020, the transformers version is 4.38.2.

shenlei1020 commented 5 months ago

Use a version no higher than 4.37; 4.36 is recommended.
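
A minimal sketch for checking the installed transformers version against that limit, using only the standard library. The helper names `major_minor` and `within_limit` are hypothetical, and reading "no higher than 4.37" as "major.minor at most 4.37" is an assumption about what was meant.

```python
from importlib.metadata import PackageNotFoundError, version


def major_minor(version_string):
    """Parse a version such as '4.38.2' into (4, 38); the patch part is ignored."""
    parts = version_string.split(".")
    return int(parts[0]), int(parts[1])


def within_limit(version_string, limit=(4, 37)):
    """True if major.minor is at most `limit` (assumed reading of the advice)."""
    return major_minor(version_string) <= limit


try:
    installed = version("transformers")  # version of the installed distribution
    status = "within limit" if within_limit(installed) else "newer than 4.37"
    print(f"transformers {installed}: {status}")
except PackageNotFoundError:
    print("transformers is not installed in this environment")
```

In a notebook, the kernel may need a restart after reinstalling before the new version is the one actually imported.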

Anooyman commented 5 months ago
Some weights of XLMRobertaForSequenceClassification were not initialized from the model checkpoint at bce-embedding-base_v1 and are newly initialized: ['classifier.out_proj.bias', 'classifier.dense.weight', 'classifier.dense.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[8], line 28
     26 # get embeddings
     27 outputs = embedding_model(**inputs_on_device, return_dict=True)
---> 28 embeddings = outputs.last_hidden_state[:, 0]  # cls pooler
     29 embeddings = embeddings / embeddings.norm(dim=1, keepdim=True)  # normalize

AttributeError: 'SequenceClassifierOutput' object has no attribute 'last_hidden_state'

Still the same error.

Anooyman commented 5 months ago

Hi @shenlei1020, could you take a look? I have already switched versions, but it still raises the same error.
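
One thing that may help narrow this down: the warning names XLMRobertaForSequenceClassification and the error names SequenceClassifierOutput, and that output type carries `logits` rather than `last_hidden_state`. The traceback also calls `embedding_model`, not the `model` from the posted snippet, so the object being run may have been created differently. The sketch below is a diagnostic, not a confirmed fix; `require_last_hidden_state` is a hypothetical helper that reports which model class and output type are actually in play.

```python
from types import SimpleNamespace


def require_last_hidden_state(model, outputs):
    """Return outputs.last_hidden_state, or raise an error naming the classes involved."""
    if hasattr(outputs, "last_hidden_state"):
        return outputs.last_hidden_state
    raise TypeError(
        f"{type(model).__name__} returned {type(outputs).__name__}, "
        "which has no 'last_hidden_state'. Check how the model object was "
        "created and which architecture the local checkpoint loads as."
    )


# Intended use in the script from this thread (requires the real model):
#   outputs = embedding_model(**inputs_on_device, return_dict=True)
#   hidden = require_last_hidden_state(embedding_model, outputs)
#   embeddings = hidden[:, 0]  # cls pooler

# Stand-in objects so the sketch runs without downloading anything.
fake_model = SimpleNamespace()
fake_output = SimpleNamespace(logits=[0.1, 0.9])
try:
    require_last_hidden_state(fake_model, fake_output)
except TypeError as err:
    print(err)
```

Printing `type(embedding_model).__name__` right after loading gives the same information earlier in the run.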