Open VolantBoy opened 2 weeks ago
Hey @VolantBoy, the results are the same to within 1e-5 in your example, which can be considered identical for most purposes. Many things can affect precision at this level: device differences, padding tokens, different kernels being used by PyTorch, etc.
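For reference, "the same to within 1e-5" can be checked directly with torch.allclose. Below is a minimal sketch using the two CLS rows from the bsz=2 run quoted later in this thread (only the last five hidden dimensions, as printed there):

import torch

# The two CLS vectors (last five dimensions) printed for the bsz=2 case in this thread.
h1 = torch.tensor([-0.577276349067688, 0.12515364587306976, 0.5431490540504456, -0.2510761618614197, -0.02478857897222042])
h2 = torch.tensor([-0.5772761106491089, 0.1251537948846817, 0.5431487560272217, -0.2510768473148346, -0.02478843182325363])

print((h1 - h2).abs().max())              # largest deviation, on the order of 1e-7
print(torch.allclose(h1, h2, atol=1e-5))  # True: the rows agree to within 1e-5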
In this situation you and I also have different results; I don't have your model, so I used bert-base-cased, and here are the results I get:
from transformers import BertConfig, BertModel, BertTokenizer
import torch
import random
import numpy as np

random.seed(1234)
np.random.seed(1234)
torch.manual_seed(1234)

model_dir = "bert-base-cased"
config = BertConfig.from_pretrained(model_dir)
model = BertModel.from_pretrained(model_dir, config=config)
tokenizer: BertTokenizer = BertTokenizer.from_pretrained(model_dir)
model.eval()

def func(text_list):
    # Tokenize the batch and print the last five CLS hidden-state values for each example
    batch = tokenizer(text_list, add_special_tokens=True, return_tensors="pt", padding=True, truncation=True)
    outputs = model(**batch, return_dict=True)
    print(text_list)
    print(batch)
    print(outputs.last_hidden_state.size())
    for x in outputs.last_hidden_state[:, 0, -5:].tolist():
        print(x)
    print()

text_list_a = ["我可", "我可"]
func(text_list_a)
['我可', '我可']
{'input_ids': tensor([[101, 100, 100, 102],
[101, 100, 100, 102]]), 'token_type_ids': tensor([[0, 0, 0, 0],
[0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1],
[1, 1, 1, 1]])}
torch.Size([2, 4, 768])
[0.24575680494308472, -0.024106569588184357, 0.55533766746521, 0.5411606431007385, 0.256611168384552]
[0.24575680494308472, -0.024106569588184357, 0.55533766746521, 0.5411606431007385, 0.256611168384552]
@LysandreJik Thanks for your reply. The model I used is bert-base-chinese, and the versions of the related libraries are: torch==2.0.0, transformers==4.44.2, running on WSL2.
I tried more cases, and it seems that this precision issue only occurs when bsz=2; a small sweep for reproducing this is sketched after the results below.
Code execution results:
['我可']
{'input_ids': tensor([[ 101, 2769, 1377, 102]]),
'token_type_ids': tensor([[0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1]])}
torch.Size([1, 4, 768])
[-0.577276349067688, 0.12515364587306976, 0.5431490540504456, -0.2510761618614197, -0.02478857897222042]
['我可', '我可']
{'input_ids': tensor([[ 101, 2769, 1377, 102],
[ 101, 2769, 1377, 102]]), 'token_type_ids': tensor([[0, 0, 0, 0],
[0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1],
[1, 1, 1, 1]])}
torch.Size([2, 4, 768])
[-0.577276349067688, 0.12515364587306976, 0.5431490540504456, -0.2510761618614197, -0.02478857897222042]
[-0.5772761106491089, 0.1251537948846817, 0.5431487560272217, -0.2510768473148346, -0.02478843182325363]
['我可', '我可', '我可']
{'input_ids': tensor([[ 101, 2769, 1377, 102],
[ 101, 2769, 1377, 102],
[ 101, 2769, 1377, 102]]), 'token_type_ids': tensor([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]])}
torch.Size([3, 4, 768])
[-0.577276349067688, 0.12515364587306976, 0.5431490540504456, -0.2510761618614197, -0.02478857897222042]
[-0.577276349067688, 0.12515364587306976, 0.5431490540504456, -0.2510761618614197, -0.02478857897222042]
[-0.577276349067688, 0.12515364587306976, 0.5431490540504456, -0.2510761618614197, -0.02478857897222042]
['我可是', '我可']
{'input_ids': tensor([[ 101, 2769, 1377, 3221, 102],
[ 101, 2769, 1377, 102, 0]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1],
[1, 1, 1, 1, 0]])}
torch.Size([2, 5, 768])
[-0.9357996582984924, 0.17303651571273804, 0.3532883822917938, 0.3196558654308319, -0.4336218535900116]
[-0.577276349067688, 0.12515364587306976, 0.5431490540504456, -0.2510761618614197, -0.02478857897222042]
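To check the "only at bsz=2" observation more systematically, here is a hedged sketch that sweeps batch sizes and reports, for each one, whether every row matches the single-example forward pass bit-for-bit and whether it matches within 1e-5. It assumes the model and tokenizer objects from the snippet above (bert-base-chinese in my case):

import torch

text = "我可"
with torch.no_grad():
    # Reference CLS vector from a batch of one.
    ref = model(**tokenizer([text], return_tensors="pt"), return_dict=True).last_hidden_state[0, 0]

    for bsz in (1, 2, 3, 4):
        batch = tokenizer([text] * bsz, return_tensors="pt", padding=True)
        cls = model(**batch, return_dict=True).last_hidden_state[:, 0]
        bitwise = all(torch.equal(row, ref) for row in cls)
        close = torch.allclose(cls, ref.expand_as(cls), atol=1e-5)
        print(f"bsz={bsz}: bitwise identical={bitwise}, within 1e-5={close}")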
Once again I'm not entirely surprised and I think such a small difference won't have an impact in real world scenarios. See my first reply as to why this could happen!
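As a rough illustration of the "different kernels" point: the same row pushed through a matmul on its own versus inside a larger batch can take a different BLAS/cuBLAS code path, which is enough to produce bit-level differences of the size seen above. This is only a hedged probe, not a guarantee; whether a difference actually shows up depends on the backend and hardware.

import torch

torch.manual_seed(0)
w = torch.randn(768, 768)
x = torch.randn(1, 768)

alone = x @ w                          # row processed on its own
in_batch = (x.repeat(4, 1) @ w)[0:1]   # the same row processed inside a batch of 4

print(torch.equal(alone, in_batch))                # may be False at the bit level
print((alone - in_batch).abs().max())              # typically 0 or on the order of 1e-7
print(torch.allclose(alone, in_batch, atol=1e-5))  # True either way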
Yeah, I see. Thanks again for your reply. 😊
I tried to get the hidden_state of the same sentence at the CLS position, but found that the values seem to be different. I'm confused as to why this happens. I also tried two versions of transformers, but the behavior is the same.
transformers version: 3.3.0/4.44.2
Code execution results:
Complete code: