huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

the problem of precision #33397

Open VolantBoy opened 2 weeks ago

VolantBoy commented 2 weeks ago

I tried to get the hidden_state at the CLS position for the same sentence, but the results seem to be slightly different. I'm confused as to why this happens. I also tried two versions of transformers, but the behavior is the same.

transformers version: 3.3.0/4.44.2

Code execution results:

['我可', '我可']
{'input_ids': tensor([[ 101, 2769, 1377,  102],
        [ 101, 2769, 1377,  102]]), 'token_type_ids': tensor([[0, 0, 0, 0],
        [0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1],
        [1, 1, 1, 1]])}
torch.Size([2, 4, 768])
[-0.5772758722305298, 0.12515394389629364, 0.5431485772132874, -0.25107723474502563, -0.024788254871964455]
[-0.577276349067688, 0.12515370547771454, 0.5431481599807739, -0.2510775625705719, -0.024788187816739082]

Complete code:

from transformers import BertConfig, BertModel, BertTokenizer
import torch
import random
import numpy as np

# Fix the seeds so the run is reproducible.
random.seed(1234)
np.random.seed(1234)
torch.manual_seed(1234)

model_dir = "pretrained_model/bert"
config = BertConfig.from_pretrained(model_dir)
model = BertModel.from_pretrained(model_dir, config=config)
tokenizer: BertTokenizer = BertTokenizer.from_pretrained(model_dir)
model.eval()

def func(text_list):
    # Tokenize the batch and run a single forward pass.
    batch = tokenizer(text_list, add_special_tokens=True, return_tensors="pt", padding=True, truncation=True)
    outputs = model(**batch, return_dict=True)
    print(text_list)
    print(batch)
    print(outputs.last_hidden_state.size())
    # Print the last 5 dimensions of the CLS hidden state for each sequence.
    for x in outputs.last_hidden_state[:, 0, -5:].tolist():
        print(x)
    print()

text_list_a = ["我可", "我可"]

func(text_list_a)
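
For reference (a sketch added here, not part of the original report), one way to quantify the gap directly is to compare the two CLS vectors, e.g. by appending the following to the script above:

# Hypothetical follow-up: recompute the batch and measure the row-to-row gap.
batch = tokenizer(text_list_a, add_special_tokens=True, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    cls = model(**batch).last_hidden_state[:, 0]  # (batch, hidden) CLS vectors
print((cls[0] - cls[1]).abs().max().item())       # typically on the order of 1e-6 here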
LysandreJik commented 2 weeks ago

Hey @VolantBoy, the results in your example agree to within about 1e-5, which can be considered the same in most cases. Many things can affect precision: device differences, padding tokens, different kernels being used by PyTorch, etc.
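
As an illustration (added here, not from the original thread), agreement within a tolerance rather than bit-for-bit equality can be checked with torch.allclose; a minimal sketch using the first two of the five values printed in the report above:

import torch

a = torch.tensor([-0.5772758722305298, 0.12515394389629364])  # first row, first two printed values
b = torch.tensor([-0.577276349067688, 0.12515370547771454])   # second row, first two printed values
print(torch.equal(a, b))                # False: not bitwise identical
print(torch.allclose(a, b, atol=1e-5))  # True: equal within an absolute tolerance of 1e-5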

In this situation you and I also get different results; I don't have your model, so I used bert-base-cased, and here are the results I get:

from transformers import BertConfig, BertModel, BertTokenizer
import torch
import random
import numpy as np

random.seed(1234)
np.random.seed(1234)
torch.manual_seed(1234)

model_dir = "bert-base-cased"
config = BertConfig.from_pretrained(model_dir)
model = BertModel.from_pretrained(model_dir, config=config)
tokenizer: BertTokenizer = BertTokenizer.from_pretrained(model_dir)
model.eval()

def func(text_list):
    batch = tokenizer(text_list, add_special_tokens=True, return_tensors="pt", padding=True, truncation=True)
    outputs = model(**batch, return_dict=True)
    print(text_list)
    print(batch)
    print(outputs.last_hidden_state.size())
    for x in outputs.last_hidden_state[:, 0, -5:].tolist():
        print(x)
    print()

text_list_a = ["我可", "我可"]

func(text_list_a)
['我可', '我可']
{'input_ids': tensor([[101, 100, 100, 102],
        [101, 100, 100, 102]]), 'token_type_ids': tensor([[0, 0, 0, 0],
        [0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1],
        [1, 1, 1, 1]])}
torch.Size([2, 4, 768])
[0.24575680494308472, -0.024106569588184357, 0.55533766746521, 0.5411606431007385, 0.256611168384552]
[0.24575680494308472, -0.024106569588184357, 0.55533766746521, 0.5411606431007385, 0.256611168384552]
VolantBoy commented 2 weeks ago

@LysandreJik Thanks for your reply. The model I used is bert-base-chinese, and the versions of the related libraries are as follows: torch==2.0.0, transformers==4.44.2, system: WSL2.

I tried more cases, and it seems that this precision issue only occurs when bsz=2.

Code execution results:

['我可']
{'input_ids': tensor([[ 101, 2769, 1377,  102]]), 
'token_type_ids': tensor([[0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1]])}
torch.Size([1, 4, 768])
[-0.577276349067688, 0.12515364587306976, 0.5431490540504456, -0.2510761618614197, -0.02478857897222042]
['我可', '我可']
{'input_ids': tensor([[ 101, 2769, 1377,  102],
        [ 101, 2769, 1377,  102]]), 'token_type_ids': tensor([[0, 0, 0, 0],
        [0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1],
        [1, 1, 1, 1]])}
torch.Size([2, 4, 768])
[-0.577276349067688, 0.12515364587306976, 0.5431490540504456, -0.2510761618614197, -0.02478857897222042]
[-0.5772761106491089, 0.1251537948846817, 0.5431487560272217, -0.2510768473148346, -0.02478843182325363]
['我可', '我可', '我可']
{'input_ids': tensor([[ 101, 2769, 1377,  102],
        [ 101, 2769, 1377,  102],
        [ 101, 2769, 1377,  102]]), 'token_type_ids': tensor([[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]])}
torch.Size([3, 4, 768])
[-0.577276349067688, 0.12515364587306976, 0.5431490540504456, -0.2510761618614197, -0.02478857897222042]
[-0.577276349067688, 0.12515364587306976, 0.5431490540504456, -0.2510761618614197, -0.02478857897222042]
[-0.577276349067688, 0.12515364587306976, 0.5431490540504456, -0.2510761618614197, -0.02478857897222042]
['我可是', '我可']
{'input_ids': tensor([[ 101, 2769, 1377, 3221,  102],
        [ 101, 2769, 1377,  102,    0]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 0]])}
torch.Size([2, 5, 768])
[-0.9357996582984924, 0.17303651571273804, 0.3532883822917938, 0.3196558654308319, -0.4336218535900116]
[-0.577276349067688, 0.12515364587306976, 0.5431490540504456, -0.2510761618614197, -0.02478857897222042]
LysandreJik commented 2 weeks ago

Once again, I'm not entirely surprised, and I think such a small difference won't have an impact in real-world scenarios. See my first reply as to why this could happen!
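
For what it's worth (a minimal sketch added here, not from the original thread), the batch-size effect can often be reproduced with a bare matrix multiply, because the BLAS/oneDNN backend may pick a different kernel path for a single row than for a batch; the exact outcome is hardware- and backend-dependent:

import torch

torch.manual_seed(0)
w = torch.randn(768, 768)
x = torch.randn(1, 768)

single = x @ w                                 # one-row matmul
batched = torch.cat([x, x], dim=0) @ w         # two-row matmul on the same data
print(torch.equal(single[0], batched[0]))      # may be False: a different kernel path can change rounding
print(torch.allclose(single[0], batched[0]))   # expected True within the default tolerance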

VolantBoy commented 2 weeks ago

> Once again, I'm not entirely surprised, and I think such a small difference won't have an impact in real-world scenarios. See my first reply as to why this could happen!

Yeah, I see. Thanks again for your reply. 😊