huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Batch elements interfere with each other with int8 #22269

Closed leonweber closed 1 year ago

leonweber commented 1 year ago

System Info

Who can help?

@sgugger @muell

Information

Tasks

Reproduction

The outputs of a model for a given batch element depend on the other elements in the batch when using int8 inference. See the minimal example below. I'm not sure whether this is expected behavior.

import transformers

# Load BLOOM-560m with its linear layers quantized to int8 (LLM.int8() via bitsandbytes).
model = transformers.AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m", load_in_8bit=True, device_map="auto")
tokenizer = transformers.AutoTokenizer.from_pretrained("bigscience/bloom-560m")

# Run "A" alone, then "A" batched together with a second element "B".
out1 = model(**tokenizer(["A"], return_tensors="pt").to("cuda"))
out2 = model(**tokenizer(["A", "B"], return_tensors="pt").to("cuda"))

# The logits for the first token of "A" differ between the two runs.
print(out1['logits'][0][0])
print(out2['logits'][0][0])
print(out1['logits'][0][0] == out2['logits'][0][0])

> tensor([345.0000, 348.2500, 354.2500,  ..., 206.2500, 206.2500, 206.2500],
       device='cuda:0', dtype=torch.float16, grad_fn=<SelectBackward0>)
> tensor([344.7500, 347.7500, 353.7500,  ..., 206.0000, 206.0000, 206.0000],
       device='cuda:0', dtype=torch.float16, grad_fn=<SelectBackward0>)
> tensor([False, False, False,  ..., False, False, False], device='cuda:0')
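For context on why this can happen (my reading of the LLM.int8() scheme, not confirmed anywhere in this thread): activations are quantized row-wise to int8, but the columns kept in higher precision as "outliers" are selected by looking at the whole batch matrix. A second batch element with large activations can therefore change which columns count as outliers for the first element, and with that its quantization error. Below is a toy sketch of the effect; int8_matmul is a hypothetical, heavily simplified stand-in, and the 6.0 threshold mirrors the bitsandbytes default.

import torch

torch.manual_seed(0)

def int8_matmul(x, w, threshold=6.0):
    # Simplified LLM.int8()-style matmul: columns whose max magnitude across
    # the *whole batch* exceeds the threshold stay in full precision;
    # everything else is quantized per row to int8 with absmax scaling.
    outlier_cols = x.abs().amax(dim=0) > threshold
    x_low = x.clone()
    x_low[:, outlier_cols] = 0
    scale = (x_low.abs().amax(dim=1, keepdim=True) / 127).clamp(min=1e-8)
    x_q = torch.round(x_low / scale).clamp(-127, 127)
    y = (x_q * scale) @ w                      # quantize-dequantize path
    y += x[:, outlier_cols] @ w[outlier_cols]  # full-precision outlier path
    return y

w = torch.randn(8, 4)
a = torch.randn(1, 8)       # "batch element A"
b = 10 * torch.randn(1, 8)  # a second element with outlier-sized activations

print(int8_matmul(a, w)[0])                  # A on its own
print(int8_matmul(torch.cat([a, b]), w)[0])  # same row batched with B: differs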

Expected behavior

The computation for a given element should be independent of the other batch elements, as it is for fp32 (see below):

import transformers

# The same model in fp32; device_map="auto" already places it on the GPU,
# so no extra .to("cuda") is needed.
model = transformers.AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m", load_in_8bit=False, device_map="auto")
tokenizer = transformers.AutoTokenizer.from_pretrained("bigscience/bloom-560m")

out1 = model(**tokenizer(["A"], return_tensors="pt").to("cuda"))
out2 = model(**tokenizer(["A", "B"], return_tensors="pt").to("cuda"))

print(out1['logits'][0][0])
print(out2['logits'][0][0])
print(out1['logits'][0][0] == out2['logits'][0][0])

> tensor([343.6242, 346.4580, 352.7924,  ..., 205.3806, 205.3800, 205.3746],
       grad_fn=<SelectBackward0>)
> tensor([343.6242, 346.4580, 352.7924,  ..., 205.3806, 205.3800, 205.3746],
       grad_fn=<SelectBackward0>)
> tensor([ True,  True,  True,  ...,  True,  True, False])
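As an aside, even the fp32 comparison ends in one False, because exact equality is brittle for floating-point outputs: batched GEMMs may reorder reductions and change the last few bits. A tolerance-based check (my suggestion, reusing out1 and out2 from the snippet above) states "independent of the batch" more robustly:

import torch

# True if the logits agree up to small numerical noise.
print(torch.allclose(out1['logits'][0][0], out2['logits'][0][0], atol=1e-4))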

Edit 2023/03/22: Corrected the code for FP32.

sgugger commented 1 year ago

cc @younesbelkada

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.