leonweber closed this issue 1 year ago.
cc @younesbelkada
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
transformers version: cf0af9a31beb84e8feec77af51f72d063ba905aa
bitsandbytes version: 0.37.1

Who can help?
@sgugger @muell

Information

Tasks
examples folder (such as GLUE/SQuAD, ...)

Reproduction
The outputs of a model for a given batch element depend on the other elements in the batch when using int8 inference. See the minimal example below. I'm not sure whether this is expected.
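Below is a simplified model of how this could happen. It is not the reporter's reproduction, and it assumes that bitsandbytes follows the mixed-precision decomposition described in the LLM.int8() paper. Under that scheme, feature columns whose activations reach an outlier threshold are multiplied in floating point and the rest go through int8. If those outlier columns are chosen over the whole batch, one element with a large activation changes how every other element is quantized. The NumPy code is a hypothetical sketch: `int8_matmul_sketch`, the quantization helpers and the 6.0 threshold are assumptions for illustration, not bitsandbytes' actual kernels.

```python
import numpy as np

# Assumed outlier threshold: the LLM.int8() paper treats magnitudes >= 6 as outliers.
THRESHOLD = 6.0


def absmax_quantize_rows(x):
    """Per-row (per-token) absmax quantization to the int8 range [-127, 127]."""
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0  # all-zero rows: any scale works, avoid dividing by zero
    return np.round(x / scale).astype(np.int8), scale


def absmax_quantize_cols(w):
    """Per-column (per-output-feature) absmax quantization of the weight matrix."""
    scale = np.abs(w).max(axis=0, keepdims=True) / 127.0
    scale[scale == 0] = 1.0
    return np.round(w / scale).astype(np.int8), scale


def int8_matmul_sketch(x, w, threshold=THRESHOLD):
    """Hypothetical LLM.int8()-style product of x (batch, in) and w (in, out)."""
    # Outlier feature columns are selected over the WHOLE batch: a column is
    # treated as an outlier if ANY row reaches the threshold in it.
    outlier = np.any(np.abs(x) >= threshold, axis=0)
    regular = ~outlier
    # Outlier columns are multiplied in floating point for every row.
    out = x[:, outlier] @ w[outlier, :]
    if regular.any():
        # Remaining columns: int8 operands, int32 accumulation, then rescale.
        # Each row's scale is computed only over the regular columns, so it
        # also depends on which columns the batch marked as outliers.
        xq, xs = absmax_quantize_rows(x[:, regular])
        wq, ws = absmax_quantize_cols(w[regular, :])
        acc = xq.astype(np.int32) @ wq.astype(np.int32)
        out = out + acc * xs * ws
    return out


rng = np.random.default_rng(0)
w = rng.normal(size=(8, 4))
a = rng.normal(size=(1, 8))  # the element we care about (no outliers)
b = rng.normal(size=(1, 8))
b[0, 3] = 20.0  # another element with one large activation

alone = int8_matmul_sketch(a, w)
in_batch = int8_matmul_sketch(np.vstack([a, b]), w)[:1]
# A plain float matmul gives the same row either way (up to rounding);
# in this sketch, row a's result changes because of row b.
print(np.abs(alone - in_batch).max() > 0)  # True
```

In this sketch the dependence comes only from the shared outlier columns: if no element in the batch reaches the threshold, each row is quantized with its own scale and the per-element results are identical to running that element alone.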
Expected behavior
The computation should be independent of the other batch elements, as for fp32 (see below):
Edit (2023/03/22): corrected the code for FP32.
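The expected behavior can be phrased as a check: running each element on its own should give the same result as running it inside the batch, up to floating-point tolerance. The sketch below assumes a plain NumPy fp32 matrix product as a stand-in for an fp32 model layer. `is_batch_independent` is a hypothetical helper, not part of transformers or bitsandbytes.

```python
import numpy as np


def is_batch_independent(fn, batch, atol=1e-4):
    """Hypothetical helper: True if fn(batch)[i] matches fn(batch[i:i+1]) for every i.

    fn maps a (batch, features) array to a (batch, out) array.
    """
    full = fn(batch)
    return all(
        np.allclose(full[i:i + 1], fn(batch[i:i + 1]), atol=atol)
        for i in range(batch.shape[0])
    )


rng = np.random.default_rng(0)
w = rng.normal(size=(8, 4)).astype(np.float32)
batch = rng.normal(size=(3, 8)).astype(np.float32)
batch[2, 5] = 20.0  # one element with a large activation

# A plain fp32 matmul treats each row on its own, so the check passes.
print(is_batch_independent(lambda x: x @ w, batch))  # True
# Centering over the batch axis mixes rows together, so the check fails.
print(is_batch_independent(lambda x: x - x.mean(axis=0, keepdims=True), batch))  # False
```

A tolerance is used instead of exact equality because batched and single-row fp32 products may use different summation orders internally.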