Thanks a lot for the issue @akkikiki ! What is the hardware you are using + bnb version?
Thanks a lot for the reply! The hardware is 8 V100 (16GB) GPUs and the bnb version is 0.37.0.
I think sadly there is indeed an issue with V100 right now, as stated by @TimDettmers here: https://github.com/huggingface/transformers/pull/21955#issuecomment-1455235281. It should be fixed soon, and as stated in that comment, more universal methods (covering most GPU hardware) should be published soon!
Thanks @younesbelkada! Interesting, so there is a smart workaround coming for GPUs without hardware-level int8 support.
FYI, I actually played around with `BitsAndBytesConfig`, and it seems like `quantization_config = BitsAndBytesConfig(llm_int8_threshold=5.0)` resolved the issue.

Output with `quantization_config = BitsAndBytesConfig(llm_int8_threshold=5.0)`:
<pad> A Haiku is a Japanese poetry form that uses a 5-7-5 syllable structure. A typical tweet is limited to 140 characters. The answer is no.</s>
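For reference, here is a minimal sketch of how that config can be passed at load time; the checkpoint name is an assumption, since the thread does not name the model:

```python
# Hedged sketch: "google/flan-t5-xxl" is an assumed checkpoint, not named in the thread.
from transformers import AutoModelForSeq2SeqLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=5.0,  # default is 6.0
)
model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-t5-xxl",
    device_map="auto",
    quantization_config=quantization_config,
)
```

Lowering `llm_int8_threshold` below the 6.0 default makes LLM.int8() treat more activation columns as outliers and compute them in fp16 rather than int8, which is plausibly why it sidesteps the V100 problem.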
Will just close this thread for now. Thanks again for the heads-up on the V100 issue!
This is great! Thanks for the advice! Would you mind posting it in #21955 so that people can be aware of this hack?
Will do!
Thanks a lot @akkikiki! Much appreciated!
System Info
`transformers` version: 4.27.0.dev0

Who can help?
@younesbelkada

Information

Tasks
An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)

Reproduction
When input texts are short, the generated texts look good. But when input texts are long (e.g., the following), it produces broken tokens.
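A minimal repro sketch along these lines (the checkpoint and the prompt are illustrative placeholders, not taken from the original report):

```python
# Hedged repro sketch: checkpoint and prompt are illustrative placeholders.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "google/flan-t5-xxl"  # assumed; the thread does not name the model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    device_map="auto",
    load_in_8bit=True,  # plain 8-bit load, default llm_int8_threshold=6.0
)

# Longer prompts trigger the broken generations on V100; short prompts look fine.
prompt = (
    "Answer the following yes/no question by reasoning step-by-step. "
    "Can you write a whole Haiku in a single tweet?"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))
```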
Input:
Output:
Expected behavior
This is the result when loaded with `load_in_8bit=False`:
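For completeness, a hedged sketch of the non-quantized baseline load used for this comparison (checkpoint and dtype are assumptions):

```python
# Hedged sketch: same assumed checkpoint, loaded without 8-bit quantization.
import torch
from transformers import AutoModelForSeq2SeqLM

model_fp16 = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-t5-xxl",       # assumed checkpoint
    device_map="auto",
    torch_dtype=torch.float16,  # assumed dtype; load_in_8bit defaults to False
)
```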