You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the __call__ method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding.
warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")