Hongao0611 opened this issue 3 weeks ago:
I tried loading a quantized model, but it failed:
> ValueError: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
Hi, if I want to compute the perplexity of a quantized model, what parameters should I pass to the `.compute()` function?
For example, here is roughly what I have in mind, shown in the sketch below.
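(A rough sketch of what I am trying to do, assuming this is the `evaluate` perplexity metric; the model id, quantization config, and sample texts are just placeholders, not my real setup.)

```python
import evaluate
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the model quantized to 4-bit with bitsandbytes.
quant_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b",  # placeholder model id
    quantization_config=quant_config,
    device_map="auto",
)

# The perplexity metric only takes a model_id string; as far as I can tell it
# reloads the model itself and calls .to(device), which seems to be where the
# ValueError above comes from for 4-bit / 8-bit models.
perplexity = evaluate.load("perplexity", module_type="metric")
results = perplexity.compute(
    model_id="facebook/opt-1.3b",  # how do I make this use my quantized model?
    predictions=["some sample text", "another sample text"],
)
print(results)
```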
Much thx!