huggingface / optimum-quanto

A pytorch quantization backend for optimum
Apache License 2.0

ValueError: The model is quantized with QuantizationMethod.QUANTO and is not serializable - check out the warnings from the logger on the traceback to understand the reason why the quantized model is not serializable. #188

Closed. gospacedev closed this issue 2 months ago.

gospacedev commented 2 months ago

Hello, I've been getting an error when I try to push my quantized model to the Hugging Face Hub, and I don't know how to resolve it. Is there another quantization method I can use that is compatible with push_to_hub?

This is the code:

from transformers import T5Tokenizer, T5ForConditionalGeneration, QuantoConfig

model_id = "google/flan-t5-base"

tokenizer = T5Tokenizer.from_pretrained(model_id)

# Quantize the weights to int8 with quanto via the transformers integration
quantization_config = QuantoConfig(weights="int8")
quantized_model = T5ForConditionalGeneration.from_pretrained(
    model_id,
    low_cpu_mem_usage=True,
    quantization_config=quantization_config,
)

tokenizer.push_to_hub("flan-t5-base-8bit")
quantized_model.push_to_hub("flan-t5-base-8bit")  # This is the line where I get the error

Here is the error:

ValueError: The model is quantized with QuantizationMethod.QUANTO and is not serializable - check out the warnings from the logger on the traceback to understand the reason why the quantized model is not serializable.
dacorvo commented 2 months ago

cc @SunMarc

SunMarc commented 2 months ago

Hi @gospacedev, quanto serialization is still a WIP. See my comment here for more information. You can check the list of quantization methods compatible with push_to_hub here.
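For readers who want to keep quanto but still persist the model, local serialization outside of push_to_hub was already possible at the time. The sketch below follows the pattern from the optimum-quanto README, using quanto's native quantize/freeze API rather than the transformers integration; the output file names are placeholders.

import json

from safetensors.torch import save_file
from transformers import T5ForConditionalGeneration
from optimum.quanto import quantize, freeze, qint8, quantization_map

model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")

# Quantize in place with quanto's native API, then freeze to
# materialize the int8 weights
quantize(model, weights=qint8)
freeze(model)

# Save the quantized state dict together with the quantization map,
# which requantize() needs to rebuild the model on reload
save_file(model.state_dict(), "model.safetensors")
with open("quantization_map.json", "w") as f:
    json.dump(quantization_map(model), f)

Reloading goes through optimum.quanto.requantize on a freshly instantiated model, per the same README section.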

gospacedev commented 2 months ago

Thank you @SunMarc for your response! I didn't know that quanto serialization was not yet available. Thanks for the link, I'll be sure to check it out.
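As for the original question about a method compatible with push_to_hub: at the time of this thread, 8-bit bitsandbytes quantization was among the methods transformers could serialize, so a variant along these lines should push without the ValueError. It assumes a CUDA machine with the bitsandbytes and accelerate packages installed; the Hub repo name is the same placeholder used above.

from transformers import BitsAndBytesConfig, T5ForConditionalGeneration, T5Tokenizer

model_id = "google/flan-t5-base"

tokenizer = T5Tokenizer.from_pretrained(model_id)

# 8-bit bitsandbytes quantization, whose checkpoints transformers can serialize
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
quantized_model = T5ForConditionalGeneration.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=bnb_config,
)

tokenizer.push_to_hub("flan-t5-base-8bit")
quantized_model.push_to_hub("flan-t5-base-8bit")  # serializes the int8 weights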