Open solomonmanuelraj opened 9 months ago
Hi @solomonmanuelraj, the following is working:
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection
model_id = "google/owlvit-base-patch16"
owlbit8_model = OwlViTForObjectDetection.from_pretrained(model_id, device_map="auto", load_in_8bit=False)
owlbit8_model.save_pretrained("owlvit", save_config=True, safe_serialization=False)
followed by:
optimum-cli export onnx -m owlvit --task 'zero-shot-object-detection' --framework 'pt' owlvit_onnx
You can also directly use optimum-cli export onnx -m google/owlvit-base-patch16 --task 'zero-shot-object-detection' --framework 'pt' owlvit_onnx
Note that bitsandbytes modules (like Linear8bitLt) currently cannot be exported to ONNX.
Using safe_serialization=True together with device_map="auto" does not work for me.
Hi team,
Thanks for your quick response.
I want to quantize the model to 8 bits.
In this case, will load_in_8bit=False give the 8-bit quantized model? This option gives a model of around 594 MB. It is not quantized (it produces only the config.json and pytorch_model.bin files, with no .onnx output file). I want to reduce the model size through 8-bit quantization so that I can deploy it on edge devices.
I think only load_in_8bit=True will give the 8-bit quantized model. Is that correct?
I want to export the 8-bit quantized OWL-ViT model to ONNX format.
Thank you @solomonmanuelraj. load_in_8bit=True is not the only option available for quantization. This argument specifically uses the quantization scheme from the bitsandbytes library, which cannot be exported to ONNX from PyTorch. Reference: https://huggingface.co/blog/hf-bitsandbytes-integration
An alternative for you could be a classic A8W8 dynamic quantization scheme (all activations on 8 bits, all weights on 8 bits), using the ONNX Runtime quantizer available through Optimum:
optimum-cli export onnx -m google/owlvit-base-patch16 --task 'zero-shot-object-detection' --framework 'pt' owlvit_onnx
optimum-cli onnxruntime quantize --onnx_model owlvit_onnx --avx512 -o owlvit_onnx_quantized --per_channel
This reduces the model size to ~155 MB. However, you still need to validate the accuracy of the quantized model. Note that by default this command line quantizes all possible ops (https://github.com/microsoft/onnxruntime/blob/7cb8b20db2d329cf67e170293b2d2c81213e6100/onnxruntime/python/tools/quantization/registry.py#L26-L86), which may be too aggressive. This can be customized when doing the export programmatically (example in the link below).
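To make the size and accuracy trade-off concrete, here is a toy round trip through 8-bit quantization in plain Python. It is only a sketch: the symmetric per-tensor scheme and the example weights are assumptions chosen for illustration, not the exact scheme ONNX Runtime applies. The size estimate simply divides the 594 MB reported earlier in this thread by 4 (float32 uses 4 bytes per value, int8 uses 1), which lands close to the ~155 MB observed.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: floats -> int8 codes plus one scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.5, -1.27, 0.003, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding error introduced by quantization (at most about scale / 2 per value).
max_error = max(abs(w - r) for w, r in zip(weights, restored))

# Rough size estimate: 4 bytes per float32 value vs 1 byte per int8 value.
fp32_size_mb = 594
int8_estimate_mb = fp32_size_mb / 4

print(codes, scale, max_error, int8_estimate_mb)
```

Small weights such as 0.003 collapse to code 0 here, which is the kind of information loss that makes checking the quantized model's accuracy necessary.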
References: https://huggingface.co/docs/optimum/main/en/concept_guides/quantization & https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/quantization
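A minimal sketch of the programmatic route mentioned above, assuming Optimum's `ORTQuantizer` and `AutoQuantizationConfig` as described in the linked usage guide. The function name, the directory arguments, and the choice to quantize only `MatMul` ops are assumptions for illustration; which ops to keep in float should be decided by measuring accuracy on your own data.

```python
from pathlib import Path


def quantize_owlvit_dynamic(onnx_dir, out_dir, operators=("MatMul",)):
    """Dynamically quantize an exported ONNX model, limited to `operators`.

    `onnx_dir` is assumed to hold a single exported .onnx file, for example
    the owlvit_onnx directory produced by the export command above.
    """
    # Imported lazily so the module can be loaded without optimum installed.
    from optimum.onnxruntime import ORTQuantizer
    from optimum.onnxruntime.configuration import AutoQuantizationConfig

    # Dynamic (is_static=False), per-channel config for AVX512 CPUs, mirroring
    # the --avx512 --per_channel flags of the CLI call, but restricted to the
    # listed operator types instead of every quantizable op.
    qconfig = AutoQuantizationConfig.avx512(
        is_static=False,
        per_channel=True,
        operators_to_quantize=list(operators),
    )
    quantizer = ORTQuantizer.from_pretrained(onnx_dir)
    quantizer.quantize(save_dir=out_dir, quantization_config=qconfig)
    return Path(out_dir)
```

It would be called as `quantize_owlvit_dynamic("owlvit_onnx", "owlvit_onnx_quantized")`. Quantizing fewer op types generally gives a larger file than the ~155 MB reported above, in exchange for staying closer to the float model.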
Thanks a lot for your quick response.
It was very useful.
Yes, I need to verify the accuracy. The quantized model will have poorer accuracy.
Problem statement: my overall aim is to quantize the model and do parameter-efficient fine-tuning, so that the model can be deployed on the edge device and the accuracy gap compared to full fine-tuning on a custom dataset is reduced (e.g. quantization-aware parameter-efficient fine-tuning).
Can https://github.com/yxli2123/LoftQ and https://github.com/yuhuixu1993/qa-lora be used for quantizing and fine-tuning OWL-ViT vision models? All the GitHub examples available are for LLMs only. No reference is available for vision models.
Your references and feedback would be helpful.
With thanks
@solomonmanuelraj I see. I am not sure about your problem. If at the end of the day your goal is to obtain a quantized ONNX model, your best bet would probably be:
If quantization was simulated in steps 1/2, you would need to make sure that the quantization scheme used by ORT is not completely different than what is used during QAT.
@fxmarty when I convert the OWL-ViT model to an ONNX file (using optimum-cli export onnx -m google/owlvit-base-patch16 --task 'zero-shot-object-detection' --framework 'pt' owlvit_onnx), it uses opset 14 by default (https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model).
My edge device supports only opset 12. Is there any way to work around this?
File "/home/lfo2kor/miniconda3/envs/testenv/lib/python3.10/site-packages/optimum/exporters/onnx/main.py", line 499, in main_export
raise ValueError(
ValueError: Opset 12 is not sufficient to export owlvit. At least 14 is required.
Thanks
Hi @solomonmanuelraj, if you use the optimum dev branch (cloning from GitHub and installing locally), this is fixed with https://github.com/huggingface/optimum/pull/1650. You will be able to export the model with an opset lower than the one specified in the ONNX configuration.
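With that fix installed, the lower opset can be requested through the `--opset` option of `optimum-cli export onnx`. The sketch below only assembles and prints the command from Python; the helper names `build_export_command` and `run_export` are hypothetical, and whether the opset 12 export of OWL-ViT actually succeeds on your setup has not been verified here.

```python
import shlex
import subprocess


def build_export_command(model_id, output_dir, opset=None):
    """Assemble the optimum-cli export call used in this thread as an argv list."""
    argv = [
        "optimum-cli", "export", "onnx",
        "-m", model_id,
        "--task", "zero-shot-object-detection",
        "--framework", "pt",
    ]
    if opset is not None:
        # Request a specific ONNX opset instead of the default.
        argv += ["--opset", str(opset)]
    argv.append(output_dir)
    return argv


def run_export(argv):
    """Run the export; raises CalledProcessError if optimum-cli fails."""
    subprocess.run(argv, check=True)


command = build_export_command("google/owlvit-base-patch16", "owlvit_onnx", opset=12)
print(shlex.join(command))
```

Calling `run_export(command)` performs the actual export. It is kept separate so the command can be inspected first.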
System Info
Reproduction (minimal, reproducible, runnable)
It gives:
filtered_dict[k] = v.detach()
AttributeError: 'str' object has no attribute 'detach'
Expected behavior
onnx file