huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0

Support bitsandbytes export to ONNX #1633

Open | solomonmanuelraj opened this issue 9 months ago

solomonmanuelraj commented 9 months ago

System Info

Hi Team,

I need your help converting the OWL-ViT model (OwlViTForObjectDetection) into an ONNX file.

##########################################################################################
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

model_id = "google/owlvit-base-patch16"
owlbit8_model = OwlViTForObjectDetection.from_pretrained(model_id, device_map="auto", load_in_8bit=True)
owlbit8_model.save_pretrained("local file system - dir path",
                              save_config=True, safe_serialization=True)

#########################################################################################

Output files in the local directory:
config.json
model.safetensors

#########################################################################################

optimum-cli export onnx --model 'local file system - dir path' --task 'zero-shot-object-detection' --framework 'pt' output_dir

###########################################################################################
I receive the following error messages:

##########################################################################################

Using the export variant default. Available variants are:

default: The default ONNX variant.
Using framework PyTorch: 2.1.0
Traceback (most recent call last):
  File "/home/..../miniconda3/envs/testenv/bin/optimum-cli", line 8, in <module>
    sys.exit(main())
  File "/home/..../miniconda3/envs/testenv/lib/python3.10/site-packages/optimum/commands/optimum_cli.py", line 163, in main
    service.run()
  File "/home/..../miniconda3/envs/testenv/lib/python3.10/site-packages/optimum/commands/export/onnx.py", line 246, in run
    main_export(
  File "/home/..../miniconda3/envs/testenv/lib/python3.10/site-packages/optimum/exporters/onnx/main.py", line 551, in main_export
    _, onnx_outputs = export_models(
  File "/home/..../miniconda3/envs/testenv/lib/python3.10/site-packages/optimum/exporters/onnx/convert.py", line 753, in export_models
    export(
  File "/home/..../miniconda3/envs/testenv/lib/python3.10/site-packages/optimum/exporters/onnx/convert.py", line 856, in export
    export_output = export_pytorch(
  File "/home/..../miniconda3/envs/testenv/lib/python3.10/site-packages/optimum/exporters/onnx/convert.py", line 573, in export_pytorch
    onnx_export(
  File "/home/..../miniconda3/envs/testenv/lib/python3.10/site-packages/torch/onnx/utils.py", line 516, in export
    _export(
  File "/home/..../miniconda3/envs/testenv/lib/python3.10/site-packages/torch/onnx/utils.py", line 1596, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/home/..../miniconda3/envs/testenv/lib/python3.10/site-packages/torch/onnx/utils.py", line 1135, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/home/..../miniconda3/envs/testenv/lib/python3.10/site-packages/torch/onnx/utils.py", line 1011, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/home/..../miniconda3/envs/testenv/lib/python3.10/site-packages/torch/onnx/utils.py", line 907, in _trace_and_get_graph_from_model
    orig_state_dict_keys = torch.jit._unique_state_dict(model).keys()
  File "/home/..../miniconda3/envs/testenv/lib/python3.10/site-packages/torch/jit/_trace.py", line 76, in _unique_state_dict
    filtered_dict[k] = v.detach()
AttributeError: 'str' object has no attribute 'detach'

Who can help?

No response

Information

Tasks

Reproduction (minimal, reproducible, runnable)

The export fails with AttributeError: 'str' object has no attribute 'detach', raised at filtered_dict[k] = v.detach() (see the traceback above).

Expected behavior

An exported ONNX file.

fxmarty commented 9 months ago

Hi @solomonmanuelraj, the following is working:

from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

model_id = "google/owlvit-base-patch16"
owlbit8_model = OwlViTForObjectDetection.from_pretrained(model_id, device_map="auto", load_in_8bit=False)
owlbit8_model.save_pretrained("owlvit", save_config=True, safe_serialization=False)

and optimum-cli export onnx -m owlvit --task 'zero-shot-object-detection' --framework 'pt' owlvit_onnx.

You can also directly use optimum-cli export onnx -m google/owlvit-base-patch16 --task 'zero-shot-object-detection' --framework 'pt' owlvit_onnx

Note that bitsandbytes modules (like Linear8bitLt) currently cannot be exported to ONNX.

Using safe_serialization=True + device_map="auto" does not work for me.
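
For completeness, a quick way to check whether a loaded checkpoint actually contains bitsandbytes layers (and therefore cannot go through the ONNX export) is a sketch like the following, assuming bitsandbytes is installed and model is the object returned by from_pretrained:

import bitsandbytes as bnb

# True if any submodule was replaced by a bitsandbytes 8-bit linear layer during loading
has_bnb_layers = any(isinstance(m, bnb.nn.Linear8bitLt) for m in model.modules())
print(has_bnb_layers)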

solomonmanuelraj commented 9 months ago

Hi team,

thanks for your quick response.

I want to quantize the model to 8 bits.

In this case, will load_in_8bit=False give the 8-bit quantized model? This option gives a model size of around 594 MB, and it is not quantized (it produces only a config.json and pytorch_model.bin, no .onnx output file). I want to reduce the model size with 8-bit quantization so that I can deploy it on edge devices.

I think only "load_in_8bit=True" will give the 8-bit quantized model, correct?

I want to export the 8-bit quantized OWL-ViT model into ONNX format.

fxmarty commented 9 months ago

Thank you @solomonmanuelraj. load_in_8bit=True is not the only way to quantize a model. That argument specifically uses the quantization scheme from the bitsandbytes library, which cannot be exported to ONNX from PyTorch. Reference: https://huggingface.co/blog/hf-bitsandbytes-integration

An alternative for you could be a classic A8W8 dynamic quantization scheme (all activations on 8 bits, all weights on 8 bits), using the ONNX Runtime quantizer available through Optimum:

optimum-cli export onnx -m google/owlvit-base-patch16 --task 'zero-shot-object-detection' --framework 'pt' owlvit_onnx
optimum-cli onnxruntime quantize --onnx_model owlvit_onnx --avx512 -o owlvit_onnx_quantized --per_channel

This reduces the model size to ~155 MB. However, you still need to validate the accuracy of the quantized model. Note that by default this command line quantizes all possible ops (https://github.com/microsoft/onnxruntime/blob/7cb8b20db2d329cf67e170293b2d2c81213e6100/onnxruntime/python/tools/quantization/registry.py#L26-L86), which may be too aggressive. This can be customized when doing the quantization programmatically (example in the links below).

References: https://huggingface.co/docs/optimum/main/en/concept_guides/quantization & https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/quantization
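
For reference, a minimal sketch of that programmatic route using Optimum's ORTQuantizer; the directory names owlvit_onnx and owlvit_onnx_quantized simply mirror the commands above, and exact keyword arguments may differ slightly across Optimum versions:

from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Dynamic A8W8 quantization targeting AVX512, quantizing weights per channel;
# operators_to_quantize can also be passed here to restrict which op types get quantized
qconfig = AutoQuantizationConfig.avx512(is_static=False, per_channel=True)

# Load the ONNX model produced by `optimum-cli export onnx`
quantizer = ORTQuantizer.from_pretrained("owlvit_onnx", file_name="model.onnx")

# Write the quantized model to a new directory
quantizer.quantize(save_dir="owlvit_onnx_quantized", quantization_config=qconfig)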

solomonmanuelraj commented 9 months ago

Thanks a lot for your quick response.

It was very useful.

Yes, I need to verify the accuracy; the quantized model may lose some accuracy.

Problem statement: my overall aim is to quantize the model and do parameter-efficient fine-tuning, so that it can be deployed on an edge device while the accuracy gap compared to full fine-tuning on a custom dataset is reduced (e.g. quantization-aware parameter-efficient fine-tuning).

Can https://github.com/yxli2123/LoftQ and https://github.com/yuhuixu1993/qa-lora be used for quantizing and fine-tuning OWL-ViT vision models? All the GitHub examples available are only for LLMs; no reference is available for vision models.

Your reference and feedback would be helpful.

With thanks

fxmarty commented 9 months ago

@solomonmanuelraj I see. I am not sure I fully understand your problem, but if your goal at the end of the day is to obtain a quantized ONNX model, your best bet would probably be:

  1. Do fine-tuning / parameter efficient fine-tuning
  2. Merge adapter weights in the original model
  3. ONNX export
  4. Quantization using ORT quantization tools

If quantization was simulated in steps 1/2, you would need to make sure that the quantization scheme used by ORT is not completely different from what was used during QAT.
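
As a rough sketch of steps 2 and 3 with the PEFT library (the adapter path owlvit-lora-adapter below is hypothetical):

from transformers import OwlViTForObjectDetection
from peft import PeftModel

base = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch16")

# Load the fine-tuned LoRA adapter on top of the base model (hypothetical adapter path)
model = PeftModel.from_pretrained(base, "owlvit-lora-adapter")

# Fold the adapter weights back into the base model so it exports as a plain model
merged = model.merge_and_unload()
merged.save_pretrained("owlvit-merged", safe_serialization=False)

The merged directory can then go through optimum-cli export onnx and the ORT quantization command shown above.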

solomonmanuelraj commented 9 months ago

@fxmarty when I convert the OWL-ViT model into an ONNX file (using optimum-cli export onnx -m google/owlvit-base-patch16 --task 'zero-shot-object-detection' --framework 'pt' owlvit_onnx), the default opset used is 14 (https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model).

My edge device supports only opset 12. Is there any option to rectify this issue?

File "/home/lfo2kor/miniconda3/envs/testenv/lib/python3.10/site-packages/optimum/exporters/onnx/main.py", line 499, in main_export
    raise ValueError(
ValueError: Opset 12 is not sufficient to export owlvit. At least 14 is required.

thanks

fxmarty commented 9 months ago

Hi @solomonmanuelraj, if you use the Optimum dev branch (cloning from GitHub and installing locally), this is fixed by https://github.com/huggingface/optimum/pull/1650. You will be able to export the model with an opset lower than what is specified in the ONNX configuration.
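
With that fix installed, passing the target opset explicitly should then work along these lines (untested sketch; --opset overrides the default opset chosen by the exporter, and the output directory name is arbitrary):

optimum-cli export onnx -m google/owlvit-base-patch16 --task 'zero-shot-object-detection' --opset 12 owlvit_onnx_opset12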