huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0
2.44k stars 432 forks source link

Add LLava ONNX export has a problem #1873

Open Pengjie-W opened 3 months ago

Pengjie-W commented 3 months ago

System Info

optimum==1.19.0.dev0
torch==2.1.2
onnx==1.16.0
onnxruntime==1.18.0
cuda==11.8
optimum from mht-sharma:add_llava

Who can help?

@mht-sharma @xenova

Information

Tasks

Reproduction (minimal, reproducible, runnable)

code: optimum-cli export onnx --model llava-hf/llava-1.5-7b-hf llava_onnx/ --task image-to-text-with-past --trust-remote-code error: Traceback (most recent call last): File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/optimum/exporters/onnx/convert.py", line 577, in export_pytorch onnx_export( File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/onnx/utils.py", line 516, in export _export( File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/onnx/utils.py", line 1596, in _export graph, params_dict, torch_out = _model_to_graph( File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/onnx/utils.py", line 1135, in _model_to_graph graph, params, torch_out, module = _create_jit_graph(model, args) File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/onnx/utils.py", line 1011, in _create_jit_graph graph, torch_out = _trace_and_get_graph_from_model(model, args) File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/onnx/utils.py", line 915, in _trace_and_get_graph_from_model trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph( File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/jit/_trace.py", line 1285, in _get_trace_graph outs = ONNXTracedModule( File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl return self._call_impl(*args, kwargs) File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl return forward_call(*args, kwargs) File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/jit/_trace.py", line 133, in forward graph, out = torch._C._create_graph_by_tracing( File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/jit/_trace.py", line 124, in wrapper outs.append(self.inner(trace_inputs)) File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl return self._call_impl(args, kwargs) File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl return forward_call(*args, kwargs) File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1508, in _slow_forward result = self.forward(*input, kwargs) File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/optimum/exporters/onnx/model_patcher.py", line 589, in patched_forward outputs = model.language_model( File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl return self._call_impl(*args, *kwargs) File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl return forward_call(args, kwargs) File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1183, in forward outputs = self.model( File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl return self._call_impl(*args, *kwargs) File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl return forward_call(args, **kwargs) File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1035, in forward attention_mask = _prepare_4d_causal_attention_mask_for_sdpa( File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py", line 398, in _prepare_4d_causal_attention_mask_for_sdpa expanded_4d_mask = attn_mask_converter.to_4d( File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py", line 137, in to_4d expanded_attn_mask = causal_4d_mask.masked_fill(expanded_attn_mask.bool(), torch.finfo(dtype).min) RuntimeError: The size of tensor a (4112) must match the size of tensor b (32) at non-singleton dimension 3 python-BaseException

Expected behavior

We hope to solve this problem and let llava successfully transfer to onnx.

Pengjie-W commented 3 months ago

At present, only the encoder is transferred and the error exit is reported

zhangyu68 commented 2 weeks ago

我用这个分支导出成功了,环境是: onnx 1.16.1 onnxruntime-gpu 1.18.1 opencv-python 4.10.0.84 openpyxl 3.1.3 optimum 1.20.0.dev0 cuda 12.1

image