huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

torch.onnx.export failure for llava's vision encoder model #33637

Closed: symphonylyh closed this issue 2 weeks ago

symphonylyh commented 1 month ago

System Info

transformers version == 4.42.4 works; transformers version >= 4.43.0 fails.

Who can help?

No response

Information

Tasks

Reproduction

Steps to reproduce:

import os
import torch
from transformers import LlavaForConditionalGeneration, AutoProcessor
from PIL import Image

def export_onnx(model,
                input,
                onnx_dir,
                onnx_name='model.onnx',
                input_names=['input'],
                output_names=['output'],
                dynamic_axes={'input': {
                    0: 'batch'
                }}):
    os.makedirs(onnx_dir, exist_ok=True)
    torch.onnx.export(model,
                      input,
                      f'{onnx_dir}/{onnx_name}',
                      opset_version=17,
                      input_names=input_names,
                      output_names=output_names,
                      dynamic_axes=dynamic_axes)

class LlavaVisionWrapper(torch.nn.Module):

    def __init__(self, tower, projector, feature_layer):
        super().__init__()
        self.tower = tower
        self.projector = projector
        self.feature_layer = feature_layer

    def forward(self, image):
        # Run the vision tower and collect the hidden states from every layer
        all_hidden_states = self.tower(
            image, output_hidden_states=True).hidden_states
        # Take the configured feature layer and drop the CLS token
        features = all_hidden_states[self.feature_layer][:, 1:]
        return self.projector(features)

model_id = "llava-hf/llava-1.5-7b-hf"
model = LlavaForConditionalGeneration.from_pretrained(model_id)
wrapper = LlavaVisionWrapper(
    model.vision_tower,
    model.multi_modal_projector,
    model.config.vision_feature_layer)

processor = AutoProcessor.from_pretrained(model_id)
raw_image = Image.new('RGB', [10, 10])  # dummy image
image = processor(text="dummy", images=raw_image,
                  return_tensors="pt")['pixel_values']

export_onnx(wrapper, image, 'tmp/onnx')

Leads to error

line 116, in export_onnx
    torch.onnx.export(model,
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 511, in export
    _export(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 1607, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 1133, in _model_to_graph
    graph = _optimize_graph(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 672, in _optimize_graph
    graph = _C._jit_pass_onnx(graph, operator_export_type)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 1956, in _run_symbolic_function
    return symbolic_fn(graph_context, *inputs, **attrs)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/symbolic_helper.py", line 291, in wrapper
    return fn(g, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/symbolic_opset14.py", line 176, in scaled_dot_product_attention
    query_scaled = g.op("Mul", query, g.op("Sqrt", scale))
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 92, in op
    return _add_op(self, opname, *raw_args, outputs=outputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 243, in _add_op
    inputs = [_const_if_tensor(graph_context, arg) for arg in args]
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 243, in <listcomp>
    inputs = [_const_if_tensor(graph_context, arg) for arg in args]
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 275, in _const_if_tensor
    return _add_op(graph_context, "onnx::Constant", value_z=arg)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 251, in _add_op
    node = _create_node(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 311, in _create_node
    _add_attribute(node, key, value, aten=aten)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 362, in _add_attribute
    return getattr(node, f"{kind}_")(name, value)
TypeError: z_(): incompatible function arguments. The following argument types are supported:
    1. (self: torch._C.Node, arg0: str, arg1: torch.Tensor) -> torch._C.Node

Only occurs with transformers >= 4.43.0.

Expected behavior

ONNX export should work.

LysandreJik commented 1 month ago

Thanks @symphonylyh! Pinging @xenova for a quick answer when he can

xenova commented 1 month ago

Hi @symphonylyh 👋 We've found that exporting llava models is a bit more complicated than simply calling torch.onnx.export, mainly because we need to fuse the vision and text embeddings before running the decoder. For that reason, we export 3 sub-modules:

  1. Vision encoder
  2. Text embedding layer
  3. Decoder without embedding layer

Here's a colab notebook which outlines this process: https://colab.research.google.com/drive/1IhC8YOV68cze0XWGfuqSclnVTt_FskUd?usp=sharing

Hopefully that helps! One day we'll add this to Optimum, but we were waiting for the VLM API to be a bit more standardized (it's now in a much better state).
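Roughly, the reason for the three-way split is the fusion step between (1) and (3): the projected image features have to be scattered into the text embeddings at the <image> placeholder positions before the decoder runs. A minimal illustrative sketch of that step (the names and shapes here are assumptions, not the exact notebook code):

import torch

def merge_image_features(input_ids, text_embeds, image_features, image_token_id):
    # input_ids:      (batch, seq_len)              token ids containing <image> placeholders
    # text_embeds:    (batch, seq_len, hidden)      output of the exported embedding layer
    # image_features: (batch, num_patches, hidden)  output of the exported vision encoder
    # Mark the placeholder positions and scatter the image features into them.
    mask = (input_ids == image_token_id).unsqueeze(-1)  # (batch, seq_len, 1)
    return text_embeds.masked_scatter(mask, image_features.to(text_embeds.dtype))

The merged embeddings are then passed to the decoder sub-module as inputs_embeds.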

symphonylyh commented 1 month ago

Hi @xenova, thanks for the advice!

Actually, I'm not referring to exporting the entire llava model; we're doing something pretty similar to your colab, which exports just the vision encoder + projector + feature layer as an ONNX model: https://colab.research.google.com/drive/1IhC8YOV68cze0XWGfuqSclnVTt_FskUd#scrollTo=qbZWrlAvR6VI&line=4&uniqifier=1. So it's just (1) in your workflow above.

Such a partial export works on 4.42.4 but fails on >= 4.43.0, so it's likely a regression. Since your colab seems to work for (1), maybe I can check whether some difference in the torch.onnx.export() params causes this, or whether it's a difference in how the dummy ONNX input is created.

justinchuby commented 1 month ago

The particular torch.onnx error here is fixed in PyTorch 2.5.

symphonylyh commented 1 month ago

@justinchuby Thanks for your advice!

I just tested in a PyTorch 2.5 container, but I'm still facing the same error.

These are my steps:

# this is the pytorch 2.5.0 container
docker run --gpus all --rm -it --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/pytorch:24.08-py3

# any version <= 4.42.4 is good. any version >= 4.43.0 fails
pip install transformers

# copy the above snippet into a test.py
python test.py

# I saw the same error

Can you please share your ONNX conversion steps? Have you encountered the same error and resolved it by upgrading to PyTorch 2.5?

symphonylyh commented 1 month ago

@xenova I found why you didn't encounter the error I have:

  1. I saw you're pointing to a fork of transformers, so I removed !pip install --upgrade git+https://github.com/zucchini-nlp/transformers.git@llava-onevision in order to test the official 4.43.0 version.
  2. You're using a variant of the llava model, so I changed it to model_id = "llava-hf/llava-1.5-7b-hf" to test the official llava-1.5 model.

What I found: your qwen-0.5b model works fine, but the official llava-1.5-7b doesn't -- their architectures are different. Maybe one uses scaled_dot_product_attention and the other doesn't.
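If that's the cause, one quick check (just a guess on my side, not verified) would be to load the model with the eager attention implementation so the export avoids the scaled_dot_product_attention symbolic:

from transformers import LlavaForConditionalGeneration

# Hypothesis check, not a confirmed fix: force eager attention so that
# torch.onnx.export does not go through the SDPA symbolic function.
model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf",
    attn_implementation="eager",
)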

Unfortunately the 7b model is not allowed to run in the colab notebook due to the memory constraint, so I cannot share a reproduction notebook with you. But if you have an offline machine, just change model_id = "llava-hf/llava-1.5-7b-hf" and w, h = 336, 336, and you will be able to see that recent HF versions do fail for llava's vision encoder ONNX export. Or, more easily, you can reproduce it with my snippet above by switching between qwen-0.5b and llava-1.5-7b.

In this case, can we confirm it is an HF regression bug (since 4.43.0)?

justinchuby commented 1 month ago

How did you build torch 2.5? Is it the nightly build? Can you try the latest nightly build as well?

symphonylyh commented 1 month ago

How did you build torch 2.5? Is it the nightly build? Can you try the latest nightly build as well?

@justinchuby I was using the NVIDIA NGC container which has pytorch 2.5.0 built-in: https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-24-08.html#rel-24-08

I actually also tried the nightly build via pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124, but it complains about an undefined HF symbol. May I know how you tested the nightly build? Are you using a docker container?

justinchuby commented 1 month ago

The commit is too old (3 months ago). The fix was only cherry-picked two weeks ago, so you need a newer version. I didn't test this particular model; I just know that the particular issue referenced above was fixed.
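To see whether a given build actually contains the fix, you can check the version string and the commit it was built from:

import torch

print(torch.__version__)          # e.g. a dated dev/nightly version string
print(torch.version.git_version)  # commit hash the build was cut from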

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.