Closed symphonylyh closed 2 weeks ago
Thanks @symphonylyh! Pinging @xenova for a quick answer when he can
Hi @symphonylyh 👋 We've found that exporting llava models is a bit more complicated than simply calling torch.onnx.export, mainly because we need to fuse the vision and text embeddings before running the decoder. For that reason, we export 3 submodules: (1) the vision encoder (plus projector), (2) the text embedding layer, and (3) the decoder.
Here's a colab notebook which outlines this process: https://colab.research.google.com/drive/1IhC8YOV68cze0XWGfuqSclnVTt_FskUd?usp=sharing
Hopefully that helps! One day we'll add this to Optimum, but we were waiting for the VLM API to be a bit more standardized (it's now in a much better state).
Hi @xenova, thanks for the advice!
Actually, I'm not referring to exporting the entire llava model. We're doing something very similar to your colab: exporting just the vision encoder + projector + feature layer as an ONNX model: https://colab.research.google.com/drive/1IhC8YOV68cze0XWGfuqSclnVTt_FskUd#scrollTo=qbZWrlAvR6VI&line=4&uniqifier=1. So it's just step (1) in your workflow above.
This partial export works on 4.42.4 but fails on >= 4.43.0, so it's likely a regression. Since your colab seems to work for (1), maybe I can check whether the failure comes from differences in the torch.onnx.export() params, or from differences in how the dummy ONNX input is created.
The particular torch.onnx error here is fixed in PyTorch 2.5.
@justinchuby Thanks for your advice!
I just tested in a PyTorch 2.5 container, but I'm still facing the same error.
These are my steps:
# this is the pytorch 2.5.0 container
docker run --gpus all --rm -it --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/pytorch:24.08-py3
# any version <= 4.42.4 is good. any version >= 4.43.0 fails
pip install transformers
# copy the above snippet into a test.py
python test.py
# I saw the same error
Could you please share your ONNX conversion steps? Did you encounter the same error and resolve it by upgrading to PyTorch 2.5?
@xenova I found why you didn't encounter the error I have: your colab installs
!pip install --upgrade git+https://github.com/zucchini-nlp/transformers.git@llava-onevision
I replaced that in order to test the official 4.43.0 version, and changed model_id = "llava-hf/llava-1.5-7b-hf" to test the official llava-1.5 model.
What I found: your qwen-0.5b model works fine, but the official llava-1.5-7b does not -- their architectures are different. Maybe one uses scaled_dot_product_attention and the other doesn't.
Unfortunately the 7B model cannot run in a Colab notebook due to memory constraints, so I cannot share a reproduction notebook with you. But if you have an offline machine, just change model_id = "llava-hf/llava-1.5-7b-hf" and w, h = 336, 336, and you will see that recent transformers versions do fail on llava's vision encoder ONNX export.
Or, more easily, you can reproduce it with my snippet above by switching between qwen-0.5b and llava-1.5-7b.
In that case, can we confirm this is a transformers regression introduced in 4.43.0?
How did you build torch 2.5? Is it the nightly build? Can you try the latest nightly build as well?
> How did you build torch 2.5? Is it the nightly build? Can you try the latest nightly build as well?
@justinchuby I was using the NVIDIA NGC container which has pytorch 2.5.0 built-in: https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-24-08.html#rel-24-08
I actually also tried the nightly build via pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124, but it complains about an undefined symbol from HF. May I know how you tested the nightly build? Are you using a Docker container?
That commit is too old (3 months ago). The fix was only cherry-picked two weeks ago, so you need a newer version. I didn't test this particular model; I just know that the particular issue referenced above was fixed.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
transformers == 4.42.4: works
transformers >= 4.43.0: fails
Who can help?
No response
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
Steps to reproduce:
The export raises an error, which only occurs with transformers >= 4.43.0.
Expected behavior
The ONNX export should work.