huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0

Cannot export jinaai models to onnx format because the model is > 2Gb #1800

Open clarinevong opened 5 months ago

clarinevong commented 5 months ago

System Info

Optimum Version: 1.18.0
Python Version: 3.9
Platform: Windows, x86_64

Who can help?

@michaelbenayoun @JingyaHuang @echarlaix

I am writing to report an issue I encountered while attempting to export a jinaai model to ONNX format using Optimum.

Error message:

RuntimeError: The serialized model is larger than the 2GiB limit imposed by the protobuf library. Therefore the output file must be a file path, so that the ONNX external data can be written to the same directory. Please specify the output file name.

[screenshot of the export traceback ending in the RuntimeError above]
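
For context, the error refers to ONNX's external-data mechanism: protobuf cannot serialize a single file larger than 2GiB, so weights beyond that size have to be written to a side-car file next to the .onnx graph. A minimal sketch of applying that mechanism manually with the onnx Python API (the file names here are hypothetical, and it assumes an export that already produced a model.onnx):

import onnx

# Load the model, pulling in any external data it already references.
model = onnx.load("model.onnx", load_external_data=True)

# Re-save with all large tensors moved to a single side-car file,
# which keeps the .onnx protobuf itself under the 2GiB limit.
onnx.save_model(
    model,
    "model_external.onnx",
    save_as_external_data=True,
    all_tensors_to_one_file=True,
    location="model_external.onnx_data",  # written next to the .onnx file
    size_threshold=1024,                  # offload tensors larger than 1KiB
)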


Reproduction (minimal, reproducible, runnable)

optimum-cli export onnx -m jinaai/jina-embeddings-v2-base-en jina-embeddings-v2-base-en-onnx --trust-remote-code
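
For reference, the same export can be triggered from Python through Optimum's exporter API, which may make the failure easier to step through in a debugger (a sketch, assuming optimum 1.18; the output directory name is arbitrary):

from optimum.exporters.onnx import main_export

main_export(
    model_name_or_path="jinaai/jina-embeddings-v2-base-en",
    output="jina-embeddings-v2-base-en-onnx",  # directory for the exported files
    trust_remote_code=True,
)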

Expected behavior

I would expect Optimum to successfully export the jinaai model to ONNX format without encountering any errors or issues.

fxmarty commented 5 months ago

Hi @clarinevong, I cannot reproduce the issue on Linux; this is likely a PyTorch-on-Windows bug. I would recommend opening a bug report in the PyTorch repo (although TorchScript issues are a bit deprioritized these days, as the effort is moving to Dynamo).

Related: https://github.com/microsoft/onnxscript/issues/493

Which PyTorch version are you using? This looks to me like a bug in GetGraphProtoSize. cc @xadupre: maybe related to https://github.com/huggingface/optimum/issues/1642#issuecomment-1910294822, but on Windows only.
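
To help isolate whether the bug lives in PyTorch's TorchScript-based exporter rather than in Optimum, one could try exporting directly with torch.onnx.export. A rough sketch; it assumes the model's forward takes input_ids and attention_mask, and the input/output names are illustrative:

import torch
from transformers import AutoModel, AutoTokenizer

model_id = "jinaai/jina-embeddings-v2-base-en"
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(model_id)
dummy = tokenizer("hello world", return_tensors="pt")

# Exporting to a real file path lets PyTorch write external data
# alongside the .onnx file when the graph exceeds 2GiB.
torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    "jina-embeddings-v2.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
    },
    opset_version=14,
)

If this succeeds, the failure is more likely in Optimum's export path; if it raises the same RuntimeError, that points at PyTorch itself (e.g. GetGraphProtoSize).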

clarinevong commented 5 months ago

Thanks for the reply and the different suggestions. I was actually able to recreate the issue on Linux with:

[screenshot of the Linux environment and the same error]

fxmarty commented 5 months ago

Thank you for giving it a try on Linux! I still cannot reproduce it, using Python 3.10.14 and:

optimum==1.18.1
torch==2.2.2+cu118
transformers==4.39.3
onnx==1.15.0
onnxruntime==1.17.1

Could you share your pip freeze?

clarinevong commented 5 months ago

Yes, of course:

aiohttp==3.9.3
aiosignal==1.3.1
async-timeout==4.0.3
attrs==23.2.0
certifi==2024.2.2
charset-normalizer==3.3.2
coloredlogs==15.0.1
datasets==2.18.0
dill==0.3.8
filelock==3.13.4
flatbuffers==24.3.25
frozenlist==1.4.1
fsspec==2024.2.0
huggingface-hub==0.22.2
humanfriendly==10.0
idna==3.6
Jinja2==3.1.3
joblib==1.4.0
MarkupSafe==2.1.5
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.16
networkx==3.3
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.1.105
onnx==1.16.0
onnxruntime==1.17.1
optimum==1.18.0
packaging==24.0
pandas==2.2.1
pillow==10.3.0
protobuf==5.26.1
pyarrow==15.0.2
pyarrow-hotfix==0.6
python-dateutil==2.9.0.post0
pytz==2024.1
PyYAML==6.0.1
regex==2023.12.25
requests==2.31.0
safetensors==0.4.2
scikit-learn==1.4.2
scipy==1.13.0
sentence-transformers==2.6.1
sentencepiece==0.2.0
six==1.16.0
sympy==1.12
threadpoolctl==3.4.0
timm==0.9.16
tokenizers==0.15.2
torch==2.2.2
torchvision==0.17.2
tqdm==4.66.2
transformers==4.39.3
triton==2.2.0
typing_extensions==4.11.0
tzdata==2024.1
urllib3==2.2.1
xxhash==3.4.1
yarl==1.9.4
amatanasov commented 3 weeks ago

I am hitting the same issue with the latest versions of onnx, onnxruntime, and optimum. CC @echarlaix