huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0

StableLM cannot be converted with post-processing turned on because the `ByteSize()` function returns only 1.4 GB #1044

Open · PatriceVignola opened this issue 1 year ago

PatriceVignola commented 1 year ago

System Info

Optimum==1.8.4
Python==3.9

Who can help?

No response

Information

Tasks

Reproduction

When running the following command, the post-processing step fails because the model is bigger than 2 GB (the weights are around 30 GB), but somehow the ByteSize() function only returns 1.4 GB, which leads the post-processing logic to take the ModelProto path instead of the "model path" path.

optimum-cli export onnx --model=stabilityai/stablelm-tuned-alpha-7b stablelm_tuned_alpha_7b_onnx

Error:

Traceback (most recent call last):
  File "C:\Users\pavignol\Miniconda3\envs\ort-cuda\lib\site-packages\optimum\exporters\onnx\config.py", line 111, in post_process_exported_models
    merge_decoders(
  File "C:\Users\pavignol\Miniconda3\envs\ort-cuda\lib\site-packages\optimum\onnx\graph_transformations.py", line 226, in merge_decoders
    raise e
  File "C:\Users\pavignol\Miniconda3\envs\ort-cuda\lib\site-packages\optimum\onnx\graph_transformations.py", line 221, in merge_decoders
    onnx.checker.check_model(merged_model)
  File "C:\Users\pavignol\Miniconda3\envs\ort-cuda\lib\site-packages\onnx\checker.py", line 128, in check_model
    model if isinstance(model, bytes) else model.SerializeToString()
ValueError: Message onnx.ModelProto exceeds maximum protobuf size of 2GB: 31495912603

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\pavignol\Miniconda3\envs\ort-cuda\lib\site-packages\optimum\exporters\onnx\__main__.py", line 334, in main_export
    models_and_onnx_configs, onnx_files_subpaths = onnx_config.post_process_exported_models(
  File "C:\Users\pavignol\Miniconda3\envs\ort-cuda\lib\site-packages\optimum\exporters\onnx\config.py", line 117, in post_process_exported_models
    raise Exception(f"Unable to merge decoders. Detailed error: {e}")
Exception: Unable to merge decoders. Detailed error: Message onnx.ModelProto exceeds maximum protobuf size of 2GB: 31495912603

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\pavignol\Miniconda3\envs\ort-cuda\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\pavignol\Miniconda3\envs\ort-cuda\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\pavignol\Miniconda3\envs\ort-cuda\Scripts\optimum-cli.exe\__main__.py", line 7, in <module>
  File "C:\Users\pavignol\Miniconda3\envs\ort-cuda\lib\site-packages\optimum\commands\optimum_cli.py", line 163, in main
    service.run()
  File "C:\Users\pavignol\Miniconda3\envs\ort-cuda\lib\site-packages\optimum\commands\export\onnx.py", line 219, in run
    main_export(
  File "C:\Users\pavignol\Miniconda3\envs\ort-cuda\lib\site-packages\optimum\exporters\onnx\__main__.py", line 338, in main_export
    raise Exception(
Exception: The post-processing of the ONNX export failed. The export can still be performed by passing the option --no-post-process. Detailed error: Unable to merge decoders. Detailed error: Message onnx.ModelProto exceeds maximum protobuf size of 2GB: 31495912603

Expected behavior

I expect the post-processing step to take the "model path" branch instead of the "ModelProto" branch, since the model is clearly bigger than 2 GB. I also expect the ByteSize() function to return the total size of the weights (~30 GB), but it only returns 1.4 GB.
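One plausible explanation for the 1.4 GB figure (speculation about the root cause, but the numbers line up) is a 32-bit integer overflow in the size computation: the true serialized size reported in the ValueError above, taken modulo 2**32, lands almost exactly on the reported value. A quick arithmetic check in Python:

actual_size = 31495912603   # bytes, from the ValueError in the traceback
wrapped = actual_size % 2**32
print(wrapped)              # 1431141531, i.e. about 1.4 GB
print(wrapped / 2**30)      # about 1.33 GiB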

regisss commented 1 year ago

Hi @PatriceVignola! I couldn't reproduce this error with:

optimum==1.8.6
transformers==4.29.2
onnx==1.13.1
onnxruntime==1.14.1

Have you tried again since you posted this issue?

typicaldigital commented 1 year ago

Dear @regisss @PatriceVignola

I get another error using these versions:

optimum-cli export onnx --model stabilityai/stablelm-tuned-alpha-7b stablelm-tuned-alpha-7b_onnx/

ERROR: Detailed error: Unable to merge decoders. Detailed error: Data of TensorProto ( tensor name: gpt_neox.embed_in.weight_merged_0) should be stored in decoder_model_merged.onnx_data, but it doesn't exist or is not accessible.

regisss commented 1 year ago

@typicaldigital I couldn't reproduce your issue on Ubuntu 20.04.

typicaldigital commented 1 year ago

@regisss Thank you so much for the update. Will try on Ubuntu. :)

fxmarty commented 1 year ago

Hi @typicaldigital, Windows is not well tested in our CI. I have no issue running on Ubuntu. It's possible this is a bug in ONNX on Windows rather than in Optimum.

claeyzre commented 1 year ago

@typicaldigital did you find a way to make it work on Windows? I'm facing the same issue.

fxmarty commented 1 year ago

@claeyzre Temporarily, you can always disable post-processing with --no-post-process if you'd just like the export to succeed.
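For instance, the export command from this issue becomes:

optimum-cli export onnx --model=stabilityai/stablelm-tuned-alpha-7b --no-post-process stablelm_tuned_alpha_7b_onnx

This skips the decoder-merging step entirely, so the failing check_model call is never reached.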

typicaldigital commented 1 year ago

Nope, unfortunately not. I really hope this can be rectified soon. 😊


claeyzre commented 1 year ago

> @claeyzre Temporarily, you can always disable post-processing with --no-post-process if you'd just like the export to succeed.

I kind of did that. A subsequent onnxruntime inference serves as a poor man's check_model in my case.
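For anyone taking the same route, a minimal sketch of that sanity check (the model path is illustrative):

import onnxruntime as ort

# Building a session forces onnxruntime to parse the graph and resolve its
# external weight files, so a successful load is a rough stand-in for
# onnx.checker.check_model.
session = ort.InferenceSession(
    "stablelm_tuned_alpha_7b_onnx/decoder_model.onnx",
    providers=["CPUExecutionProvider"],
)
print([inp.name for inp in session.get_inputs()])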

fxmarty commented 1 year ago

You should probably open an issue in the ONNX repo about ByteSize(), with the model attached.

fxmarty commented 1 year ago

Just saw the comment https://github.com/microsoft/Olive/blob/697948c2a1f7fe938609e1c97060d17f255c322e/olive/passes/onnx/optimum_merging.py#L44-L49 @PatriceVignola. I'll add an option to always use onnx.checker.check_model with the model path rather than the proto.
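For reference, onnx.checker.check_model already accepts a file path instead of a ModelProto, which avoids the in-memory SerializeToString() call that hits the 2 GB protobuf limit in the traceback above (the path here is illustrative):

import onnx

# With a path, the checker reads the model and its external data from disk
# rather than serializing the in-memory message, sidestepping the 2 GB limit.
onnx.checker.check_model("stablelm_tuned_alpha_7b_onnx/decoder_model_merged.onnx")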

PatriceVignola commented 1 year ago

Hi @fxmarty !

I just came back from vacation and caught up on this. Is there any progress on adding this option to Optimum?

Thanks!