huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0

Merging of ONNX decoder >2GB fails #894

Closed · fxmarty closed this 1 year ago

fxmarty commented 1 year ago

System Info

optimum main

Who can help?

No response

Reproduction

optimum-cli export onnx --model gpt2-large gpt2_onnx

Traceback:

(fx) felix@hf-dgx-01:~/optimum$ optimum-cli export onnx --model gpt2-large gpt2_onnx
Framework not specified. Using pt to export to ONNX.
Automatic task detection to causal-lm-with-past.
use_past = False is different than use_present_in_outputs = True, the value of use_present_in_outputs value will be used for the outputs.
Using framework PyTorch: 2.1.0.dev20230306+cu117
Overriding 2 configuration item(s)
        - use_cache -> True
        - pad_token_id -> 0
/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py:794: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if batch_size <= 0:
======= Diagnostic Run torch.onnx.export version 2.1.0.dev20230306+cu117 =======
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

Saving external data to one file...
Using framework PyTorch: 2.1.0.dev20230306+cu117
Overriding 2 configuration item(s)
        - use_cache -> True
        - pad_token_id -> 0
Asked a sequence length of 16, but a sequence length of 1 will be used with use_past == True for `input_ids`.
======= Diagnostic Run torch.onnx.export version 2.1.0.dev20230306+cu117 =======
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

Saving external data to one file...
Asked a sequence length of 16, but a sequence length of 1 will be used with use_past == True for `input_ids`.
Traceback (most recent call last):
  File "/home/felix/optimum/optimum/exporters/onnx/config.py", line 111, in post_process_exported_models
    merge_decoders(
  File "/home/felix/optimum/optimum/onnx/graph_transformations.py", line 237, in merge_decoders
    raise e
  File "/home/felix/optimum/optimum/onnx/graph_transformations.py", line 232, in merge_decoders
    onnx.checker.check_model(merged_model)
  File "/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/onnx/checker.py", line 106, in check_model
    C.check_model(protobuf_string)
onnx.onnx_cpp2py_export.checker.ValidationError: Data of TensorProto ( tensor name: transformer.wte.weight_merged_0) should be stored in decoder_model_merged.onnx_data, but it doesn't exist or is not accessible.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/felix/optimum/optimum/exporters/onnx/__main__.py", line 218, in main
    models_and_onnx_configs, onnx_files_subpaths = onnx_config.post_process_exported_models(
  File "/home/felix/optimum/optimum/exporters/onnx/config.py", line 117, in post_process_exported_models
    raise Exception(f"Unable to merge decoders. Detailed error: {e}")
Exception: Unable to merge decoders. Detailed error: Data of TensorProto ( tensor name: transformer.wte.weight_merged_0) should be stored in decoder_model_merged.onnx_data, but it doesn't exist or is not accessible.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/felix/miniconda3/envs/fx/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/felix/miniconda3/envs/fx/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/felix/optimum/optimum/exporters/onnx/__main__.py", line 255, in <module>
    main()
  File "/home/felix/optimum/optimum/exporters/onnx/__main__.py", line 222, in main
    raise Exception(
Exception: The post-processing of the ONNX export failed. The export can still be performed by passing the option --no-post-process. Detailed error: Unable to merge decoders. Detailed error: Data of TensorProto ( tensor name: transformer.wte.weight_merged_0) should be stored in decoder_model_merged.onnx_data, but it doesn't exist or is not accessible.
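
From the traceback, the merge itself succeeds but the final onnx.checker.check_model call fails: the merged model stores its weights as external data and references decoder_model_merged.onnx_data, which does not exist on disk yet when the in-memory check runs. As the last exception notes, the export can still be performed by skipping post-processing (i.e. skipping the decoder merge):

optimum-cli export onnx --model gpt2-large --no-post-process gpt2_onnx

This should leave you with the two unmerged decoders (decoder_model.onnx and decoder_with_past_model.onnx) instead of decoder_model_merged.onnx.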

Expected behavior

No error; the merged decoder should be exported successfully.

vilsonrodrigues commented 1 year ago

Hi. I'm using the CLI command:

optimum-cli export onnx --model openai/whisper-medium model/

and getting the same error:

ValueError: This protobuf of onnx model is too large (>2GB). Call check_model with model path instead.
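
The 2 GB limit in this ValueError comes from protobuf: onnx.checker.check_model on an in-memory ModelProto serializes the whole model into a single protobuf message, which protobuf caps at 2 GB. For larger models the checker has to be given a file path instead, so it can also resolve external data files saved next to the model. A minimal sketch of the path-based pattern (file names here are hypothetical):

import onnx

# Loading pulls any external data into memory alongside the graph.
model = onnx.load("decoder_model_merged.onnx")

# check_model(model) would serialize the full ModelProto and raise the
# ValueError above once the model exceeds 2 GB. Instead, save the weights
# as external data next to the model...
onnx.save(
    model,
    "decoder_model_merged.onnx",
    save_as_external_data=True,
    all_tensors_to_one_file=True,
    location="decoder_model_merged.onnx_data",
)

# ...and check by path, so the checker can find the .onnx_data file on disk.
onnx.checker.check_model("decoder_model_merged.onnx")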

Environment:

Optimum version tested:

fxmarty commented 1 year ago

@vilsonrodrigues This is fixed on main, thanks for notifying!
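
For anyone hitting this before the next release: "fixed on main" means the fix is in the repository but not yet in a PyPI release, so one way to pick it up is to install optimum from source, e.g.:

pip install git+https://github.com/huggingface/optimum.git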

vilsonrodrigues commented 1 year ago

Thanks @fxmarty!!

typicaldigital commented 1 year ago

Dear @fxmarty

I get a similar error using these versions:

optimum version: 1.8.7
transformers version: 4.29.2
Platform: Windows-10-10.0.22621-SP0
Python version: 3.11.4
Huggingface_hub version: 0.15.1
PyTorch version (GPU?): 2.1.0.dev20230611+cu121 (cuda available: True)
Tensorflow version (GPU?): not installed (cuda available: NA)

optimum-cli export onnx --model stabilityai/stablelm-tuned-alpha-7b stablelm-tuned-alpha-7b_onnx/

ERROR: Detailed error: Unable to merge decoders. Detailed error: Data of TensorProto ( tensor name: gpt_neox.embed_in.weight_merged_0) should be stored in decoder_model_merged.onnx_data, but it doesn't exist or is not accessible.

fxmarty commented 1 year ago

Thanks, tracked in https://github.com/huggingface/optimum/issues/1044