justinchuby / torch-onnx

Prototype of the next torch exporter
MIT License

Optimum profiling 2 #71

Open · justinchuby opened this issue 1 week ago

justinchuby commented 1 week ago

torch.onnx.dynamo_export

Profiling
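
The call trees below were recorded with pyinstrument (v4.6.2, per the banner). For reference, a recording like this can also be captured in-process with pyinstrument's Python API; this is a minimal sketch, assuming pyinstrument is installed in the venv, where export_fn is a hypothetical stand-in for the optimum export call being measured:

from pyinstrument import Profiler

profiler = Profiler()
profiler.start()
export_fn()  # hypothetical stand-in for optimum's export_pytorch call
profiler.stop()
# Prints a text call tree like the ones below.
print(profiler.output_text(unicode=True, color=False))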

  _     ._   __/__   _ _  _  _ _/_   Recorded: 07:07:49  Samples:  10301
 /_//_/// /_\ / //_// / //_'/ //     Duration: 19.238    CPU time: 16.182
/   _/                      v4.6.2

Program: /Users/justinc/Documents/GitHub/torch-onnx/venv/bin/optimum-cli export onnx --model openai/whisper-large-v3 whisper_dynamo/

19.238 export_pytorch  optimum/exporters/onnx/convert.py:485
└─ 19.202 export  optimum/exporters/onnx/convert.py:584
      [242 frames hidden]  optimum, torch, <string>, contextlib,...
         11.934 Exporter.export  torch/onnx/_internal/exporter.py:1163
         ├─ 2.103 TorchScriptGraph.to_model_proto  ../../onnxscript/onnxscript/function_libs/torch_lib/graph_building/_graph_building_torch.py:1002
         │  ├─ 1.132 [self]  ../../onnxscript/onnxscript/function_libs/torch_lib/graph_building/_graph_building_torch.py
         │  └─ 0.746 load_external_data_for_model  onnx/external_data_helper.py:55
         │        [4 frames hidden]  onnx, <built-in>
         5.190 ModelProto.ByteSize  <built-in>

  _     ._   __/__   _ _  _  _ _/_   Recorded: 07:08:09  Samples:  18846
 /_//_/// /_\ / //_// / //_'/ //     Duration: 30.239    CPU time: 27.183
/   _/                      v4.6.2

Program: /Users/justinc/Documents/GitHub/torch-onnx/venv/bin/optimum-cli export onnx --model openai/whisper-large-v3 whisper_dynamo/

30.239 export_pytorch  optimum/exporters/onnx/convert.py:485
└─ 30.161 export  optimum/exporters/onnx/convert.py:584
      [296 frames hidden]  optimum, torch, <string>, contextlib,...
         21.395 Exporter.export  torch/onnx/_internal/exporter.py:1163
         ├─ 3.444 TorchScriptGraph.to_model_proto  ../../onnxscript/onnxscript/function_libs/torch_lib/graph_building/_graph_building_torch.py:1002
         │  ├─ 2.036 [self]  ../../onnxscript/onnxscript/function_libs/torch_lib/graph_building/_graph_building_torch.py
         │  └─ 1.016 load_external_data_for_model  onnx/external_data_helper.py:55
         │        [4 frames hidden]  onnx, <built-in>

  _     ._   __/__   _ _  _  _ _/_   Recorded: 07:08:41  Samples:  17082
 /_//_/// /_\ / //_// / //_'/ //     Duration: 26.246    CPU time: 24.735
/   _/                      v4.6.2

Program: /Users/justinc/Documents/GitHub/torch-onnx/venv/bin/optimum-cli export onnx --model openai/whisper-large-v3 whisper_dynamo/

26.245 export_pytorch  optimum/exporters/onnx/convert.py:485
└─ 26.189 export  optimum/exporters/onnx/convert.py:584
      [333 frames hidden]  optimum, torch, <string>, contextlib,...
         19.196 Exporter.export  torch/onnx/_internal/exporter.py:1163
         ├─ 2.708 TorchScriptGraph.to_model_proto  ../../onnxscript/onnxscript/function_libs/torch_lib/graph_building/_graph_building_torch.py:1002
         │  ├─ 1.394 [self]  ../../onnxscript/onnxscript/function_libs/torch_lib/graph_building/_graph_building_torch.py
         │  └─ 0.941 load_external_data_for_model  onnx/external_data_helper.py:55
         │        [4 frames hidden]  onnx, <built-in>

Memory profiling

mprof run optimum-cli export onnx --model openai/whisper-large-v3 whisper_dynamo/ --no-post-process
mprof: Sampling memory every 0.1s
running new process
Framework not specified. Using pt to export the model.
/Users/justinc/Documents/GitHub/torch-onnx/venv/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Automatic task detection to automatic-speech-recognition-with-past (possible synonyms are: speech2seq-lm-with-past).
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Using the export variant default. Available variants are:
    - default: The default ONNX variant.
Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41.
Non-default generation parameters: {'max_length': 448, 'begin_suppress_tokens': [220, 50257]}
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

***** Exporting submodel 1/3: WhisperEncoder *****
Using framework PyTorch: 2.3.1
Overriding 1 configuration item(s)
        - use_cache -> False
/Users/justinc/Documents/GitHub/torch-onnx/venv/lib/python3.11/site-packages/torch/onnx/_internal/exporter.py:136: UserWarning: torch.onnx.dynamo_export only implements opset version 18 for now. If you need to use a different opset version, please register them with register_custom_op.
  warnings.warn(
Filename: /Users/justinc/Documents/GitHub/torch-onnx/venv/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   583   6307.0 MiB   6307.0 MiB           1               @memory_profiler.profile
   584                                                     def export():
   585   6307.0 MiB      0.0 MiB           1                   export_options = torch.onnx.ExportOptions(dynamic_shapes=False)
   586   8923.8 MiB   2616.8 MiB           4                   onnx_program = torch.onnx.dynamo_export(
   587   6307.0 MiB      0.0 MiB           1                       model,
   588   6307.0 MiB      0.0 MiB           1                       export_options = export_options,
   589   6307.0 MiB      0.0 MiB           1                       **dummy_inputs,
   590                                                         )
   591   2753.5 MiB  -6170.4 MiB           1                   onnx_program.save(output.as_posix())
   592                                                         # onnx_export(
   593                                                         #     model,
   594                                                         #     (dummy_inputs,),
   595                                                         #     f=output.as_posix(),
   596                                                         #     input_names=input_names,
   597                                                         #     output_names=output_names,
   598                                                         #     # dynamic_axes=dynamix_axes,
   599                                                         #     do_constant_folding=do_constant_folding,
   600                                                         #     opset_version=opset,
   601                                                         #     export_params=False, # MARK
   602                                                         # )

***** Exporting submodel 2/3: WhisperForConditionalGeneration *****
Using framework PyTorch: 2.3.1
Overriding 1 configuration item(s)
        - use_cache -> True
/Users/justinc/Documents/GitHub/torch-onnx/venv/lib/python3.11/site-packages/torch/onnx/_internal/exporter.py:136: UserWarning: torch.onnx.dynamo_export only implements opset version 18 for now. If you need to use a different opset version, please register them with register_custom_op.
  warnings.warn(
Filename: /Users/justinc/Documents/GitHub/torch-onnx/venv/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   583    932.3 MiB    932.3 MiB           1               @memory_profiler.profile
   584                                                     def export():
   585    932.4 MiB      0.1 MiB           1                   export_options = torch.onnx.ExportOptions(dynamic_shapes=False)
   586   8236.4 MiB   7304.0 MiB           4                   onnx_program = torch.onnx.dynamo_export(
   587    932.4 MiB      0.0 MiB           1                       model,
   588    932.4 MiB      0.0 MiB           1                       export_options = export_options,
   589    932.4 MiB      0.0 MiB           1                       **dummy_inputs,
   590                                                         )
   591   7460.6 MiB   -775.8 MiB           1                   onnx_program.save(output.as_posix())
   592                                                         # onnx_export(
   593                                                         #     model,
   594                                                         #     (dummy_inputs,),
   595                                                         #     f=output.as_posix(),
   596                                                         #     input_names=input_names,
   597                                                         #     output_names=output_names,
   598                                                         #     # dynamic_axes=dynamix_axes,
   599                                                         #     do_constant_folding=do_constant_folding,
   600                                                         #     opset_version=opset,
   601                                                         #     export_params=False, # MARK
   602                                                         # )

***** Exporting submodel 3/3: WhisperForConditionalGeneration *****
Using framework PyTorch: 2.3.1
Overriding 1 configuration item(s)
        - use_cache -> True
/Users/justinc/Documents/GitHub/torch-onnx/venv/lib/python3.11/site-packages/torch/onnx/_internal/exporter.py:136: UserWarning: torch.onnx.dynamo_export only implements opset version 18 for now. If you need to use a different opset version, please register them with register_custom_op.
  warnings.warn(
Filename: /Users/justinc/Documents/GitHub/torch-onnx/venv/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   583   4851.6 MiB   4851.6 MiB           1               @memory_profiler.profile
   584                                                     def export():
   585   4851.8 MiB      0.2 MiB           1                   export_options = torch.onnx.ExportOptions(dynamic_shapes=False)
   586   7379.3 MiB   2527.5 MiB           4                   onnx_program = torch.onnx.dynamo_export(
   587   4851.8 MiB      0.0 MiB           1                       model,
   588   4851.8 MiB      0.0 MiB           1                       export_options = export_options,
   589   4851.8 MiB      0.0 MiB           1                       **dummy_inputs,
   590                                                         )
   591   6747.5 MiB   -631.8 MiB           1                   onnx_program.save(output.as_posix())
   592                                                         # onnx_export(
   593                                                         #     model,
   594                                                         #     (dummy_inputs,),
   595                                                         #     f=output.as_posix(),
   596                                                         #     input_names=input_names,
   597                                                         #     output_names=output_names,
   598                                                         #     # dynamic_axes=dynamix_axes,
   599                                                         #     do_constant_folding=do_constant_folding,
   600                                                         #     opset_version=opset,
   601                                                         #     export_params=False, # MARK
   602                                                         # )

The ONNX export succeeded and the exported model was saved at: whisper_dynamo
[image: mprof memory-usage plot]

mprofile_20240625071007.txt
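
The attached sampling data can be re-rendered into the memory plot above with memory_profiler's CLI (filename taken from this run):

mprof plot mprofile_20240625071007.txt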

justinchuby commented 1 week ago

torch_onnx dynamo improved w/ external tensor handling (theoretical)
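
For these runs, torch.onnx.export is monkey-patched so that optimum's unchanged export call is routed through torch_onnx/_patch.py (visible as _torch_onnx_export in the trees below). A minimal sketch of the setup, assuming the prototype's patching entry point is torch_onnx.patch_torch() (treat the name as hypothetical if the API has since changed):

import torch_onnx

# Assumed entry point: reroute torch.onnx.export through the
# prototype exporter before optimum-cli's export code runs.
torch_onnx.patch_torch()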

Profiling

  _     ._   __/__   _ _  _  _ _/_   Recorded: 07:19:45  Samples:  11044
 /_//_/// /_\ / //_// / //_'/ //     Duration: 11.784    CPU time: 13.683
/   _/                      v4.6.2

Program: /Users/justinc/Documents/GitHub/torch-onnx/venv/bin/optimum-cli export onnx --model openai/whisper-large-v3 whisper/ --no-post-process

11.784 export_pytorch  optimum/exporters/onnx/convert.py:485
└─ 11.784 export  optimum/exporters/onnx/convert.py:584
   └─ 11.783 _torch_onnx_export  torch_onnx/_patch.py:102
      └─ 11.704 export  torch_onnx/_core.py:793
         ├─ 7.431 export  torch/export/__init__.py:73
         │     [253 frames hidden]  torch, contextlib, copy, dis, importl...
         └─ 4.272 exported_program_to_ir  torch_onnx/_core.py:618
            ├─ 3.060 wrapper  torch/export/exported_program.py:80
            │     [78 frames hidden]  torch, <string>
            ├─ 0.604 _add_nodes  torch_onnx/_core.py:486
            │  └─ 0.594 _handle_call_function_node_with_lowering  torch_onnx/_core.py:356
            │     └─ 0.401 TracedOnnxFunction.__call__  ../../onnxscript/onnxscript/values.py:581
            │        └─ 0.239 SymbolicTensor.aten_view  ../../onnxscript/onnxscript/function_libs/torch_lib/ops/core.py:8740
            │           └─ 0.144 Opset18.Cast  ../../onnxscript/onnxscript/onnx_opset/_impl/opset13.py:241
            │              └─ 0.141 Op.__call__  ../../onnxscript/onnxscript/values.py:291
            │                 └─ 0.140 OpRecorder.eval  torch_onnx/_building.py:390
            ├─ 0.308 OnnxRegistry.from_torchlib  torch_onnx/_registration.py:114
            │  └─ 0.145 _get_overload  torch_onnx/_registration.py:57
            │     └─ 0.140 <module>  torchvision/__init__.py:1
            └─ 0.279 insert_type_promotion_nodes  torch_onnx/_fx_passes.py:13
               └─ 0.257 wrapper  torch/onnx/_internal/diagnostics/infra/decorator.py:71
                     [13 frames hidden]  torch

  _     ._   __/__   _ _  _  _ _/_   Recorded: 07:19:57  Samples:  17575
 /_//_/// /_\ / //_// / //_'/ //     Duration: 18.621    CPU time: 21.274
/   _/                      v4.6.2

Program: /Users/justinc/Documents/GitHub/torch-onnx/venv/bin/optimum-cli export onnx --model openai/whisper-large-v3 whisper/ --no-post-process

18.621 export_pytorch  optimum/exporters/onnx/convert.py:485
└─ 18.621 export  optimum/exporters/onnx/convert.py:584
   └─ 18.621 _torch_onnx_export  torch_onnx/_patch.py:102
      ├─ 18.432 export  torch_onnx/_core.py:793
      │  ├─ 11.837 export  torch/export/__init__.py:73
      │  │     [272 frames hidden]  torch, contextlib, copy, dis, optimum...
      │  └─ 6.593 exported_program_to_ir  torch_onnx/_core.py:618
      │     ├─ 4.588 wrapper  torch/export/exported_program.py:80
      │     │     [76 frames hidden]  torch, <string>
      │     ├─ 1.147 _add_nodes  torch_onnx/_core.py:486
      │     │  └─ 1.129 _handle_call_function_node_with_lowering  torch_onnx/_core.py:356
      │     │     └─ 0.747 TracedOnnxFunction.__call__  ../../onnxscript/onnxscript/values.py:581
      │     │        ├─ 0.472 SymbolicTensor.aten_view  ../../onnxscript/onnxscript/function_libs/torch_lib/ops/core.py:8740
      │     │        │  ├─ 0.267 Opset18.Cast  ../../onnxscript/onnxscript/onnx_opset/_impl/opset13.py:241
      │     │        │  │  └─ 0.260 Op.__call__  ../../onnxscript/onnxscript/values.py:291
      │     │        │  │     └─ 0.254 OpRecorder.eval  torch_onnx/_building.py:390
      │     │        │  └─ 0.194 Opset18.Reshape  ../../onnxscript/onnxscript/onnx_opset/_impl/opset14.py:876
      │     │        │     └─ 0.189 Op.__call__  ../../onnxscript/onnxscript/values.py:291
      │     │        └─ 0.192 SymbolicTensor.aten_clone  ../../onnxscript/onnxscript/function_libs/torch_lib/ops/core.py:1687
      │     │           └─ 0.189 Opset18.Identity  ../../onnxscript/onnxscript/onnx_opset/_impl/opset16.py:240
      │     └─ 0.674 insert_type_promotion_nodes  torch_onnx/_fx_passes.py:13
      │        └─ 0.636 wrapper  torch/onnx/_internal/diagnostics/infra/decorator.py:71
      │              [16 frames hidden]  torch
      └─ 0.188 ONNXProgram.save  torch_onnx/_onnx_program.py:25

  _     ._   __/__   _ _  _  _ _/_   Recorded: 07:20:18  Samples:  16959
 /_//_/// /_\ / //_// / //_'/ //     Duration: 17.888    CPU time: 20.886
/   _/                      v4.6.2

Program: /Users/justinc/Documents/GitHub/torch-onnx/venv/bin/optimum-cli export onnx --model openai/whisper-large-v3 whisper/ --no-post-process

17.887 export_pytorch  optimum/exporters/onnx/convert.py:485
└─ 17.887 export  optimum/exporters/onnx/convert.py:584
   └─ 17.887 _torch_onnx_export  torch_onnx/_patch.py:102
      └─ 17.759 export  torch_onnx/_core.py:793
         ├─ 11.605 export  torch/export/__init__.py:73
         │     [310 frames hidden]  torch, contextlib, copy, dis, ast, op...
         └─ 6.150 exported_program_to_ir  torch_onnx/_core.py:618
            ├─ 4.584 wrapper  torch/export/exported_program.py:80
            │     [79 frames hidden]  torch, <string>
            ├─ 0.912 _add_nodes  torch_onnx/_core.py:486
            │  └─ 0.894 _handle_call_function_node_with_lowering  torch_onnx/_core.py:356
            │     └─ 0.542 TracedOnnxFunction.__call__  ../../onnxscript/onnxscript/values.py:581
            │        └─ 0.369 SymbolicTensor.aten_view  ../../onnxscript/onnxscript/function_libs/torch_lib/ops/core.py:8740
            │           └─ 0.234 Opset18.Cast  ../../onnxscript/onnxscript/onnx_opset/_impl/opset13.py:241
            │              └─ 0.230 Op.__call__  ../../onnxscript/onnxscript/values.py:291
            │                 └─ 0.227 OpRecorder.eval  torch_onnx/_building.py:390
            └─ 0.469 insert_type_promotion_nodes  torch_onnx/_fx_passes.py:13
               └─ 0.436 wrapper  torch/onnx/_internal/diagnostics/infra/decorator.py:71
                     [13 frames hidden]  torch

Memory profiling

mprof run optimum-cli export onnx --model openai/whisper-large-v3 whisper/ --no-post-process
mprof: Sampling memory every 0.1s
running new process
Framework not specified. Using pt to export the model.
/Users/justinc/Documents/GitHub/torch-onnx/venv/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Automatic task detection to automatic-speech-recognition-with-past (possible synonyms are: speech2seq-lm-with-past).
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Using the export variant default. Available variants are:
    - default: The default ONNX variant.
Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41.
Non-default generation parameters: {'max_length': 448, 'begin_suppress_tokens': [220, 50257]}
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

***** Exporting submodel 1/3: WhisperEncoder *****
Using framework PyTorch: 2.3.1
Overriding 1 configuration item(s)
        - use_cache -> False
Obtain model graph for `WhisperEncoder([...]` with `torch.export.export`...
Obtain model graph for `WhisperEncoder([...]` with `torch.export.export`... ✅
Translate the graph into ONNX...
aten::getitem is not found in this version of PyTorch.
/Users/justinc/Documents/GitHub/torch-onnx/src/torch_onnx/_registration.py:134: UserWarning: aten::getitem does not have a default overload or is not found. Ignoring.
  warnings.warn(
Translate the graph into ONNX... ✅
The initializers have been removed from the model. This is destructive. Developers: Please implement ir.Model copy() and remove initializers on the copied model.
Filename: /Users/justinc/Documents/GitHub/torch-onnx/venv/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   583   6302.5 MiB   6302.5 MiB           1               @memory_profiler.profile
   584                                                     def export():
   585                                                         # export_options = torch.onnx.ExportOptions(dynamic_shapes=False)
   586                                                         # onnx_program = torch.onnx.dynamo_export(
   587                                                         #     model,
   588                                                         #     export_options = export_options,
   589                                                         #     **dummy_inputs,
   590                                                         # )
   591                                                         # onnx_program.save(output.as_posix())
   592   6430.7 MiB    128.2 MiB           2                   onnx_export(
   593   6302.5 MiB      0.0 MiB           1                       model,
   594   6302.5 MiB      0.0 MiB           1                       (dummy_inputs,),
   595   6302.5 MiB      0.0 MiB           1                       f=output.as_posix(),
   596   6302.5 MiB      0.0 MiB           1                       input_names=input_names,
   597   6302.5 MiB      0.0 MiB           1                       output_names=output_names,
   598                                                             # dynamic_axes=dynamix_axes,
   599   6302.5 MiB      0.0 MiB           1                       do_constant_folding=do_constant_folding,
   600   6302.5 MiB      0.0 MiB           1                       opset_version=opset,
   601   6302.5 MiB      0.0 MiB           1                       export_params=False, # MARK
   602                                                         )

***** Exporting submodel 2/3: WhisperForConditionalGeneration *****
Using framework PyTorch: 2.3.1
Overriding 1 configuration item(s)
        - use_cache -> True
Obtain model graph for `WhisperForConditionalGeneration([...]` with `torch.export.export`...
Obtain model graph for `WhisperForConditionalGeneration([...]` with `torch.export.export`... ✅
Translate the graph into ONNX...
aten::getitem is not found in this version of PyTorch.
/Users/justinc/Documents/GitHub/torch-onnx/src/torch_onnx/_registration.py:134: UserWarning: aten::getitem does not have a default overload or is not found. Ignoring.
  warnings.warn(
Translate the graph into ONNX... ✅
The initializers have been removed from the model. This is destructive. Developers: Please implement ir.Model copy() and remove initializers on the copied model.
Filename: /Users/justinc/Documents/GitHub/torch-onnx/venv/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   583   6430.8 MiB   6430.8 MiB           1               @memory_profiler.profile
   584                                                     def export():
   585                                                         # export_options = torch.onnx.ExportOptions(dynamic_shapes=False)
   586                                                         # onnx_program = torch.onnx.dynamo_export(
   587                                                         #     model,
   588                                                         #     export_options = export_options,
   589                                                         #     **dummy_inputs,
   590                                                         # )
   591                                                         # onnx_program.save(output.as_posix())
   592   6535.2 MiB    104.4 MiB           2                   onnx_export(
   593   6430.8 MiB      0.0 MiB           1                       model,
   594   6430.8 MiB      0.0 MiB           1                       (dummy_inputs,),
   595   6430.8 MiB      0.0 MiB           1                       f=output.as_posix(),
   596   6430.8 MiB      0.0 MiB           1                       input_names=input_names,
   597   6430.8 MiB      0.0 MiB           1                       output_names=output_names,
   598                                                             # dynamic_axes=dynamix_axes,
   599   6430.8 MiB      0.0 MiB           1                       do_constant_folding=do_constant_folding,
   600   6430.8 MiB      0.0 MiB           1                       opset_version=opset,
   601   6430.8 MiB      0.0 MiB           1                       export_params=False, # MARK
   602                                                         )

***** Exporting submodel 3/3: WhisperForConditionalGeneration *****
Using framework PyTorch: 2.3.1
Overriding 1 configuration item(s)
        - use_cache -> True
Obtain model graph for `WhisperForConditionalGeneration([...]` with `torch.export.export`...
Obtain model graph for `WhisperForConditionalGeneration([...]` with `torch.export.export`... ✅
Translate the graph into ONNX...
aten::getitem is not found in this version of PyTorch.
/Users/justinc/Documents/GitHub/torch-onnx/src/torch_onnx/_registration.py:134: UserWarning: aten::getitem does not have a default overload or is not found. Ignoring.
  warnings.warn(
Translate the graph into ONNX... ✅
The initializers have been removed from the model. This is destructive. Developers: Please implement ir.Model copy() and remove initializers on the copied model.
Filename: /Users/justinc/Documents/GitHub/torch-onnx/venv/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   583   6547.8 MiB   6547.8 MiB           1               @memory_profiler.profile
   584                                                     def export():
   585                                                         # export_options = torch.onnx.ExportOptions(dynamic_shapes=False)
   586                                                         # onnx_program = torch.onnx.dynamo_export(
   587                                                         #     model,
   588                                                         #     export_options = export_options,
   589                                                         #     **dummy_inputs,
   590                                                         # )
   591                                                         # onnx_program.save(output.as_posix())
   592   6585.6 MiB     37.7 MiB           2                   onnx_export(
   593   6547.8 MiB      0.0 MiB           1                       model,
   594   6547.8 MiB      0.0 MiB           1                       (dummy_inputs,),
   595   6547.8 MiB      0.0 MiB           1                       f=output.as_posix(),
   596   6547.8 MiB      0.0 MiB           1                       input_names=input_names,
   597   6547.8 MiB      0.0 MiB           1                       output_names=output_names,
   598                                                             # dynamic_axes=dynamix_axes,
   599   6547.8 MiB      0.0 MiB           1                       do_constant_folding=do_constant_folding,
   600   6547.8 MiB      0.0 MiB           1                       opset_version=opset,
   601   6547.8 MiB      0.0 MiB           1                       export_params=False, # MARK
   602                                                         )

The ONNX export succeeded and the exported model was saved at: whisper
[image: mprof memory-usage plot]

mprofile_20240625071529.txt
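
"External tensor handling" here means keeping initializers out of the serialized protobuf and writing them to a side file instead, which sidesteps both the 2 GB protobuf limit and the ModelProto.ByteSize/copy overhead visible in the dynamo_export traces above. A sketch of the equivalent using onnx's public helper (paths are hypothetical; the prototype goes through its own ir.Model instead):

import onnx

model = onnx.load("whisper/encoder_model.onnx")  # hypothetical path
onnx.save_model(
    model,
    "whisper/encoder_model.onnx",
    save_as_external_data=True,   # keep raw weights out of the proto
    all_tensors_to_one_file=True,
    location="encoder_model.onnx.data",
    size_threshold=1024,          # only externalize tensors > 1 KiB
)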

justinchuby commented 1 week ago

torch.onnx.export
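
This is the legacy TorchScript-based exporter. Condensed from the memory tables below, the profiled call reduces to roughly this sketch (model, dummy_inputs, the name lists, and the opset come from optimum's convert.py):

import torch

torch.onnx.export(
    model,                 # the submodel being exported
    (dummy_inputs,),       # dict of dummy inputs, wrapped as args
    f=output_path,
    input_names=input_names,
    output_names=output_names,
    do_constant_folding=do_constant_folding,
    opset_version=opset,
    export_params=False,   # weights stripped here ("# MARK" in the log)
)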

Profiling

  _     ._   __/__   _ _  _  _ _/_   Recorded: 07:22:04  Samples:  2717
 /_//_/// /_\ / //_// / //_'/ //     Duration: 17.402    CPU time: 71.175
/   _/                      v4.6.2

Program: /Users/justinc/Documents/GitHub/torch-onnx/venv/bin/optimum-cli export onnx --model openai/whisper-large-v3 whisper_onnx_export/ --no-post-process

17.402 export_pytorch  optimum/exporters/onnx/convert.py:485
└─ 17.402 export  optimum/exporters/onnx/convert.py:584
      [50 frames hidden]  optimum, torch, transformers, <built-in>
         4.404 PyCapsule._jit_pass_onnx_graph_shape_type_inference  <built-in>
         3.594 PyCapsule._jit_pass_onnx_graph_shape_type_inference  <built-in>

  _     ._   __/__   _ _  _  _ _/_   Recorded: 07:22:22  Samples:  7318
 /_//_/// /_\ / //_// / //_'/ //     Duration: 31.116    CPU time: 43.408
/   _/                      v4.6.2

Program: /Users/justinc/Documents/GitHub/torch-onnx/venv/bin/optimum-cli export onnx --model openai/whisper-large-v3 whisper_onnx_export/ --no-post-process

31.116 export_pytorch  optimum/exporters/onnx/convert.py:485
└─ 31.116 export  optimum/exporters/onnx/convert.py:584
      [51 frames hidden]  optimum, torch, <built-in>, transformers
         23.168 _optimize_graph  torch/onnx/utils.py:574
         ├─ 13.022 PyCapsule._jit_pass_onnx_graph_shape_type_inference  <built-in>
         ├─ 7.271 [self]  torch/onnx/utils.py

  _     ._   __/__   _ _  _  _ _/_   Recorded: 07:22:53  Samples:  6457
 /_//_/// /_\ / //_// / //_'/ //     Duration: 26.578    CPU time: 36.811
/   _/                      v4.6.2

Program: /Users/justinc/Documents/GitHub/torch-onnx/venv/bin/optimum-cli export onnx --model openai/whisper-large-v3 whisper_onnx_export/ --no-post-process

26.578 export_pytorch  optimum/exporters/onnx/convert.py:485
└─ 26.578 export  optimum/exporters/onnx/convert.py:584
      [51 frames hidden]  optimum, torch, <built-in>, transformers
         19.628 _optimize_graph  torch/onnx/utils.py:574
         ├─ 10.655 PyCapsule._jit_pass_onnx_graph_shape_type_inference  <built-in>
         ├─ 6.286 [self]  torch/onnx/utils.py
         5.489 PyCapsule._jit_pass_onnx_graph_shape_type_inference  <built-in>

Memory profiling

mprof run optimum-cli export onnx --model openai/whisper-large-v3 whisper_onnx_export/ --no-post-process
mprof: Sampling memory every 0.1s
running new process
Framework not specified. Using pt to export the model.
/Users/justinc/Documents/GitHub/torch-onnx/venv/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Automatic task detection to automatic-speech-recognition-with-past (possible synonyms are: speech2seq-lm-with-past).
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Using the export variant default. Available variants are:
    - default: The default ONNX variant.
Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41.
Non-default generation parameters: {'max_length': 448, 'begin_suppress_tokens': [220, 50257]}
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

***** Exporting submodel 1/3: WhisperEncoder *****
Using framework PyTorch: 2.3.1
Overriding 1 configuration item(s)
        - use_cache -> False
/Users/justinc/Documents/GitHub/torch-onnx/venv/lib/python3.11/site-packages/transformers/models/whisper/modeling_whisper.py:1159: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_features.shape[-1] != expected_seq_length:
/Users/justinc/Documents/GitHub/torch-onnx/venv/lib/python3.11/site-packages/transformers/models/whisper/modeling_whisper.py:338: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/Users/justinc/Documents/GitHub/torch-onnx/venv/lib/python3.11/site-packages/transformers/models/whisper/modeling_whisper.py:377: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
Filename: /Users/justinc/Documents/GitHub/torch-onnx/venv/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   583   6304.2 MiB   6304.2 MiB           1               @memory_profiler.profile
   584                                                     def export():
   585                                                         # export_options = torch.onnx.ExportOptions(dynamic_shapes=False)
   586                                                         # onnx_program = torch.onnx.dynamo_export(
   587                                                         #     model,
   588                                                         #     export_options = export_options,
   589                                                         #     **dummy_inputs,
   590                                                         # )
   591                                                         # onnx_program.save(output.as_posix())
   592   7520.5 MiB   1216.2 MiB           2                   onnx_export(
   593   6304.2 MiB      0.0 MiB           1                       model,
   594   6304.2 MiB      0.0 MiB           1                       (dummy_inputs,),
   595   6304.2 MiB      0.0 MiB           1                       f=output.as_posix(),
   596   6304.2 MiB      0.0 MiB           1                       input_names=input_names,
   597   6304.2 MiB      0.0 MiB           1                       output_names=output_names,
   598                                                             # dynamic_axes=dynamix_axes,
   599   6304.2 MiB      0.0 MiB           1                       do_constant_folding=do_constant_folding,
   600   6304.2 MiB      0.0 MiB           1                       opset_version=opset,
   601   6304.2 MiB      0.0 MiB           1                       export_params=False, # MARK
   602                                                         )

***** Exporting submodel 2/3: WhisperForConditionalGeneration *****
Using framework PyTorch: 2.3.1
Overriding 1 configuration item(s)
        - use_cache -> True
/Users/justinc/Documents/GitHub/torch-onnx/venv/lib/python3.11/site-packages/transformers/modeling_attn_mask_utils.py:86: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_shape[-1] > 1 or self.sliding_window is not None:
/Users/justinc/Documents/GitHub/torch-onnx/venv/lib/python3.11/site-packages/transformers/modeling_attn_mask_utils.py:162: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if past_key_values_length > 0:
/Users/justinc/Documents/GitHub/torch-onnx/venv/lib/python3.11/site-packages/transformers/models/whisper/modeling_whisper.py:345: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (bsz, 1, tgt_len, src_len):
Filename: /Users/justinc/Documents/GitHub/torch-onnx/venv/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   583   7520.6 MiB   7520.6 MiB           1               @memory_profiler.profile
   584                                                     def export():
   585                                                         # export_options = torch.onnx.ExportOptions(dynamic_shapes=False)
   586                                                         # onnx_program = torch.onnx.dynamo_export(
   587                                                         #     model,
   588                                                         #     export_options = export_options,
   589                                                         #     **dummy_inputs,
   590                                                         # )
   591                                                         # onnx_program.save(output.as_posix())
   592   7840.1 MiB    319.5 MiB           2                   onnx_export(
   593   7520.6 MiB      0.0 MiB           1                       model,
   594   7520.6 MiB      0.0 MiB           1                       (dummy_inputs,),
   595   7520.6 MiB      0.0 MiB           1                       f=output.as_posix(),
   596   7520.6 MiB      0.0 MiB           1                       input_names=input_names,
   597   7520.6 MiB      0.0 MiB           1                       output_names=output_names,
   598                                                             # dynamic_axes=dynamix_axes,
   599   7520.6 MiB      0.0 MiB           1                       do_constant_folding=do_constant_folding,
   600   7520.6 MiB      0.0 MiB           1                       opset_version=opset,
   601   7520.6 MiB      0.0 MiB           1                       export_params=False, # MARK
   602                                                         )

***** Exporting submodel 3/3: WhisperForConditionalGeneration *****
Using framework PyTorch: 2.3.1
Overriding 1 configuration item(s)
        - use_cache -> True
/Users/justinc/Documents/GitHub/torch-onnx/venv/lib/python3.11/site-packages/transformers/models/whisper/modeling_whisper.py:300: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  and past_key_value[0].shape[2] == key_value_states.shape[1]
Filename: /Users/justinc/Documents/GitHub/torch-onnx/venv/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   583   7840.1 MiB   7840.1 MiB           1               @memory_profiler.profile
   584                                                     def export():
   585                                                         # export_options = torch.onnx.ExportOptions(dynamic_shapes=False)
   586                                                         # onnx_program = torch.onnx.dynamo_export(
   587                                                         #     model,
   588                                                         #     export_options = export_options,
   589                                                         #     **dummy_inputs,
   590                                                         # )
   591                                                         # onnx_program.save(output.as_posix())
   592   7863.9 MiB     23.8 MiB           2                   onnx_export(
   593   7840.1 MiB      0.0 MiB           1                       model,
   594   7840.1 MiB      0.0 MiB           1                       (dummy_inputs,),
   595   7840.1 MiB      0.0 MiB           1                       f=output.as_posix(),
   596   7840.1 MiB      0.0 MiB           1                       input_names=input_names,
   597   7840.1 MiB      0.0 MiB           1                       output_names=output_names,
   598                                                             # dynamic_axes=dynamix_axes,
   599   7840.1 MiB      0.0 MiB           1                       do_constant_folding=do_constant_folding,
   600   7840.1 MiB      0.0 MiB           1                       opset_version=opset,
   601   7840.1 MiB      0.0 MiB           1                       export_params=False, # MARK
   602                                                         )

The ONNX export succeeded and the exported model was saved at: whisper_onnx_export
[image: mprof memory-usage plot]

mprofile_20240625072447.txt

justinchuby commented 1 week ago

torch_onnx dynamo improved w/ fake tensors
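
In this variant the model is loaded under fake tensors, so the exporter never materializes real weights; compare the ~429 MiB baseline in the first table below with the ~6.3 GiB baselines of the earlier runs. A sketch of the idea using the public torch.onnx.enable_fake_mode helper (whether the prototype uses this helper or FakeTensorMode directly is an assumption):

import torch
from transformers import WhisperForConditionalGeneration

# Under fake mode, parameters are metadata-only tensors, so the
# multi-GB whisper-large-v3 checkpoint is never allocated for real.
with torch.onnx.enable_fake_mode():
    model = WhisperForConditionalGeneration.from_pretrained(
        "openai/whisper-large-v3"
    )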

Memory profiling

mprof run optimum-cli export onnx --model openai/whisper-large-v3 whisper_fake/ --no-post-process
mprof: Sampling memory every 0.1s
running new process
Framework not specified. Using pt to export the model.
/Users/justinc/Documents/GitHub/torch-onnx/venv/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Automatic task detection to automatic-speech-recognition-with-past (possible synonyms are: speech2seq-lm-with-past).
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Using the export variant default. Available variants are:
    - default: The default ONNX variant.
Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41.
Non-default generation parameters: {'max_length': 448, 'begin_suppress_tokens': [220, 50257]}
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

***** Exporting submodel 1/3: WhisperEncoder *****
Using framework PyTorch: 2.3.1
Overriding 1 configuration item(s)
        - use_cache -> False
Obtain model graph for `WhisperEncoder([...]` with `torch.export.export`...
Obtain model graph for `WhisperEncoder([...]` with `torch.export.export`... ✅
Translate the graph into ONNX...
aten::getitem is not found in this version of PyTorch.
/Users/justinc/Documents/GitHub/torch-onnx/src/torch_onnx/_registration.py:134: UserWarning: aten::getitem does not have a default overload or is not found. Ignoring.
  warnings.warn(
Translate the graph into ONNX... ✅
The initializers have been removed from the model. This is destructive. Developers: Please implement ir.Model copy() and remove initializers on the copied model.
Filename: /Users/justinc/Documents/GitHub/torch-onnx/venv/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   583    429.2 MiB    429.2 MiB           1               @memory_profiler.profile
   584                                                     def export():
   585                                                         # export_options = torch.onnx.ExportOptions(dynamic_shapes=False)
   586                                                         # onnx_program = torch.onnx.dynamo_export(
   587                                                         #     model,
   588                                                         #     export_options = export_options,
   589                                                         #     **dummy_inputs,
   590                                                         # )
   591                                                         # onnx_program.save(output.as_posix())
   592    541.5 MiB    112.4 MiB           2                   onnx_export(
   593    429.2 MiB      0.0 MiB           1                       model,
   594    429.2 MiB      0.0 MiB           1                       (dummy_inputs,),
   595    429.2 MiB      0.0 MiB           1                       f=output.as_posix(),
   596    429.2 MiB      0.0 MiB           1                       input_names=input_names,
   597    429.2 MiB      0.0 MiB           1                       output_names=output_names,
   598                                                             # dynamic_axes=dynamix_axes,
   599    429.2 MiB      0.0 MiB           1                       do_constant_folding=do_constant_folding,
   600    429.2 MiB      0.0 MiB           1                       opset_version=opset,
   601    429.2 MiB      0.0 MiB           1                       export_params=False, # MARK
   602                                                         )

***** Exporting submodel 2/3: WhisperForConditionalGeneration *****
Using framework PyTorch: 2.3.1
Overriding 1 configuration item(s)
        - use_cache -> True
Obtain model graph for `WhisperForConditionalGeneration([...]` with `torch.export.export`...
Obtain model graph for `WhisperForConditionalGeneration([...]` with `torch.export.export`... ✅
Translate the graph into ONNX...
aten::getitem is not found in this version of PyTorch.
/Users/justinc/Documents/GitHub/torch-onnx/src/torch_onnx/_registration.py:134: UserWarning: aten::getitem does not have a default overload or is not found. Ignoring.
  warnings.warn(
Translate the graph into ONNX... ✅
The initializers have been removed from the model. This is destructive. Developers: Please implement ir.Model copy() and remove initializers on the copied model.
Filename: /Users/justinc/Documents/GitHub/torch-onnx/venv/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   583    541.6 MiB    541.6 MiB           1               @memory_profiler.profile
   584                                                     def export():
   585                                                         # export_options = torch.onnx.ExportOptions(dynamic_shapes=False)
   586                                                         # onnx_program = torch.onnx.dynamo_export(
   587                                                         #     model,
   588                                                         #     export_options = export_options,
   589                                                         #     **dummy_inputs,
   590                                                         # )
   591                                                         # onnx_program.save(output.as_posix())
   592    635.7 MiB     94.1 MiB           2                   onnx_export(
   593    541.6 MiB      0.0 MiB           1                       model,
   594    541.6 MiB      0.0 MiB           1                       (dummy_inputs,),
   595    541.6 MiB      0.0 MiB           1                       f=output.as_posix(),
   596    541.6 MiB      0.0 MiB           1                       input_names=input_names,
   597    541.6 MiB      0.0 MiB           1                       output_names=output_names,
   598                                                             # dynamic_axes=dynamix_axes,
   599    541.6 MiB      0.0 MiB           1                       do_constant_folding=do_constant_folding,
   600    541.6 MiB      0.0 MiB           1                       opset_version=opset,
   601    541.6 MiB      0.0 MiB           1                       export_params=False, # MARK
   602                                                         )

***** Exporting submodel 3/3: WhisperForConditionalGeneration *****
Using framework PyTorch: 2.3.1
Overriding 1 configuration item(s)
        - use_cache -> True
Obtain model graph for `WhisperForConditionalGeneration([...]` with `torch.export.export`...
Obtain model graph for `WhisperForConditionalGeneration([...]` with `torch.export.export`... ✅
Translate the graph into ONNX...
aten::getitem is not found in this version of PyTorch.
/Users/justinc/Documents/GitHub/torch-onnx/src/torch_onnx/_registration.py:134: UserWarning: aten::getitem does not have a default overload or is not found. Ignoring.
  warnings.warn(
Translate the graph into ONNX... ✅
The initializers have been removed from the model. This is destructive. Developers: Please implement ir.Model copy() and remove initializers on the copied model.
Filename: /Users/justinc/Documents/GitHub/torch-onnx/venv/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   583    635.7 MiB    635.7 MiB           1               @memory_profiler.profile
   584                                                     def export():
   585                                                         # export_options = torch.onnx.ExportOptions(dynamic_shapes=False)
   586                                                         # onnx_program = torch.onnx.dynamo_export(
   587                                                         #     model,
   588                                                         #     export_options = export_options,
   589                                                         #     **dummy_inputs,
   590                                                         # )
   591                                                         # onnx_program.save(output.as_posix())
   592    670.1 MiB     34.4 MiB           2                   onnx_export(
   593    635.7 MiB      0.0 MiB           1                       model,
   594    635.7 MiB      0.0 MiB           1                       (dummy_inputs,),
   595    635.7 MiB      0.0 MiB           1                       f=output.as_posix(),
   596    635.7 MiB      0.0 MiB           1                       input_names=input_names,
   597    635.7 MiB      0.0 MiB           1                       output_names=output_names,
   598                                                             # dynamic_axes=dynamix_axes,
   599    635.7 MiB      0.0 MiB           1                       do_constant_folding=do_constant_folding,
   600    635.7 MiB      0.0 MiB           1                       opset_version=opset,
   601    635.7 MiB      0.0 MiB           1                       export_params=False, # MARK
   602                                                         )

The ONNX export succeeded and the exported model was saved at: whisper_fake
[image: mprof memory-usage plot]

mprofile_20240625074550.txt