justinchuby / torch-onnx

Prototype of the next torch exporter

Phi3 profiling #74

Open · justinchuby opened this issue 1 week ago

justinchuby commented 1 week ago

Using Fake Tensors

(screenshot attached)
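For context, here is a minimal sketch of the fake-tensor approach, assuming the Hugging Face `transformers` API and an illustrative input shape; it is not the exact script behind the run above. The model is instantiated under `FakeTensorMode`, so the ~3.8B parameters carry shapes and dtypes but no storage.

```python
# Hedged sketch (not the exact script behind this run): instantiate Phi-3 under
# FakeTensorMode so the parameters carry shapes/dtypes but no storage, then
# trace it with torch.export. Input shape and kwargs are illustrative.
import torch
from torch._subclasses.fake_tensor import FakeTensorMode
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

with FakeTensorMode(allow_non_fake_inputs=True):
    # Weights are created directly as FakeTensors; no real memory is allocated.
    model = AutoModelForCausalLM.from_config(config).eval()
    example_input_ids = torch.randint(0, config.vocab_size, (1, 128))
    exported_program = torch.export.export(model, (example_input_ids,))
```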
justinchuby commented 1 week ago

Attached model: model.onnx.zip

justinchuby commented 1 week ago

PyTorch ONNX Conversion Report

✅ Obtain model graph with `torch.export.export`
✅ Translate the graph into ONNX
⚪ Run `onnx.checker` on the ONNX model
⚪ Execute the model with ONNX Runtime
⚪ Validate model output accuracy
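The last three steps are unchecked in this run. A hedged sketch of what they would look like once the exporter has written `model.onnx`; the file name, input name, vocabulary size, and tolerances are assumptions, and a causal LM export may also expect attention-mask or past-key-value inputs:

```python
# Hedged sketch of the three unchecked report steps: onnx.checker, execution
# with ONNX Runtime, and output comparison. Paths, input names, the vocabulary
# size, and tolerances are illustrative assumptions, not taken from this issue.
import numpy as np
import onnx
import onnxruntime as ort

# Run `onnx.checker` on the ONNX model (step 3). Passing a path also works for
# models larger than the 2 GB protobuf limit.
onnx.checker.check_model("model.onnx", full_check=True)

# Execute the model with ONNX Runtime (step 4).
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_ids = np.random.randint(0, 32064, size=(1, 128), dtype=np.int64)  # 32064: assumed vocab size
outputs = session.run(None, {"input_ids": input_ids})

# Validate output accuracy (step 5): compare outputs[0] (logits, by assumption)
# against the eager PyTorch logits for the same input_ids, e.g.
# np.testing.assert_allclose(outputs[0], torch_logits, rtol=1e-3, atol=1e-3)
```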

Profiling result


```
  _     ._   __/__   _ _  _  _ _/_   Recorded: 10:53:38  Samples:  17242
 /_//_/// /_\ / //_// / //_'/ //     Duration: 18.227    CPU time: 18.075
/   _/                      v4.6.2

Program: /Users/justinc/Documents/GitHub/torch-onnx/venv/bin/optimum-cli export onnx --model microsoft/Phi-3-mini-4k-instruct phi3

18.227 export  torch_onnx/_core.py:796
├─ 11.996 export  torch/export/__init__.py:73
│     [273 frames hidden]  torch, contextlib, dis, importlib, as...
└─ 6.228 exported_program_to_ir  torch_onnx/_core.py:618
   ├─ 3.694 wrapper  torch/export/exported_program.py:80
   │     [60 frames hidden]  torch, <string>
   ├─ 1.758 _add_nodes  torch_onnx/_core.py:486
   │  └─ 1.746 _handle_call_function_node_with_lowering  torch_onnx/_core.py:356
   │     └─ 1.179 TracedOnnxFunction.__call__  ../../onnxscript/onnxscript/values.py:581
   │        ├─ 0.612 SymbolicTensor.aten_slice  ../../onnxscript/onnxscript/function_libs/torch_lib/ops/core.py:7524
   │        │  ├─ 0.224 Opset18.Constant  ../../onnxscript/onnxscript/onnx_opset/_impl/opset13.py:408
   │        │  │  └─ 0.219 Op.__call__  ../../onnxscript/onnxscript/values.py:291
   │        │  │     └─ 0.216 OpRecorder.eval  torch_onnx/_building.py:390
   │        │  └─ 0.202 Opset18.Cast  ../../onnxscript/onnxscript/onnx_opset/_impl/opset13.py:241
   │        │     └─ 0.196 Op.__call__  ../../onnxscript/onnxscript/values.py:291
   │        │        └─ 0.192 OpRecorder.eval  torch_onnx/_building.py:390
   │        ├─ 0.258 SymbolicTensor.aten_view  ../../onnxscript/onnxscript/function_libs/torch_lib/ops/core.py:8740
   │        └─ 0.226 SymbolicTensor.aten_clone  ../../onnxscript/onnxscript/function_libs/torch_lib/ops/core.py:1687
   │           └─ 0.226 Opset18.Identity  ../../onnxscript/onnxscript/onnx_opset/_impl/opset16.py:240
   │              └─ 0.222 Op.__call__  ../../onnxscript/onnxscript/values.py:291
   │                 └─ 0.220 OpRecorder.eval  torch_onnx/_building.py:390
   │                    └─ 0.200 OpSignature.from_opschema  torch_onnx/_schemas.py:380
   │                       └─ 0.197 <dictcomp>  torch_onnx/_schemas.py:383
   │                          └─ 0.197 <setcomp>  torch_onnx/_schemas.py:386
   ├─ 0.456 insert_type_promotion_nodes  torch_onnx/_fx_passes.py:13
   │  └─ 0.419 wrapper  torch/onnx/_internal/diagnostics/infra/decorator.py:71
   │        [9 frames hidden]  torch
   └─ 0.302 OnnxRegistry.from_torchlib  torch_onnx/_registration.py:114
```
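The banner above is pyinstrument's (v4.6.2), driven here through optimum-cli. As a generic sketch, not the harness used for this run, an equivalent profile can be captured from Python by wrapping the export call with pyinstrument's API:

```python
# Hedged sketch: capture a pyinstrument profile around torch.export.export,
# producing the same kind of call tree shown above (the real run went through
# optimum-cli rather than this helper).
import torch
from pyinstrument import Profiler

def profiled_export(model: torch.nn.Module, example_args: tuple):
    profiler = Profiler(interval=0.001)  # sampling profiler, ~1 ms interval
    profiler.start()
    try:
        exported = torch.export.export(model, example_args)
    finally:
        profiler.stop()
    print(profiler.output_text(unicode=True, color=False))
    return exported
```

pyinstrument can also wrap an entire script from the command line (`pyinstrument path/to/script.py ...`), which is closer to how the optimum-cli run above was recorded.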
justinchuby commented 1 week ago

Analysis

PyTorch ONNX Conversion Analysis

Model Information

The model has 3821079552 parameters and 1536 buffers (non-trainable parameters). Number of parameters per dtype:

defaultdict(<class 'int'>, {torch.float32: 3821079552})

Number of buffers per dtype:

defaultdict(<class 'int'>, {torch.float32: 1536})

Inputs:

Outputs:

The FX graph has 3697 nodes in total. Number of FX nodes per op:

Of the call_function nodes, the counts of operators used are:

ONNX Conversion Information

All operators in the model have registered ONNX decompositions.
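For reference, the counts reported in this section (parameters and buffers per dtype, FX nodes per op, call_function operator counts) can be reproduced from the eager module and its `ExportedProgram` roughly as follows; this is a sketch, not torch_onnx's own reporting code:

```python
# Hedged sketch (not torch_onnx's reporting code): gather the statistics shown
# above from the eager module and the torch.export.ExportedProgram.
from collections import defaultdict
import torch

def conversion_stats(model: torch.nn.Module, exported: torch.export.ExportedProgram):
    params_per_dtype = defaultdict(int)
    for param in model.parameters():
        params_per_dtype[param.dtype] += param.numel()

    buffers_per_dtype = defaultdict(int)
    for buf in model.buffers():
        buffers_per_dtype[buf.dtype] += buf.numel()

    # FX node counts: total nodes per node.op, plus per-operator counts for
    # call_function nodes (e.g. aten.slice, aten.view).
    nodes_per_op = defaultdict(int)
    call_function_targets = defaultdict(int)
    for node in exported.graph_module.graph.nodes:
        nodes_per_op[node.op] += 1
        if node.op == "call_function":
            call_function_targets[node.target] += 1

    return params_per_dtype, buffers_per_dtype, nodes_per_op, call_function_targets
```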

Profiling result


```
  _     ._   __/__   _ _  _  _ _/_   Recorded: 10:56:07  Samples:  17276
 /_//_/// /_\ / //_// / //_'/ //     Duration: 18.271    CPU time: 18.110
/   _/                      v4.6.2

Program: /Users/justinc/Documents/GitHub/torch-onnx/venv/bin/optimum-cli export onnx --model microsoft/Phi-3-mini-4k-instruct phi3

18.270 export  torch_onnx/_core.py:796
├─ 12.038 export  torch/export/__init__.py:73
│     [277 frames hidden]  torch, contextlib, dis, importlib, as...
└─ 6.231 exported_program_to_ir  torch_onnx/_core.py:618
   ├─ 3.705 wrapper  torch/export/exported_program.py:80
   │     [64 frames hidden]  torch, <string>
   ├─ 1.755 _add_nodes  torch_onnx/_core.py:486
   │  └─ 1.742 _handle_call_function_node_with_lowering  torch_onnx/_core.py:356
   │     ├─ 1.191 TracedOnnxFunction.__call__  ../../onnxscript/onnxscript/values.py:581
   │     │  ├─ 0.625 SymbolicTensor.aten_slice  ../../onnxscript/onnxscript/function_libs/torch_lib/ops/core.py:7524
   │     │  │  ├─ 0.230 Opset18.Cast  ../../onnxscript/onnxscript/onnx_opset/_impl/opset13.py:241
   │     │  │  │  └─ 0.224 Op.__call__  ../../onnxscript/onnxscript/values.py:291
   │     │  │  │     └─ 0.220 OpRecorder.eval  torch_onnx/_building.py:390
   │     │  │  └─ 0.215 Opset18.Constant  ../../onnxscript/onnxscript/onnx_opset/_impl/opset13.py:408
   │     │  │     └─ 0.212 Op.__call__  ../../onnxscript/onnxscript/values.py:291
   │     │  │        └─ 0.211 OpRecorder.eval  torch_onnx/_building.py:390
   │     │  └─ 0.258 SymbolicTensor.aten_view  ../../onnxscript/onnxscript/function_libs/torch_lib/ops/core.py:8740
   │     └─ 0.198 _set_node_metadata  torch_onnx/_core.py:226
   ├─ 0.454 insert_type_promotion_nodes  torch_onnx/_fx_passes.py:13
   │  └─ 0.419 wrapper  torch/onnx/_internal/diagnostics/infra/decorator.py:71
   │        [9 frames hidden]  torch
   └─ 0.299 OnnxRegistry.from_torchlib  torch_onnx/_registration.py:114
```