neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

Extractor DFS performance #1655

Open kylesayrs opened 3 months ago

kylesayrs commented 3 months ago

Description

This change modifies the depth-first search (DFS) used in model extraction to use sets rather than iterables.
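
To illustrate the idea, here is a minimal sketch of a DFS that tracks visited nodes in a set. It is not the Extractor code itself: the function name `reachable_nodes`, the dictionary-based graph, and the toy node names are all hypothetical and chosen only for illustration. The point is that `node in visited` is a constant-time lookup on average for a set, whereas the same check against a list scans every element, which adds up on graphs with thousands of nodes.

```python
from typing import Dict, List, Set


def reachable_nodes(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Iterative DFS returning every node reachable from `start`.

    `graph` maps a node name to the names it depends on (hypothetical layout,
    not the ONNX graph representation).
    """
    visited: Set[str] = set()  # set membership checks are O(1) on average
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        # Only push neighbours that have not been seen yet.
        stack.extend(n for n in graph.get(node, ()) if n not in visited)
    return visited


# Hypothetical toy graph: "unused" is not reachable from "out".
toy_graph = {
    "out": ["add", "mul"],
    "add": ["x", "y"],
    "mul": ["x"],
    "unused": ["y"],
}
reached = reachable_nodes(toy_graph, "out")
print(sorted(reached))  # ['add', 'mul', 'out', 'x', 'y']
```

Swapping `visited` for a list would give the same result on this toy graph, only with a linear scan on every membership check.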

Motivation and Context

While shape inference is the most significant bottleneck, these changes are a step in the direction of being able to support model extraction for very large graphs.

Test Script

import onnx
from onnx.utils import Extractor

model = onnx.load("obertquant.onnx")
extractor = Extractor(model)
extracted_model = extractor.extract_model(
    input_names=["input_ids", "attention_mask", "token_type_ids"], output_names=["2058"]
)
onnx.save(extracted_model, "truncated.onnx")

Benchmarks were produced using pyinstrument and analyzing the Extractor.extract_model function.

Model Name        Num Nodes  Previous  New
obertquant.onnx   1271       0.158s    0.110s
ai-town-3B.onnx   3515       8.002s    3.725s

kylesayrs commented 2 months ago

Related: https://github.com/onnx/onnx/pull/6213