huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0

GPU Graph optimization for Flan-T5-Large #845

Open Matthieu-Tinycoaching opened 1 year ago

Matthieu-Tinycoaching commented 1 year ago

Feature request

Would it be possible to add GPU graph optimizations for the Flan-T5-Large model? (Also requested at https://github.com/microsoft/onnxruntime/issues/14886.)

After exporting the model to ONNX and trying to optimize it with ORTOptimizer as below:

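from pathlib import Path

from optimum.onnxruntime import ORTModelForSeq2SeqLM, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig

onnx_path = Path("./flan-t5-large")

# Load the exported ONNX model (assumption: the export was saved to onnx_path;
# the original snippet does not show how ort_model was created)
ort_model = ORTModelForSeq2SeqLM.from_pretrained(onnx_path)

# Create ORTOptimizer
optimizer = ORTOptimizer.from_pretrained(ort_model)

# Define the optimization strategy by creating the appropriate configuration
optimization_config = OptimizationConfig(optimization_level=1,
                                        optimize_for_gpu=True,
                                        fp16=True
                                        )

# Optimize the model
optimizer.optimize(save_dir=onnx_path, optimization_config=optimization_config)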

I got the following error message:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In [10], line 16
     10 optimization_config = OptimizationConfig(optimization_level=1,
     11                                         optimize_for_gpu=True,
     12                                         fp16=True
     13                                         )
     15 # Optimize the model
---> 16 optimizer.optimize(save_dir=onnx_path, optimization_config=optimization_config)

File ~/anaconda3/envs/optimum_gpu_py3.8/lib/python3.8/site-packages/optimum/onnxruntime/optimization.py:128, in ORTOptimizer.optimize(self, optimization_config, save_dir, file_suffix, use_external_data_format, one_external_file)
    126 save_dir = Path(save_dir)
    127 save_dir.mkdir(parents=True, exist_ok=True)
--> 128 ORTConfigManager.check_optimization_supported_model(self.model_type)
    130 self.config.save_pretrained(save_dir)
    131 maybe_save_preprocessors(self.onnx_model_path[0].parent, save_dir)

File ~/anaconda3/envs/optimum_gpu_py3.8/lib/python3.8/site-packages/optimum/onnxruntime/utils.py:120, in ORTConfigManager.check_optimization_supported_model(cls, model_type)
    118 supported_model_types_for_optimization = ["bert", "gpt2", "bart"]
    119 if (model_type not in cls._conf) or (cls._conf[model_type] not in supported_model_types_for_optimization):
--> 120     raise KeyError(
    121         f"ONNX Runtime doesn't support the graph optimization of {model_type} yet. Only {supported_model_types_for_optimization} are supported. "
    122         f"If you want to support {model_type} please propose a PR or open up an issue in ONNX Runtime:https://github.com/microsoft/onnxruntime."
    123     )

KeyError: "ONNX Runtime doesn't support the graph optimization of t5 yet. Only ['bert', 'gpt2', 'bart'] are supported. If you want to support t5 please propose a PR or open up an issue in ONNX Runtime:https://github.com/microsoft/onnxruntime."

Motivation

Improve the performance (latency and throughput) of the Flan-T5-Large model.

Your contribution

I could beta test the solution.

fxmarty commented 1 year ago

Hi @Matthieu-Tinycoaching, thanks for the report. The error message is a bit misleading: it actually means that this architecture needs to be added in https://github.com/huggingface/optimum/blob/4ea4baa77f8030a83157c0e6abbd750e61ad45da/optimum/onnxruntime/utils.py#L97. I can fix this shortly!
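The check that raises the error is short enough to restate. The sketch below is a simplified, hypothetical mirror of the logic visible in the traceback (`ORTConfigManager.check_optimization_supported_model` in `optimum/onnxruntime/utils.py`). The mapping is reduced to three entries for illustration and is not Optimum's actual `_conf`, and `is_optimization_supported` is a hypothetical helper. It shows why `t5` fails: the lookup in the Optimum-side mapping fails first, so the KeyError blames ONNX Runtime even though the missing piece is a registration in Optimum.

```python
# Simplified, hypothetical mirror of the check shown in the traceback.
# The real mapping in optimum has more entries; only three are kept here.
SUPPORTED_MODEL_TYPES_FOR_OPTIMIZATION = ["bert", "gpt2", "bart"]

# Maps a Transformers `model_type` to the ONNX Runtime optimizer type it is treated as.
DEFAULT_CONF = {
    "bert": "bert",
    "gpt2": "gpt2",
    "bart": "bart",
}


def check_optimization_supported_model(model_type, conf=None):
    """Raise KeyError unless model_type is registered and maps to a supported optimizer type."""
    conf = DEFAULT_CONF if conf is None else conf
    if model_type not in conf or conf[model_type] not in SUPPORTED_MODEL_TYPES_FOR_OPTIMIZATION:
        raise KeyError(
            f"ONNX Runtime doesn't support the graph optimization of {model_type} yet. "
            f"Only {SUPPORTED_MODEL_TYPES_FOR_OPTIMIZATION} are supported."
        )


def is_optimization_supported(model_type, conf=None):
    """Hypothetical helper: True if the check passes, False if it raises KeyError."""
    try:
        check_optimization_supported_model(model_type, conf)
    except KeyError:
        return False
    return True


# "t5" is absent from the mapping, so the check fails before ONNX Runtime is ever called.
print(is_optimization_supported("bert"))  # True
print(is_optimization_supported("t5"))    # False
```

Registering `t5` would mean adding an entry that maps it to an optimizer type ONNX Runtime accepts; which target type that should be is not stated in this thread.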

smolskayanastassia commented 1 year ago

@Matthieu-Tinycoaching @fxmarty Does fp16 optimization work for Flan-T5 on GPU?

argideritzalpea commented 1 year ago

@fxmarty It seems that this issue has been reported as fixed in the onnxruntime repo: https://github.com/microsoft/onnxruntime/issues/14886

Is anything further required to enable optimization for Flan-T5-Large?