huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0

Is it possible to compile pipeline (with tokenizer) to ONNX Runtime? #1318

Open j-adamczyk opened 1 year ago

j-adamczyk commented 1 year ago

Feature request

Is it possible to compile the entire pipeline, both the tokenizer and the transformer model, to run with ONNX Runtime? My goal is to remove the transformers dependency entirely at runtime, to reduce serverless cold-start time.

Motivation

I could not find any examples and could not make this work, so I wonder whether compiling a tokenizer to ONNX is possible at all.

Your contribution

I could try implementing this, or add an example to the documentation if this is already possible.
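As a point of reference, one partial workaround (not a full ONNX compilation of the tokenizer) is to drop transformers at runtime and load the tokenizer with the standalone `tokenizers` package instead. A minimal sketch, assuming the model has already been exported to `model.onnx` and the tokenizer saved as `tokenizer.json` (both file names are assumptions for illustration):

```python
# Sketch only: avoids the `transformers` dependency at runtime by pairing the
# lightweight `tokenizers` package with an ONNX Runtime session.

def run_pipeline(text: str,
                 model_path: str = "model.onnx",
                 tokenizer_path: str = "tokenizer.json"):
    # Imports are kept inside the function so the module can be loaded even
    # when the optional runtime dependencies are not installed.
    import numpy as np
    import onnxruntime as ort
    from tokenizers import Tokenizer  # pip install tokenizers

    # Load the fast (Rust-backed) tokenizer directly from its JSON file,
    # no transformers needed.
    tokenizer = Tokenizer.from_file(tokenizer_path)
    enc = tokenizer.encode(text)

    session = ort.InferenceSession(model_path)
    feeds = {
        "input_ids": np.array([enc.ids], dtype=np.int64),
        "attention_mask": np.array([enc.attention_mask], dtype=np.int64),
    }
    # Returns the raw model outputs (e.g. logits or hidden states).
    return session.run(None, feeds)
```

This trims `transformers` from the deployment image but still ships two packages; it does not fold the tokenizer into the ONNX graph itself, which is what the feature request is actually asking for.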

fxmarty commented 1 year ago

Hi @j-adamczyk, not through Optimum. I believe the ORT folks did something along these lines with the latest ORT release, see: https://medium.com/microsoftazure/build-and-deploy-fast-and-portable-speech-recognition-applications-with-onnx-runtime-and-whisper-5bf0969dd56b