huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0

C++ ONNX export of HuggingFace Transformers models #1532

Open vymao opened 9 months ago

vymao commented 9 months ago

Feature request

Right now, it seems the only supported way to use ONNX exports of Hugging Face models is for inference in Python. It would be great if we could export Hugging Face Transformers models to ONNX and use them in C++ applications.

Motivation

To use Hugging Face Transformers models in C++ applications.

Your contribution

Will try, but probably not.

fxmarty commented 7 months ago

Thank you, I agree it would be very nice.

This library, https://github.com/marella/ctransformers, is close to what you are suggesting, except that it uses GGML instead of ONNX Runtime for inference.