huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0

Support for Meta LLaMA 3 with ORTModelForCausalLM for Faster Inference #1856

Open saleshwaram opened 5 months ago

saleshwaram commented 5 months ago

Feature request

I would like to request support for using Meta LLaMA 3 with ORTModelForCausalLM for faster inference. This integration would let ONNX Runtime (ORT) optimize and accelerate inference for Meta LLaMA 3 models.
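
For reference, this is the kind of usage I have in mind (a sketch based on how ORTModelForCausalLM is used with other architectures; whether this already works for LLaMA 3 is exactly the question, and the checkpoint name is just an example):

```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

model_id = "meta-llama/Meta-Llama-3-8B"

# export=True converts the PyTorch checkpoint to ONNX on the fly
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForCausalLM.from_pretrained(model_id, export=True)

# generation then runs through ONNX Runtime instead of plain PyTorch
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```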

Motivation

Currently, there is no documented support for integrating Meta LLaMA 3 with ORTModelForCausalLM on Hugging Face. Without it, inference runs through plain PyTorch, which is slower and can be a significant bottleneck in applications requiring real-time or near-real-time responses. Supporting this integration would greatly improve the performance and usability of Meta LLaMA 3 models, particularly in production environments where inference speed is critical.

Your contribution

While I may not have the expertise to implement this feature myself, I am willing to assist with testing and providing feedback on the integration process. Additionally, I can help with documentation and usage examples once the feature is implemented.

IlyasMoutawwakil commented 5 months ago

Hi! Are you sure LLaMA 3 doesn't work? It has the same architecture/model_type as LLaMA 2, so it should work out of the box. I'm running a script locally to export it and check; the export is going smoothly with meta-llama/Meta-Llama-3-8B.
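
The export script is essentially the following (a sketch; the output directory name is arbitrary, and the gated checkpoint requires accepting the license on the Hub first):

```python
from optimum.onnxruntime import ORTModelForCausalLM

# export=True runs the ONNX export on the fly; save_pretrained writes the
# result to disk so later runs can skip the (slow) conversion step
model = ORTModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", export=True
)
model.save_pretrained("llama3-8b-onnx")

# subsequent runs can load the already-exported model directly
model = ORTModelForCausalLM.from_pretrained("llama3-8b-onnx")
```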