Open saleshwaram opened 5 months ago
Hi! Are you sure Llama 3 doesn't work? It has the same architecture/`model_type` as Llama 2, so it should work out of the box.
I'm running a script locally to export it and check — the export is going smoothly with `meta-llama/Meta-Llama-3-8B`.
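The export described above can be reproduced with the `optimum-cli` exporter; a sketch, assuming `optimum` with the ONNX exporter extras is installed and access to the gated checkpoint has been granted:

```shell
# Export the Llama 3 checkpoint to ONNX with KV-cache support;
# the text-generation-with-past task keeps past key/values as
# model inputs/outputs so decoding avoids recomputing the prefix.
optimum-cli export onnx \
  --model meta-llama/Meta-Llama-3-8B \
  --task text-generation-with-past \
  llama3_onnx/
```

The output directory can then be loaded directly by `ORTModelForCausalLM.from_pretrained("llama3_onnx/")`.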
Feature request
I would like to request support for using Meta Llama 3 with ORTModelForCausalLM for faster inference. This integration would leverage ONNX Runtime (ORT) to optimize and accelerate inference with Meta Llama 3 models.
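If the existing Llama 2 code path does cover Llama 3, the requested usage would follow the standard `optimum` flow. A minimal sketch, assuming `optimum[onnxruntime]` and `transformers` are installed and access to the gated `meta-llama/Meta-Llama-3-8B` checkpoint has been granted (the download is large):

```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

model_id = "meta-llama/Meta-Llama-3-8B"

# export=True converts the PyTorch checkpoint to ONNX on the fly;
# pass a local ONNX directory instead to skip re-exporting.
model = ORTModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

`ORTModelForCausalLM` exposes the same `generate` API as the `transformers` model, so it can be swapped into existing pipelines.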
Motivation
Currently, there is no documented support for running Meta Llama 3 through ORTModelForCausalLM on Hugging Face. This leads to slower inference times, which can be a significant bottleneck in applications requiring real-time or near-real-time responses. Supporting this integration would greatly improve the performance and usability of Meta Llama 3 models, particularly in production environments where inference speed is critical.
Your contribution
While I may not have the expertise to implement this feature myself, I am willing to assist with testing and providing feedback on the integration process. Additionally, I can help with documentation and usage examples once the feature is implemented.