sabre-code opened this issue 1 month ago
@sabre-code, could you please try running this model with onnxruntime-genai? Here is an example showing how to create the model and run a similar one: https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/README.md#get-the-model
Describe the issue
We encountered an issue while using SeaLLM v2, a 7B model, in ONNX format with int8 quantization for translation purposes. Here are the steps we followed and the problem we're facing:
Model Conversion to ONNX:
Model Quantization: We used the quantize_dynamic() function to convert the fp32 model to int8.

Issue:
To reproduce
Steps to Reproduce:
Quantize the fp32 model using quantize_dynamic().

Urgency
No response
Platform
Linux
OS Version
Ubuntu 20.04.6
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.18.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
Yes