xenova / transformers.js

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
https://huggingface.co/docs/transformers.js
Apache License 2.0

How to export q4f16.onnx #823

Closed · juntaosun closed this issue 3 months ago

juntaosun commented 3 months ago

Question

Thanks for providing such a great project, but I have a problem converting the model.

For example:
model_q4f16.onnx

What command is used to create and export a q4f16 `.onnx` model like this? Could you give me any tips or help? Thank you!

xenova commented 3 months ago

I will add it to the conversion script, but you essentially just need to convert the q4 model to fp16 with:

```python
from onnxconverter_common import float16
import onnx

# Load the existing 4-bit weight-quantized (q4) model
loaded_q4_model = onnx.load_model('./model_q4.onnx')

# Convert the remaining float32 tensors to float16, while keeping
# the model's inputs and outputs as float32 for compatibility
model_q4fp16 = float16.convert_float_to_float16(
    loaded_q4_model,
    keep_io_types=True,
    disable_shape_infer=True,
)

# Save the converted q4f16 model
save_path = './model_q4f16.onnx'
onnx.save(model_q4fp16, save_path,
          convert_attribute=False,
          all_tensors_to_one_file=True,
)
```
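
For reference, the snippet above depends on the `onnx` and `onnxconverter-common` packages (package names inferred from the imports; versions are not pinned here, so treat this as a minimal setup sketch):

```shell
pip install onnx onnxconverter-common
```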