
llm-export

llm-export is a tool for exporting LLM models; it can export an LLM model to the onnx and mnn formats.

Installation

# pip install
pip install llmexport

# git install
pip install git+https://github.com/wangzhaode/llm-export@master

# local install
git clone https://github.com/wangzhaode/llm-export && cd llm-export/
pip install .
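
# after installation, verify the CLI is available by printing its help text
llmexport -h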

Usage

  1. Clone the LLM project you want to export to a local directory, e.g. chatglm2-6b:
    git clone https://huggingface.co/THUDM/chatglm2-6b
    # if downloading from huggingface is slow, you can use modelscope instead
    git clone https://modelscope.cn/ZhipuAI/chatglm2-6b.git
  2. Export the model:
    # export chatglm2-6b as an onnx model
    llmexport --path ../chatglm2-6b --export onnx
    # export chatglm2-6b as an mnn model, quantized to 4 bits, block-wise with block size 128
    llmexport --path ../chatglm2-6b --export mnn --quant_bit 4 --quant_block 128
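
After exporting, you can smoke-test inference from the command line with the --test option documented under Parameters below; a minimal sketch, using a sample query of your choosing:

# run a quick inference test with a sample query
llmexport --path ../chatglm2-6b --test "Hello"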

Features

  - Export LLM models to onnx or mnn format (--export)
  - 4-bit / 8-bit weight quantization for mnn export (--quant_bit, --quant_block, --lm_quant_bit)
  - Optional lora weight loading (--lora_path)
  - Graph simplification via onnx-slim, skippable with --skip_slim
  - Built-in inference testing of the model (--test)

Parameters

usage: llmexport [-h] --path PATH [--type TYPE] [--lora_path LORA_PATH] [--dst_path DST_PATH] [--test TEST] [--export EXPORT] [--skip_slim] [--quant_bit QUANT_BIT] [--quant_block QUANT_BLOCK]
                 [--lm_quant_bit LM_QUANT_BIT]

llm_exporter

optional arguments:
  -h, --help            show this help message and exit
  --path PATH           path(`str` or `os.PathLike`):
                        Can be either:
                            - A string, the *model id* of a pretrained model like `THUDM/chatglm-6b`. [TODO]
                            - A path to a *directory* cloned from a repo like `../chatglm-6b`.
  --type TYPE           type(`str`, *optional*):
                            The pretrained LLM model type.
  --lora_path LORA_PATH
                        lora path, default is `None`, meaning lora is not applied.
  --dst_path DST_PATH   export the onnx/mnn model to this path, default is `./model`.
  --test TEST           test model inference with query `TEST`.
  --export EXPORT       export the model as an onnx/mnn model.
  --skip_slim           Whether or not to skip onnx-slim.
  --quant_bit QUANT_BIT
                        mnn quant bit, 4 or 8, default is 4.
  --quant_block QUANT_BLOCK
                        mnn quant block, default is 0, meaning channel-wise.
  --lm_quant_bit LM_QUANT_BIT
                        mnn lm_head quant bit, 4 or 8, default is `quant_bit`.
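
Combining several of these options, a hedged example (the lora path and output directory here are illustrative; the flags are the ones documented above):

# merge lora weights, quantize body weights to 4 bit (block size 128),
# quantize lm_head to 8 bit, and write the result to a custom directory
llmexport --path ../chatglm2-6b --export mnn --lora_path ../my-lora \
          --dst_path ./chatglm2-mnn --quant_bit 4 --quant_block 128 --lm_quant_bit 8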

Supported models