wangzhaode / llm-export

llm-export can export LLM models to ONNX.
Apache License 2.0

Qwen1.5-1.8B-Chat model fails to convert to MNN #54

Closed WangHao311 closed 1 month ago

WangHao311 commented 1 month ago

Code: latest master branch, pulled on 2024-07-18.

Main environment: MNN==2.8.3 numpy==1.26.4 onnxruntime==1.15.1 torch==2.0.1 transformers==4.41.2 transformers_stream_generator==0.0.4 sentencepiece==0.1.99 onnxslim==0.1.32

Conversion command: python llm_export.py --path ./Qwen1.5-1.8B-Chat --export --export_embed --embed_bin --export_token --export_mnn --type Qwen1_5-1_8B-Chat --onnx_path ./qwen1.5-1.8b-onnx --mnn_path ./qwen1.5-1.8b-mnn

Log output:

The device support i8sdot:0, support fp16:0, support i8mm: 0
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
export start ...
/root/.cache/huggingface/modules/transformers_modules/Qwen1.5-1.8B-Chat/modeling_qwen2.py:306: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
  cos, sin = rotary_pos_emb
/root/.cache/huggingface/modules/transformers_modules/Qwen1.5-1.8B-Chat/modeling_qwen2.py:323: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len):
/root/.cache/huggingface/modules/transformers_modules/Qwen1.5-1.8B-Chat/modeling_qwen2.py:330: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (bsz, 1, q_len, kv_seq_len):
/root/.cache/huggingface/modules/transformers_modules/Qwen1.5-1.8B-Chat/modeling_qwen2.py:342: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
============= Diagnostic Run torch.onnx.export version 2.0.1+cu117 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

export done!
self.skip_slim: False
Killed

Judging from the log, the failure happens while running slim(onnx_model, output_model=onnx_model). @wangzhaode @inisis could you please take a look? How should I resolve this?

inisis commented 1 month ago

"Killed" means the process ran out of memory; you can try adding --skip_slim.
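
For reference, this would be the original conversion command from the issue rerun with the suggested flag appended; a sketch only, assuming the same paths as above:

```shell
# Rerun the export with the onnxslim step disabled to avoid the OOM kill.
# All paths and flags are copied from the command earlier in this issue;
# adjust them for your own checkout.
python llm_export.py \
    --path ./Qwen1.5-1.8B-Chat \
    --export --export_embed --embed_bin --export_token --export_mnn \
    --type Qwen1_5-1_8B-Chat \
    --onnx_path ./qwen1.5-1.8b-onnx \
    --mnn_path ./qwen1.5-1.8b-mnn \
    --skip_slim
```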

WangHao311 commented 1 month ago

Thank you very much, --skip_slim helped. But by the way, how much memory is needed? My machine has 32GB.

inisis commented 1 month ago

You can run the slimming step offline:

onnxslim ./qwen1.5-1.8b-onnx/llm.onnx ./qwen1.5-1.8b-onnx/slim.onnx

WangHao311 commented 1 month ago

The offline onnxslim command works, but how can I get final files like the ones at this link? https://modelscope.cn/models/zhaode/Qwen1.5-1.8B-Chat-MNN/files

inisis commented 1 month ago

You can convert it offline too:

mnnconvert -f ONNX --modelFile ./qwen1.5-1.8b-onnx/slim.onnx --MNNModel ./qwen1.5-1.8b-onnx/qwen-1.8b-int4.mnn --weightQuantBits 4 --weightQuantAsymmetric --saveExternalData
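
Putting the two offline steps together, the full workaround pipeline looks like the following; both commands are taken verbatim from this thread, and the filenames are just the examples used above:

```shell
# 1. Slim the exported ONNX graph offline (this is the step that needs
#    the most RAM for the 1.8B model).
onnxslim ./qwen1.5-1.8b-onnx/llm.onnx ./qwen1.5-1.8b-onnx/slim.onnx

# 2. Convert the slimmed ONNX model to MNN with asymmetric 4-bit weight
#    quantization, saving the weights as external data.
mnnconvert -f ONNX \
    --modelFile ./qwen1.5-1.8b-onnx/slim.onnx \
    --MNNModel ./qwen1.5-1.8b-onnx/qwen-1.8b-int4.mnn \
    --weightQuantBits 4 --weightQuantAsymmetric --saveExternalData
```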
WangHao311 commented 1 month ago

That's great, thank you!