wangzhaode / llm-export

llm-export can export LLM models to ONNX.
Apache License 2.0
187 stars 21 forks

Unable to reproduce phi2 export? #19

Closed kmn1024 closed 7 months ago

kmn1024 commented 7 months ago

I carried out these steps on a Linux desktop (i7-6700K, Ubuntu 22.04):

  1. git clone https://huggingface.co/microsoft/phi-2.git
  2. (inside the cloned repo) git checkout 834565c23f9b28b96ccbeabe614dd906b6db551a
  3. python llm_export.py --path /home/ck/Downloads/llm_models/phi-2 --export_split --export_token --export_mnn --onnx_path exported/phi-2/onnx --mnn_path exported/phi-2/mnn

The conversion had some warnings but was successful:

The device support i8sdot:0, support fp16:0, support i8mm: 0
...
============= Diagnostic Run torch.onnx.export version 2.0.1+cu117 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

Don't has bizCode, use MNNTest for default
Start to Convert Other Model Format To MNN Model..., target version: 2.8
[11:58:15] :46: ONNX Model ir version: 8
[11:58:15] :47: ONNX Model opset version: 15
Start to Optimize the MNN Net...
inputTensors : [ hidden_states, ]
outputTensors: [ token_id, ]
Converted Success!
/home/ck/.cache/huggingface/modules/transformers_modules/phi-2/modeling_phi.py:655: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
  cos_pos, sin_pos = rotary_pos_emb
/home/ck/.cache/huggingface/modules/transformers_modules/phi-2/modeling_phi.py:421: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if kv.shape[3] != q.shape[2]:
/home/ck/.cache/huggingface/modules/transformers_modules/phi-2/modeling_phi.py:429: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  softmax_scale = self.softmax_scale or 1.0 / math.sqrt(q.shape[-1])
In-place op on output of tensor.shape. See https://pytorch.org/docs/master/onnx.html#avoid-inplace-operations-when-using-tensor-shape-in-tracing-mode
In-place op on output of tensor.shape. See https://pytorch.org/docs/master/onnx.html#avoid-inplace-operations-when-using-tensor-shape-in-tracing-mode
...

However, the .mnn file sizes differ from the release download:

ls -ltr llm-export/exported/phi2/mnn/ # output I generated

-rw-rw-r-- 1 ck ck 524289292 Jan 16 11:58 embedding.mnn
-rw-rw-r-- 1 ck ck  65969348 Jan 16 11:58 lm.mnn
-rw-rw-r-- 1 ck ck  38095968 Jan 16 11:59 block_0.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 11:59 block_1.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 11:59 block_2.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:00 block_3.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:00 block_4.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:01 block_5.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:01 block_6.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:01 block_7.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:02 block_8.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:02 block_9.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:02 block_10.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:03 block_11.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:03 block_12.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:04 block_13.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:04 block_14.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:04 block_15.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:05 block_16.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:05 block_17.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:06 block_18.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:06 block_19.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:06 block_20.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:07 block_21.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:07 block_22.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:07 block_23.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:08 block_24.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:08 block_25.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:09 block_26.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:09 block_27.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:09 block_28.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:10 block_29.mnn
-rw-rw-r-- 1 ck ck  39684620 Jan 16 12:10 block_30.mnn
-rw-rw-r-- 1 ck ck  39685956 Jan 16 12:10 block_31.mnn
-rw-rw-r-- 1 ck ck    549831 Jan 16 12:52 tokenizer.txt

ls -ltr mnn-llm/phi2/ # release download

-rw-rw-r-- 1 ck ck  37196236 Dec 22 17:30 block_0.mnn
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:30 block_12.mnn
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:30 block_13.mnn
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:30 block_10.mnn
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:30 block_11.mnn
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:30 block_14.mnn
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:30 block_15.mnn
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:30 block_17.mnn
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:30 block_16.mnn
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:30 block_18.mnn
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:31 block_1.mnn
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:31 block_20.mnn
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:31 block_22.mnn
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:31 block_19.mnn
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:31 block_21.mnn
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:32 block_24.mnn
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:32 block_23.mnn
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:32 block_25.mnn
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:32 block_26.mnn
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:32 block_27.mnn
-rw-rw-r-- 1 ck ck  39533344 Dec 22 17:33 block_29.mnn
-rw-rw-r-- 1 ck ck  39197824 Dec 22 17:33 block_30.mnn
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:33 block_2.mnn
-rw-rw-r-- 1 ck ck  39297988 Dec 22 17:33 block_31.mnn
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:33 block_28.mnn
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:33 block_3.mnn
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:33 block_4.mnn
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:34 block_5.mnn
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:34 block_6.mnn
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:34 block_7.mnn
-rw-rw-r-- 1 ck ck    549831 Dec 22 17:34 tokenizer.txt
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:34 block_8.mnn
-rw-rw-r-- 1 ck ck  39592456 Dec 22 17:34 block_9.mnn
-rw-rw-r-- 1 ck ck  66174144 Dec 22 17:34 lm.mnn
-rw-rw-r-- 1 ck ck 262145280 Dec 22 17:35 embedding.mnn
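The two listings above can be compared programmatically rather than by eye. A small sketch (the directory paths are taken from the listings and are assumptions about where the export was run) pairs files by name and reports size mismatches:

```python
import os

def size_map(directory):
    """Map each file name in the directory to its size in bytes."""
    return {name: os.path.getsize(os.path.join(directory, name))
            for name in os.listdir(directory)}

def diff_sizes(local_dir, release_dir):
    """Return {name: (local_size, release_size)} for files present in
    both directories whose sizes differ."""
    local, release = size_map(local_dir), size_map(release_dir)
    return {name: (local[name], release[name])
            for name in sorted(local.keys() & release.keys())
            if local[name] != release[name]}

if __name__ == "__main__":
    diffs = diff_sizes("llm-export/exported/phi2/mnn", "mnn-llm/phi2")
    for name, (mine, theirs) in diffs.items():
        print(f"{name}: {mine} vs {theirs} bytes ({mine - theirs:+d})")
```

In this case every block differs by roughly 90 KB and embedding.mnn is exactly twice the release size, which points at a systematic difference in how the weights were exported rather than a truncated download.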

And when I test with cli_demo, the responses are very different:

./build/cli_demo /home/ck/git/mnn/llm-export/exported/phi2/mnn/


model path is /home/ck/git/mnn/llm-export/exported/phi2/mnn/
### model name : Phi_2
The device support i8sdot:0, support fp16:0, support i8mm: 0
...
Q: Einstein said

A:

./build/cli_demo /home/ck/git/mnn/mnn-llm/phi2/


model path is /home/ck/git/mnn/mnn-llm/phi2/
### model name : Phi_2
The device support i8sdot:0, support fp16:0, support i8mm: 0
...
Q: Einstein said

A: 's theory of general relativity, which describes the force of gravity, is the most successful theory in the history of science. It has been tested and confirmed by countless experiments and observations, and it has been used to make predictions and discoveries in various fields, such...

kmn1024 commented 7 months ago

This issue seems similar to https://github.com/wangzhaode/llm-export/issues/8, but even after upgrading to MNN 2.8.1 and repeating the steps, nothing changes...

kmn1024 commented 7 months ago

Only after using ./MNNDump2Json did I discover that the phi2 release was exported with these changes:

  1. export_lm must use asymmetric=True
  2. export_block must use asymmetric=False — this line appears to be a bug: https://github.com/wangzhaode/llm-export/blob/master/llm_export.py#L788

After making these changes, everything works.

wangzhaode commented 7 months ago

Yes, phi-2 needs symmetric quantization, as noted here: https://github.com/wangzhaode/llm-export/blob/7b442e4075cb909ba5803f0f3a0557e1ee83e57c/llm_export.py#L788C10-L788C10

Originally asymmetric=False was used when converting to MNN, but the phi-2 patch got broken in a later push.
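For readers unfamiliar with the distinction, here is a minimal standalone sketch of symmetric versus asymmetric integer quantization (an illustration of the general technique only, not MNN's actual implementation):

```python
def quantize_symmetric(values, bits=8):
    """Symmetric quantization: zero maps to zero; one scale, no zero point."""
    qmax = 2 ** (bits - 1) - 1                       # e.g. 127 for int8
    scale = (max(abs(v) for v in values) / qmax) or 1.0
    return [max(-qmax, min(qmax, round(v / scale))) for v in values], scale

def quantize_asymmetric(values, bits=8):
    """Asymmetric quantization: [min, max] maps onto [0, 2^bits - 1]
    via a scale plus a zero point."""
    lo, hi = min(values), max(values)
    qmax = 2 ** bits - 1                             # e.g. 255 for uint8
    scale = ((hi - lo) / qmax) or 1.0
    zero_point = round(-lo / scale)
    q = [min(qmax, max(0, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point
```

The two modes store weights differently (asymmetric keeps an extra zero point per quantized block), so exporting with the wrong mode plausibly explains both the small per-file size differences in the listings above and the empty responses from the locally exported model.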

kmn1024 commented 7 months ago

Thanks a lot!