Converted Qwen1.5-1.8B-Chat model differs from the one on ModelScope

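I converted the Qwen1.5-1.8B-Chat model on Ubuntu with the command python llm_export.py --path /home/ubuntu/tmp/Qwen1.5-1.8B-Chat --export --export_embed --embed_bin --export_token --export_mnn --type Qwen1_5-1_8B-Chat. The resulting mnn and mnn.weight files differ in size from the ones on ModelScope (https://modelscope.cn/models/zhaode/Qwen1.5-1.8B-Chat-MNN); the mnn file in particular is considerably larger. When I run inference with mnn-llm, prefill speed is about the same, but decode speed is noticeably lower than with the Qwen-1_8B-Chat-MNN from ModelScope. Does anyone know why?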

[Image: WeChat screenshot, 20240719181655]
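To make the size difference concrete, a small script along these lines could list the per-file sizes of the two model directories side by side. This is only a sketch: `compare_sizes` and `print_report` are helper names made up for illustration, the directory paths in the usage comment are placeholders, and it assumes the locally converted files and the ModelScope download use matching file names.

```python
from pathlib import Path


def compare_sizes(converted_dir, reference_dir):
    """Map each file name to (converted_bytes, reference_bytes).

    A value of None means the file is missing from that directory.
    """
    def sizes_in(directory):
        # Only regular files directly inside the directory are considered.
        return {p.name: p.stat().st_size for p in Path(directory).iterdir() if p.is_file()}

    converted = sizes_in(converted_dir)
    reference = sizes_in(reference_dir)
    return {
        name: (converted.get(name), reference.get(name))
        for name in sorted(converted.keys() | reference.keys())
    }


def print_report(sizes):
    """Print one line per file, with a size ratio where both sides exist."""
    for name, (conv, ref) in sizes.items():
        if conv is None or ref is None:
            print(f"{name}: converted={conv} reference={ref} (missing on one side)")
        elif ref == 0:
            print(f"{name}: converted={conv} B, reference is empty")
        else:
            print(f"{name}: converted={conv} B, reference={ref} B, ratio={conv / ref:.2f}")


# Example call (both paths are placeholders, not taken from the report above):
# print_report(compare_sizes("/path/to/local/export", "/path/to/modelscope/download"))
```

A ratio well above 1 for a single file would at least show where the extra size sits, which narrows down what to compare next.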

wangzhaode / llm-export, issue #55