wangzhaode / llm-export

llm-export can export LLM models to ONNX.
Apache License 2.0

Qwen-1_8B pre-trained (base) model #11

Closed. Moxoo closed this issue 9 months ago.

Moxoo commented 10 months ago

Hi, is it correct that qwen-1_8B (pre-trained/base) and qwen-1_8B-chat have no differences in model structure, export, or inference?

Exporting qwen with MNN 2.8.0 on Linux has problems, and I just tried exporting the 4-bit quantized qwen-1_8B base model and it seems to hit the same issue. Could you provide the 4-bit and 8-bit quantized export files for the qwen-1_8B base model? orz..orz...orz...

wangzhaode commented 10 months ago

There should be no difference; you can try it yourself.

Moxoo commented 10 months ago

There should be no difference; you can try it yourself.

I tried it. The ONNX verification passes and the test is OK, but there is a problem when converting to MNN (the MNN files come out larger than the ones you uploaded; did you convert on a Mac?)

wangzhaode commented 10 months ago

Yes. You can download the MNN source code and build MNNConvert locally to do the conversion.
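
For reference, a minimal sketch of a local converter build, assuming MNN's standard CMake option for enabling the converter (check MNN's own build docs for your platform):

# Clone MNN and build only what is needed for model conversion
git clone https://github.com/alibaba/MNN.git
cd MNN && mkdir build && cd build
# MNN_BUILD_CONVERTER enables building the MNNConvert binary
cmake .. -DMNN_BUILD_CONVERTER=ON -DCMAKE_BUILD_TYPE=Release
make -j4
# The MNNConvert binary is produced in the build directory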

Moxoo commented 10 months ago

OK, thanks for the pointer. For building MNNConvert, is it fine to build on either Linux or macOS?

Moxoo commented 10 months ago

Yes. You can download the MNN source code and build MNNConvert locally to do the conversion.

I used an MNNConvert built on Linux to convert to MNN, and the resulting MNN files are still somewhat larger. Since the MNN bug isn't fixed yet, for now I should do the conversion with MNNConvert on a Mac, right?

wangzhaode commented 10 months ago

Apply this patch, rebuild, and try the conversion again.

diff --git a/express/Expr.cpp b/express/Expr.cpp
index 7297a061..768acdd0 100644
--- a/express/Expr.cpp
+++ b/express/Expr.cpp
@@ -192,6 +192,17 @@ EXPRP Expr::create(std::shared_ptr<BufferStorage> extra, std::vector<VARP>&& inp
     EXPRP expr(new Expr(outputSize));
     expr->mStorage = extra;
     expr->mOp = flatbuffers::GetRoot<Op>(extra->buffer());
+    switch (expr->mOp->type()) {
+        case OpType_Const:
+            expr->mType = VARP::CONSTANT;
+            break;
+        case OpType_TrainableParam:
+            expr->mType = VARP::TRAINABLE;
+            break;
+        default:
+            expr->mType = VARP::INPUT;
+            break;
+    }
     expr->mInputs   = std::move(inputs);
     auto exe = ExecutorScope::Current();
     expr->mInside->mReq = exe->getRequirement(expr.get());
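
A rough sketch of one way to apply the patch and rebuild the converter; the patch file name here is only an example:

# Save the diff above as expr-vartype.patch in the MNN source root, then:
cd MNN
git apply expr-vartype.patch    # or: patch -p1 < expr-vartype.patch
cd build
cmake .. -DMNN_BUILD_CONVERTER=ON && make -j4
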
Moxoo commented 10 months ago

Apply this patch, rebuild, and try the conversion again.

I tried it and it still doesn't work. After applying the patch and rebuilding, I converted with

./MNNConvert -f ONNX --modelFile file.onnx --MNNModel file.mnn --weightQuantBits 4

and the block, lm, and embedding file sizes come out normal, but inference is still not right. I'll wait for your next release. (screenshot attached)

lansexinhu commented 10 months ago

@wangzhaode After adding these lines, the block files I built are 24.4 MB, which still differs from the 25.5 MB provided in the repo, and inference is still not quite right. Is there anything else that needs to be changed?

DavidQiuChao commented 10 months ago

@Moxoo Did you manage to convert the model successfully?

wangzhaode commented 10 months ago

Try it with MNN 2.8.1.
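
One way to pick up 2.8.1 is to check out that release tag and rebuild MNNConvert; the exact tag name is an assumption, so verify it locally:

cd MNN
git fetch --tags
git checkout 2.8.1    # tag name assumed; list tags with `git tag` to confirm
cd build && cmake .. -DMNN_BUILD_CONVERTER=ON && make -j4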

Moxoo commented 10 months ago

Try it with MNN 2.8.1.

I tried it and it still doesn't quite work: int4: (screenshot attached) int8: (screenshot attached)