ztxz16 / fastllm

A pure C++ cross-platform LLM acceleration library with Python bindings; ChatGLM-6B-class models can reach 10,000+ tokens/s on a single GPU. Supports GLM, LLaMA, and MOSS base models, and runs smoothly on mobile devices.
Apache License 2.0
3.23k stars 325 forks

Fix Win32Demo build errors and GPU execution issues for the MiniCPM model #428

Closed TylunasLi closed 4 months ago

TylunasLi commented 4 months ago
  1. Fixed the Win32Demo project build errors introduced when the MiniCPM model was added.

  2. The low-memory-mode error reported in OpenBMB/MiniCPM#60 was caused by referencing the wrong weight. The same problem also occurs during GPU initialization.

    diff --git a/src/models/minicpm.cpp b/src/models/minicpm.cpp
    index 7085b7a..1ee4aa3 100644
    --- a/src/models/minicpm.cpp
    +++ b/src/models/minicpm.cpp
    @@ -241,8 +241,8 @@
             {
                 auto &hiddenStates = *lastHiddenStates;
                 RMSNorm(hiddenStates, weight["model.norm.weight"], 1e-5, hiddenStates);
                 Mul(hiddenStates, rms_scale, hiddenStates);
    -            Linear(hiddenStates, weight["model.embed_tokens.weight"], Data(), logits);
    +            Linear(hiddenStates, weight["lm_head.weight"], Data(), logits);
                 if (generationConfig.output_logits && retLogits != nullptr) {
                     int size = logits.dims.back();
                     logits.ToDevice(DataDevice::CPU);

    This is now fixed in this PR.

  3. Moved all retrieval of scaling factors into the initParams() stage.