wangzhaode / mnn-llm

LLM deployment project based on MNN.
Apache License 2.0

On the meaning of is_single in llm: is the model split to gain multi-threaded speedup? #126

Closed: Liu-xiandong closed this issue 10 months ago

Liu-xiandong commented 10 months ago

Hello, I have been reading the llm-related code in MNN. During inference, the is_single flag selects between different code branches, as follows:

if (is_single_) {
    // single model: one fused graph maps the inputs directly to the next token id
    auto outputs = modules_.back()->onForward({inputs_ids_, attention_mask, position_ids, past_key_values_[0]});
    id = outputs[0]->readMap<int>()[0];   // predicted token id
    past_key_values_[0] = outputs[1];     // updated KV cache for the whole model
} else {
    // split block models: embedding -> layer_nums_ transformer blocks -> lm head
#ifdef USING_DISK_EMBED
    auto hidden_states = disk_embedding(input_ids);
#else
    // modules_[layer_nums_ + 1] is the embedding model
    auto hidden_states = modules_[layer_nums_ + 1]->onForward({inputs_ids_})[0];
#endif
    for (int i = 0; i < layer_nums_; i++) {
        AUTOTIME;
        // blocks run strictly in sequence: each consumes the previous block's hidden_states
        auto outputs = modules_[i]->onForward({hidden_states, attention_mask, position_ids, past_key_values_[i]});
        hidden_states = outputs[0];
        past_key_values_[i] = outputs[1]; // per-block KV cache
    }
    {
        AUTOTIME;
        // modules_[layer_nums_] is the lm head, producing the next token id
        auto outputs = modules_[layer_nums_]->onForward({hidden_states});
        id = outputs[0]->readMap<int>()[0];
    }
}

For large-model inference in MNN, is there any model splitting that yields multi-threading-related speedup?

wangzhaode commented 10 months ago

The split currently has no speedup implementation; it just breaks the large model into several smaller models to make debugging and model delivery easier.
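
One nuance worth noting (not stated in the thread, but visible in the snippet above): the blocks have a strict sequential data dependency, since each consumes the previous block's hidden_states, so the split by itself cannot run blocks in parallel. Operator-level multithreading in MNN is instead configured via ScheduleConfig::numThread. Below is a minimal sketch, assuming the file layout produced by llm-export (embedding.mnn, block_0.mnn ... block_{N-1}.mnn, lm.mnn); load_split_models, model_dir, and layer_nums are illustrative placeholders, not the project's actual loader.

    // A minimal sketch (not the project's actual loader) of loading split
    // block models with MNN's Express Module API. File names follow the
    // llm-export convention; model_dir and layer_nums are placeholders.
    #include <MNN/Interpreter.hpp>   // MNN::ScheduleConfig, MNN_FORWARD_CPU
    #include <MNN/expr/Executor.hpp>
    #include <MNN/expr/Module.hpp>
    #include <memory>
    #include <string>
    #include <vector>

    using MNN::Express::Executor;
    using MNN::Express::Module;

    std::vector<std::shared_ptr<Module>> load_split_models(const std::string& model_dir,
                                                           int layer_nums) {
        // Multithreading comes from numThread inside each operator, not from
        // the split: the blocks themselves still execute one after another.
        MNN::ScheduleConfig sconfig;
        sconfig.type = MNN_FORWARD_CPU;
        sconfig.numThread = 4;
        std::shared_ptr<Executor::RuntimeManager> rtmgr(
            Executor::RuntimeManager::createRuntimeManager(sconfig));

        std::vector<std::shared_ptr<Module>> modules;
        // modules[0 .. layer_nums-1]: transformer blocks, as in the loop above.
        for (int i = 0; i < layer_nums; i++) {
            std::string path = model_dir + "/block_" + std::to_string(i) + ".mnn";
            // Empty input/output name lists let MNN use the graph's own I/O.
            modules.emplace_back(Module::load({}, {}, path.c_str(), rtmgr));
        }
        // modules[layer_nums]: lm head; modules[layer_nums + 1]: embedding,
        // mirroring modules_[layer_nums_] and modules_[layer_nums_ + 1] above.
        modules.emplace_back(Module::load({}, {}, (model_dir + "/lm.mnn").c_str(), rtmgr));
        modules.emplace_back(Module::load({}, {}, (model_dir + "/embedding.mnn").c_str(), rtmgr));
        return modules;
    }

With modules laid out this way, the decode loop in the question's snippet maps directly onto the loaded vector, and each block still benefits from MNN's per-operator thread pool even though the blocks run sequentially.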

Liu-xiandong commented 10 months ago

Got it, thank you very much.

wangzhaode commented 10 months ago

You're welcome. For example, a single model that is too large cannot be uploaded as a GitHub Release attachment.