wenda-LLM / wenda

Wenda (闻达): an LLM invocation platform. It targets efficient content generation for specific environments, while accounting for the limited computing resources of individuals and small businesses, as well as knowledge security and privacy concerns.

Multi-GPU deployment of chatglm2-6b on an Ubuntu server: two situations encountered when loading the model, and how to fix them #416

Closed: xenos-code closed this issue 1 year ago

xenos-code commented 1 year ago

Describe the bug

Multi-GPU deployment of chatglm2-6b on an Ubuntu server hits two situations when loading the model.

Situation 1: check_device_map fails with "The device_map provided does not give any device for the following parameters: ...". Cause: the layer names no longer match the device_map. Fix: if only chatglm2-6b needs to be loaded, modify line 50 of llms/llm_glm6b.py:

# Old mapping for chatglm-6b's layer names, kept for reference:
# device_map = {'transformer.word_embeddings': start_device,
#               'transformer.final_layernorm': start_device, 'lm_head': start_device}
# New mapping matching chatglm2-6b's renamed layers:
device_map = {'transformer.embedding.word_embeddings': start_device,
              'transformer.output_layer': start_device,
              'transformer.rotary_pos_emb': start_device,
              'transformer.encoder.final_layernorm': start_device,
              'lm_head': start_device,
              'base_model.model.lm_head': start_device}

If other models need to be supported as well, see https://github.com/imClumsyPanda/langchain-ChatGLM/issues/732 and adjust the mapping accordingly.
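To see which names a given checkpoint actually uses before hand-writing a device_map, one option is simply to load the model and list its parameters. This is a minimal sketch, not part of wenda's code; the model path is a placeholder for your local checkpoint:

from transformers import AutoModel

# Load the checkpoint once (CPU is fine) and print every parameter name;
# the device_map keys must cover all of these prefixes.
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
for name, _ in model.named_parameters():
    print(name)  # e.g. transformer.embedding.word_embeddings.weight

Any parameter whose prefix is missing from the map is exactly what check_device_map complains about.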

Bugfix: PEFT gives the layers different names again when a LoRA model is loaded:

if self.lora:
    # PEFT wraps the model, so every layer gains the 'base_model.model' prefix.
    layer_prefix = 'base_model.model.transformer'
    device_map = {f'{layer_prefix}.word_embeddings': 0,
                  f'{layer_prefix}.final_layernorm': 0,
                  'lm_head': 0,
                  'base_model.model.lm_head': 0}
elif self.model_name.find("chatglm2-6b") != -1:
    # chatglm2-6b renamed its layers relative to chatglm-6b.
    layer_prefix = 'transformer.encoder'
    device_map = {'transformer.embedding.word_embeddings': 0,
                  'transformer.output_layer': 0,
                  'transformer.rotary_pos_emb': 0,
                  'transformer.encoder.final_layernorm': 0,
                  'lm_head': 0,
                  'base_model.model.lm_head': 0}
else:
    # Plain chatglm-6b layer names.
    layer_prefix = 'transformer'
    device_map = {f'{layer_prefix}.word_embeddings': 0,
                  f'{layer_prefix}.final_layernorm': 0,
                  'lm_head': 0,
                  'base_model.model.lm_head': 0}
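Instead of maintaining one hand-written branch per model family, the device map can also be derived from the instantiated model with accelerate. The following is a sketch under the assumptions that accelerate is installed and two GPUs are available; GLMBlock is ChatGLM's transformer block class, and the memory limits are placeholders:

from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
with init_empty_weights():
    # Build the module tree on the meta device, without allocating real weights.
    empty_model = AutoModel.from_config(config, trust_remote_code=True)

# Distribute layers over GPUs 0 and 1, never splitting a transformer block.
device_map = infer_auto_device_map(
    empty_model,
    max_memory={0: "10GiB", 1: "10GiB"},
    no_split_module_classes=["GLMBlock"],
)
print(device_map)  # keys follow the model's real layer names automatically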

Situation 2: the infamous OOM.

Fix: modify line 81 of llms/llm_glm6b.py:

if "chatglm2" in settings.llm.path:
    model = AutoModel.from_pretrained(settings.llm.path, local_files_only=True,
                                      trust_remote_code=True, revision="v1.1.0").quantize(4).cuda()

Adding the parameter device=device here raises OOM. As a stopgap, lower the requirements and follow the official documentation.

Modify as needed; currently only 4/8-bit quantization is supported:

model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).quantize(8).cuda()

Note: neither of the two situations above occurred in the previous release.

serfan commented 1 year ago

The second problem also reproduces on my machine; for now I worked around it like this, which avoids passing device= on the CUDA path (the call that triggered the OOM):

if device == 'cuda':
    model = AutoModel.from_pretrained(settings.llm.path, local_files_only=True,
                                      trust_remote_code=True, revision="v1.1.0").cuda()
else:
    model = AutoModel.from_pretrained(settings.llm.path, local_files_only=True,
                                      trust_remote_code=True, device=device, revision="v1.1.0")

l15y commented 1 year ago

This is now supported.