liangwq / Chatglm_lora_multi-gpu

ChatGLM multi-GPU with DeepSpeed and

Error at inference time when running deepspeed --num_gpus 2 chatglm_multi_gpu_inference.py #30

Open algorithmconquer opened 1 year ago

algorithmconquer commented 1 year ago

OSError:../output/ does not appear to have a file named config.json

liangwq commented 1 year ago

OSError:../output/ does not appear to have a file named config.json

Whose config is it, the base model's or the LoRA adapter's? Please paste the more detailed error.

algorithmconquer commented 1 year ago

@liangwq It's the LoRA adapter's; the more detailed error message is: OSError: /xxx/output/ does not appear to have a file named config.json. Checkout 'https://huggingface.co//xxx/output//None' for available files.

liangwq commented 1 year ago

@liangwq It's the LoRA adapter's; the more detailed error message is: OSError: /xxx/output/ does not appear to have a file named config.json. Checkout 'https://huggingface.co//xxx/output//None' for available files. chatglm_deepspeed_inference.py does not use LoRA, so this should be a base-model problem; you can just switch to AutoModel to load the model.
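A minimal sketch of the suggestion above, assuming the base ChatGLM weights are available at a local path (the path here is hypothetical): since the script never applies a LoRA adapter, AutoModel can load the base checkpoint directly, and no adapter-side config.json is needed.

from transformers import AutoModel, AutoTokenizer

base_path = "/path/to/chatglm-6b"  # hypothetical path to the base checkpoint

tokenizer = AutoTokenizer.from_pretrained(base_path, trust_remote_code=True)
# trust_remote_code=True lets AutoModel resolve ChatGLM's custom model class
# from the modeling code bundled with the checkpoint.
model = AutoModel.from_pretrained(base_path, trust_remote_code=True).half().cuda()
model = model.eval()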

sevenandseven commented 5 months ago

OSError:../output/ does not appear to have a file named config.json

Whose config is it, the base model's or the LoRA adapter's? Please paste the more detailed error.

Hi, when I use this code for multi-GPU inference it splits into two processes; one process loads the model successfully, but when it gets to the other one the console just hangs. What could be the cause?

The code is as follows:

from accelerate import load_checkpoint_and_dispatch
# ChatGLMForConditionalGeneration is assumed to be imported from the
# checkpoint's bundled modeling_chatglm.py.

def load_model_on_gpus(checkpoint_path, num_gpus=2):
    # Uses about 13 GB of VRAM in total; each of the 28 transformer layers
    # takes roughly 0.39 GB.
    # The first layer (word_embeddings) and the last layer (lm_head) take
    # about 1.2 GB each:
    #   transformer.word_embeddings counts as 1 layer
    #   transformer.final_layernorm and lm_head count as 1 layer
    #   transformer.layers counts as 28 layers
    # so 30 layers in total are distributed across num_gpus cards.
    num_trans_layers = 28
    per_gpu_layers = 30 / num_gpus
    device_map = {'transformer.word_embeddings': 0,
                  'transformer.final_layernorm': 0, 'lm_head': 0}
    used = 2  # the three modules above occupy two layer slots on GPU 0
    gpu_target = 0
    for i in range(num_trans_layers):
        if used >= per_gpu_layers:
            gpu_target += 1
            used = 0
        assert gpu_target < num_gpus
        device_map[f'transformer.layers.{i}'] = gpu_target
        used += 1

    model = ChatGLMForConditionalGeneration.from_pretrained(
        checkpoint_path, trust_remote_code=True).half()
    no_split_modules = model._no_split_modules
    print("no_split_modules:", no_split_modules)
    model = model.eval()

    # Note: device_map="auto" is passed here, so the hand-built device_map
    # above is never actually used.
    model = load_checkpoint_and_dispatch(
        model, checkpoint_path, device_map="auto", offload_folder="offload",
        offload_state_dict=True, no_split_module_classes=["GLMBlock"]).half()
    print("Model returned!")
    return model

model = load_model_on_gpus("/media/ai/HDD/Teamwork/LLM_Embedding_model/LLM/chatglm3-6b", num_gpus=2)
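One observation about the snippet above (a guess, not a confirmed diagnosis): the hand-built device_map is computed but never passed, since load_checkpoint_and_dispatch is called with device_map="auto". If the script is launched once per GPU (e.g. deepspeed --num_gpus 2 starts two processes), each process will then try to shard the model across both visible cards, which could explain one process loading while the other stalls waiting on GPU memory. A minimal sketch of the dispatch call inside load_model_on_gpus that passes the explicit map instead:

from accelerate import load_checkpoint_and_dispatch

# A hedged sketch, not a confirmed fix: pass the device_map built earlier in
# the function so layer placement is deterministic, instead of letting
# device_map="auto" re-plan the split across all visible GPUs.
model = load_checkpoint_and_dispatch(
    model, checkpoint_path,
    device_map=device_map,  # the explicit map, not "auto"
    offload_folder="offload",
    offload_state_dict=True,
    no_split_module_classes=["GLMBlock"],
).half()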

[screenshot attached: Image_20240611161417]