Open algorithmconquer opened 1 year ago
OSError:../output/ does not appear to have a file named config.json
Whose config is it — the base model's or the LoRA adapter's? Please paste the fuller traceback.
@liangwq It's the LoRA adapter's; the more detailed error is: OSError: /xxx/output/ does not appear to have a file named config.json. Checkout 'https://huggingface.co//xxx/output//None' for available files.
chatglm_deepspeed_inference.py does not use LoRA, so this should be a base-model problem — you can just load the model with AutoModel instead.
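A minimal sketch of the AutoModel-based loading suggested above. The helper name load_base_model is hypothetical, and the transformers import is deferred inside the function so the snippet stays importable on its own; the key point is that from_pretrained must point at the base-model checkpoint (which contains config.json), not the LoRA output directory.

```python
def load_base_model(checkpoint_path):
    # Hypothetical helper: load ChatGLM as a plain base model via AutoModel,
    # pointing at the base checkpoint directory rather than the LoRA
    # output directory (which has no config.json).
    from transformers import AutoModel, AutoTokenizer  # deferred import

    tokenizer = AutoTokenizer.from_pretrained(
        checkpoint_path, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        checkpoint_path, trust_remote_code=True).half().cuda()
    return tokenizer, model.eval()
```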
Hi, when I use this code for multi-GPU inference, it splits into two threads. The model loads successfully on one thread, but on the other the console just hangs. What could be the cause?
The code is as follows:
# imports added for completeness: load_checkpoint_and_dispatch comes from
# accelerate; ChatGLMForConditionalGeneration comes from the model's remote code
from accelerate import load_checkpoint_and_dispatch

def load_model_on_gpus(checkpoint_path, num_gpus=2):
    # word_embeddings (first) and lm_head (last) each take about 1.2 GB
    # transformer.word_embeddings counts as 1 layer
    # transformer.final_layernorm and lm_head together count as 1 layer
    # transformer.layers contributes 28 layers
    # 30 layers in total, distributed across num_gpus cards
    num_trans_layers = 28
    per_gpu_layers = 30 / num_gpus
    device_map = {'transformer.word_embeddings': 0,
                  'transformer.final_layernorm': 0, 'lm_head': 0}
    used = 2
    gpu_target = 0
    for i in range(num_trans_layers):
        if used >= per_gpu_layers:
            gpu_target += 1
            used = 0
        assert gpu_target < num_gpus
        device_map[f'transformer.layers.{i}'] = gpu_target
        used += 1

    model = ChatGLMForConditionalGeneration.from_pretrained(
        checkpoint_path, trust_remote_code=True).half()
    no_split_modules = model._no_split_modules
    print("no_split_modules:", no_split_modules)
    model = model.eval()
    # NOTE: the hand-built device_map above is never passed in --
    # load_checkpoint_and_dispatch is called with device_map="auto" instead
    model = load_checkpoint_and_dispatch(
        model, checkpoint_path, device_map="auto", offload_folder="offload",
        offload_state_dict=True,
        no_split_module_classes=["GLMBlock"]).half()
    print("returning model!")
    return model

model = load_model_on_gpus("/media/ai/HDD/Teamwork/LLM_Embedding_model/LLM/chatglm3-6b", num_gpus=2)
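The layer-splitting loop above can be checked on its own. Below is a self-contained re-derivation of that device_map logic as a standalone function (the name build_chatglm_device_map is made up for illustration): 28 GLMBlock layers plus the embedding and output modules, i.e. 30 units, are spread across num_gpus cards, with the embedding and output modules pinned to GPU 0. Note that in the code above this map is computed but never passed to load_checkpoint_and_dispatch, which receives device_map="auto" instead.

```python
def build_chatglm_device_map(num_gpus=2, num_trans_layers=28):
    # 30 "units" total: 28 transformer layers + embeddings + lm_head/layernorm
    per_gpu_layers = (num_trans_layers + 2) / num_gpus
    device_map = {
        'transformer.word_embeddings': 0,
        'transformer.final_layernorm': 0,
        'lm_head': 0,
    }
    used = 2  # embeddings + lm_head already cost two units on GPU 0
    gpu_target = 0
    for i in range(num_trans_layers):
        if used >= per_gpu_layers:
            gpu_target += 1  # current card is full, move to the next
            used = 0
        assert gpu_target < num_gpus
        device_map[f'transformer.layers.{i}'] = gpu_target
        used += 1
    return device_map
```

With num_gpus=2 this puts layers 0-12 (plus the three pinned modules) on GPU 0 and layers 13-27 on GPU 1.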