Open liuhongjie001 opened 6 months ago
Please post the code you are calling and the full error message.
Runtime environment: AI Studio, 2 GPUs, 160 GB in total. Code being run:
from pylmkit.llms import LocalLLMModel

model = LocalLLMModel(
    model_path='/home/aiuser/.conda/envs/llm-test1/model/Qwen-1_8B-chat',  # path to the saved model files
    tokenizer_kwargs={"revision": 'master'},
    model_kwargs={"revision": 'master'},
    language='zh'
)
# normal mode
res = model.invoke(query="如何学习python?")
print(">>>invoke ", res)
Run output:
9:19:49 ~/.conda/envs/llm-test1 $ /home/aiuser/.conda/envs/llm_test/bin/python /home/aiuser/.conda/envs/llm-test1/.vscode/llmDemo.py
2024-04-11 09:20:14,375 - modelscope - INFO - PyTorch version 2.1.2 Found.
2024-04-11 09:20:14,376 - modelscope - INFO - Loading ast index from /home/aiuser/.cache/modelscope/ast_indexer
2024-04-11 09:20:14,472 - modelscope - INFO - No valid ast index found from /home/aiuser/.cache/modelscope/ast_indexer, generating ast index from prebuilt!
2024-04-11 09:20:14,537 - modelscope - INFO - Loading done! Current index file version is 1.13.3, with md5 db09a39b3781812ba6a34416a85b6dff and a total number of 972 components indexed
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency flash-attention/csrc/rotary at main · Dao-AILab/flash-attention
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency flash-attention/csrc/layer_norm at main · Dao-AILab/flash-attention
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|█████████████████████████████████████████████| 2/2 [00:01<00:00, 1.63it/s]
You shouldn't move a model that is dispatched using accelerate hooks.
Traceback (most recent call last):
File "/home/aiuser/.conda/envs/llm-test1/.vscode/llmDemo.py", line 11, in
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
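The warning above ("You shouldn't move a model that is dispatched using accelerate hooks") hints at the cause: the checkpoint appears to have been split across the two GPUs by accelerate, so some weights sit on cuda:0 and others on cuda:1, and the forward pass crashes as soon as an activation reaches a layer on the other device. A minimal toy reproduction of that failure mode (illustration only, not the pylmkit code path):

import torch

# Toy reproduction (needs 2 GPUs): a layer on cuda:1 is fed an activation on cuda:0.
layer = torch.nn.Linear(4, 4).to("cuda:1")
x = torch.randn(1, 4, device="cuda:0")
layer(x)  # RuntimeError: Expected all tensors to be on the same device ...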
Try this:
import torch
from pylmkit.llms import LocalLLMModel

Local = LocalLLMModel(
    model_path='/home/aiuser/.conda/envs/llm-test1/model/Qwen-1_8B-chat',  # path to the saved model files
    tokenizer_kwargs={"revision": 'master'},
    model_kwargs={"revision": 'master'},
    language='zh'
)
model = Local.model.to('cuda:0')
Local.model = torch.nn.DataParallel(model, device_ids=[0, 1])  # replicate the model onto both GPUs

# normal mode
res = Local.invoke(query="如何学习python?")
print(">>>invoke ", res)
Did it run successfully?
When running Qwen-1_8B-chat in a multi-GPU environment I get "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!" How can I fix this?