Open WeiminLee opened 4 months ago
```python
# load model
model, model_args = AutoModel.from_pretrained(
    args.from_pretrained,
    args=argparse.Namespace(
        deepspeed=None,
        local_rank=0,
        rank=0,
        world_size=1,
        model_parallel_size=1,
        mode='inference',
        skip_init=True,
        use_gpu_initialization=torch.cuda.is_available(),
        device='cuda',
        overwrite_args={'model_parallel_size': 2},
        **vars(args)
    ))
model.add_mixin('auto-regressive', CachedAutoregressiveMixin())
self.model = model.eval()
```
One more question: how do I load a trained sat model onto two GPUs for inference? I only have the A100 (40G) version, and inference sometimes runs out of memory. How do I set up multi-GPU inference? How should the model-loading code above be configured? Thanks.
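Not from the thread, but one likely issue with the snippet above is that `rank=0` and `world_size=1` are hard-coded, so every process believes it is the only one. A minimal sketch of building the per-process `argparse.Namespace` from the environment variables that a launcher such as `torchrun` exports (`RANK`, `WORLD_SIZE`, `LOCAL_RANK`); the helper name `distributed_args` is hypothetical, and whether sat accepts exactly these fields should be checked against your version:

```python
import os
import argparse

def distributed_args(model_parallel_size: int = 2) -> argparse.Namespace:
    """Build per-process args from torchrun-style environment variables.

    torchrun exports RANK, WORLD_SIZE and LOCAL_RANK for every process it
    spawns; the defaults keep a plain single-process run working.
    """
    return argparse.Namespace(
        deepspeed=None,
        rank=int(os.environ.get('RANK', 0)),
        world_size=int(os.environ.get('WORLD_SIZE', 1)),
        local_rank=int(os.environ.get('LOCAL_RANK', 0)),
        model_parallel_size=model_parallel_size,
        mode='inference',
        skip_init=True,
        device='cuda',
    )
```

A Namespace built this way could then be passed as `args=` to `AutoModel.from_pretrained`, so that under `torchrun --nproc_per_node=2` each of the two processes reports its own rank and `world_size=2` instead of both claiming rank 0 of 1.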
Hi, have you solved this problem?
```python
# load model
model, model_args = AutoModel.from_pretrained(
    args.from_pretrained,
    args=argparse.Namespace(
        deepspeed=None,
        local_rank=0,
        rank=0,
        world_size=1,
        model_parallel_size=1,
        mode='inference',
        skip_init=True,
        use_gpu_initialization=torch.cuda.is_available(),
        device='cuda',
        overwrite_args={'model_parallel_size': 2},
        **vars(args)
    ))
model.add_mixin('auto-regressive', CachedAutoregressiveMixin())
self.model = model.eval()
```
This doesn't work for me. It seems only one GPU is used, although I have 2 GPUs in one node, and I keep getting an OOM error as shown below. Any help? Thanks!
```
(py3-11) ubuntu@10-60-241-123:/home/ScreenAgent/ScreenAgent/train$ vi cogagent_model_worker.py
(py3-11) ubuntu@10-60-241-123:/home/ScreenAgent/ScreenAgent/train$ RANK=0 WORLD_SIZE=1 LOCAL_WORLD_SIZE=2 LOCAL_RANK=0 sudo /home/ubuntu/miniconda3/envs/py3-11/bin/python ./cogagent_model_worker.py --host 0.0.0.0 --port 40000 --from_pretrained "./saved_models/ScreenAgent-2312" --bf16 --max_length 2048
[2024-06-22 14:25:46,646] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
2024-06-22 14:25:48 | INFO | model_worker | args: Namespace(host='0.0.0.0', port=40000, max_length=2048, top_p=0.4, top_k=1, temperature=0.8, chinese=False, version='chat', quant=None, stream_chat=False, from_pretrained='./saved_models/ScreenAgent-2312', local_tokenizer='lmsys/vicuna-7b-v1.5', fp16=False, bf16=True)
[2024-06-22 14:25:48,488] [INFO] building FineTuneTrainCogAgentModel model ...
2024-06-22 14:25:48 | INFO | sat | building FineTuneTrainCogAgentModel model ...
[2024-06-22 14:25:48,490] [INFO] [RANK 0] > initializing model parallel with size 1
2024-06-22 14:25:48 | INFO | sat | [RANK 0] > initializing model parallel with size 1
[2024-06-22 14:25:48,491] [INFO] [RANK 0] You are using model-only mode.
For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
2024-06-22 14:25:48 | INFO | sat | [RANK 0] You are using model-only mode.
For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
2024-06-22 14:26:07 | ERROR | stderr | [rank0]: Traceback (most recent call last):
2024-06-22 14:26:07 | ERROR | stderr | [rank0]: File "/home/ScreenAgent/ScreenAgent/train/./cogagent_model_worker.py", line 146, in
```
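One observation about the log: the command above launches a single Python process with `WORLD_SIZE=1`, so sat initializes model parallel with size 1 and falls back to "model-only mode" on one GPU, exactly as the INFO lines report. Not verified for this repo, but a sketch of a two-process launch with `torchrun`, which exports `RANK`, `WORLD_SIZE` and `LOCAL_RANK` for each spawned process as the sat message asks for (flags after the script name are copied from the log; the checkpoint may also need to be repartitioned for `model_parallel_size=2`):

```shell
# torchrun spawns 2 processes (one per GPU) and sets RANK, WORLD_SIZE
# and LOCAL_RANK in each process's environment automatically.
torchrun --standalone --nproc_per_node=2 \
    ./cogagent_model_worker.py \
    --host 0.0.0.0 --port 40000 \
    --from_pretrained "./saved_models/ScreenAgent-2312" \
    --bf16 --max_length 2048
```

Whether the worker script itself reads those environment variables (rather than hard-coding `rank=0, world_size=1`) still has to be fixed in the code for this launch to take effect.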