hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

Running vLLM inference with CUDA_VISIBLE_DEVICES=1 still reports GPU 0, while llamafactory-cli train and llamafactory-cli chat both work fine #5686

Closed: abc-w closed this 3 weeks ago

abc-w commented 1 month ago

Reminder

System Info

I run vLLM inference on GPU 1 with this command:

CUDA_VISIBLE_DEVICES=1 API_PORT=8000 llamafactory-cli api --model_name_or_path /hdd/pingchuan/Qwen2.5-7B-Instruct --adapter_name_or_path /hdd/pingchuan/lora/sft --template qwen --finetuning_type lora --infer_backend vllm

but inference appears to run on GPU 0 instead:

[rank0]: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.97 GiB. GPU 0 has a total capacity of 79.25 GiB of which 334.06 MiB is free. Process 627536 has 0 bytes memory in use. Process 627564 has 0 bytes memory in use. Process 308335 has 41.09 GiB memory in use. Including non-PyTorch memory, this process has 32.68 GiB memory in use. Of the allocated memory 32.16 GiB is allocated by PyTorch, and 24.73 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

vllm version: 0.6.2; llamafactory is the latest version.
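For reference, with CUDA_VISIBLE_DEVICES=1 the process only sees physical GPU 1 and re-indexes it as device 0, so PyTorch error messages will say "GPU 0". A minimal sketch to check what the process actually sees, assuming PyTorch is installed (the script name is illustrative):

```python
# check_visible_gpu.py -- run as: CUDA_VISIBLE_DEVICES=1 python check_visible_gpu.py
# With CUDA_VISIBLE_DEVICES=1, only physical GPU 1 is exposed to CUDA, and it is
# re-indexed as device 0 inside the process, which is why logs mention "GPU 0".
import os
import torch

print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("visible device count =", torch.cuda.device_count())      # expected: 1
print("device 0 name        =", torch.cuda.get_device_name(0))  # physical GPU 1
```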

Reproduction

Same command and error as in System Info above.

Expected behavior

No response

Others

No response

Oishiscarlett commented 3 weeks ago

Has this been solved?

abc-w commented 3 weeks ago

> Has this been solved?

Yes, it's solved. It was actually using GPU 1 all along; the error message just labels it as GPU 0.
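To double-check which physical GPU actually holds the process, one option is to query NVML per physical device; a rough sketch, assuming pynvml (nvidia-ml-py) is installed:

```python
# List compute processes per physical GPU via NVML; the llamafactory-cli api
# process should show up under physical GPU 1 even though its logs say "GPU 0".
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        if isinstance(name, bytes):  # older pynvml versions return bytes
            name = name.decode()
        print(f"physical GPU {i}: {name}")
        for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
            mem_mib = (proc.usedGpuMemory or 0) // (1024 * 1024)
            print(f"  pid {proc.pid}: {mem_mib} MiB")
finally:
    pynvml.nvmlShutdown()
```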