Closed: HJT9328 closed this issue 5 months ago
Solved this problem by using a local model. Point the ModelScope cache at the local directory (the variable name must be uppercase, with no spaces around `=`):

```shell
export MODELSCOPE_CACHE=/data/LLM_model
```

Then run inference against the local path with the vLLM backend across all 8 GPUs:

```shell
RAY_memory_monitor_refresh_ms=0 \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
swift infer \
  --model_type qwen1half-72b-chat \
  --model_id_or_path /data/xxxxx \
  --infer_backend vllm \
  --tensor_parallel_size 8
```
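To confirm the cache override is actually picked up before launching a long inference job, you can resolve the directory the same way an env-var-with-fallback lookup works. This is a minimal sketch, not ModelScope's internal code; the default fallback path shown is an assumption and may differ between library versions:

```python
import os

def resolve_cache_dir() -> str:
    # Use MODELSCOPE_CACHE when set, otherwise fall back to a default.
    # The default below is an assumption; check your modelscope version.
    return os.environ.get(
        "MODELSCOPE_CACHE",
        os.path.expanduser("~/.cache/modelscope"),
    )

# Simulate the export from the shell session above.
os.environ["MODELSCOPE_CACHE"] = "/data/LLM_model"
print(resolve_cache_dir())  # → /data/LLM_model
```

If the printed path is not the local model directory, the `export` did not reach the process (e.g. it was set in a different shell), which would explain the model being re-downloaded instead of loaded locally.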