yangjianxin1 / Firefly

Firefly: a large-model training tool that supports training Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models

QLoRA: how do I pin training to one specific GPU (single-GPU training)? #235

Open zhl970124 opened 4 months ago

zhl970124 commented 4 months ago

# Load the model
print("Loading model----")
model = AutoModelForCausalLM.from_pretrained(
    args.model_name_or_path,
    device_map="auto",
    # device_map=device_map,
    load_in_4bit=True,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        llm_int8_threshold=6.0,
        llm_int8_has_fp16_weight=False,
    ),
)

Here, device_map="auto" makes training use multiple GPUs. How can I pin training to one fixed GPU? I tried device_map={'': int(os.environ.get('LOCAL_RANK', '1'))} and device_map="cuda:1", but neither works. Both fail with: RuntimeError: CUDA error: invalid device ordinal. Could anyone help? Thanks!

l-i-p-f commented 3 months ago

You can set the CUDA_VISIBLE_DEVICES environment variable when launching training. It can select a single GPU or several, for example:

CUDA_VISIBLE_DEVICES=5,6,7 python train.py
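
The same masking can be sketched from inside Python, assuming it runs before CUDA is initialised (ideally before torch is imported). The helper name pin_visible_gpus is hypothetical and not part of Firefly. It also hints at one likely cause of the invalid device ordinal error: once the process sees only one GPU, that GPU is renumbered as device 0, so index 1 no longer exists.

```python
import os


def pin_visible_gpus(physical_ids):
    """Expose only the given physical GPUs to this process (hypothetical helper).

    Call this before CUDA is initialised, ideally before importing torch.
    Returns a mapping from physical GPU id to the index seen in-process.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in physical_ids)
    # Visible devices are renumbered from 0 in the order listed.
    return {physical: logical for logical, physical in enumerate(physical_ids)}


# Train on physical GPU 1 only; inside the process it appears as device 0.
mapping = pin_visible_gpus([1])
print(mapping)  # {1: 0}
```

With this in place, device_map={"": 0} would target the single visible GPU.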

l-i-p-f commented 3 months ago

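For a single GPU, the approach above works.

For multi-GPU training, the approach above does start, but it does not seem to give real multi-GPU parallelism: I tried tuning the configuration parameters, yet neither the GPU memory usage nor the estimated training time changed.

After switching to DeepSpeed, with the same amount of training data and tuned configuration parameters, the estimated training time dropped from 182 h to 48 h.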

deepspeed --include=localhost:5,6,7 train.py
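
A possible explanation, offered as an assumption and not a diagnosis of this particular run: under a plain python launch, device_map="auto" splits the model's layers across the visible GPUs inside one process, which is not data parallelism. Launchers such as deepspeed and torchrun start one process per GPU and export LOCAL_RANK to each of them. The sketch below shows how each process could select its own GPU. The name device_map_for_process is hypothetical, and the fallback of 0 (not 1) keeps a single-GPU launch valid.

```python
import os


def device_map_for_process(env=None):
    """Place the whole model on the GPU matching this process's local rank.

    Assumes the launcher sets LOCAL_RANK; falls back to 0 when it is absent,
    e.g. under a plain `python train.py` launch.
    """
    env = os.environ if env is None else env
    local_rank = int(env.get("LOCAL_RANK", "0"))
    # The empty-string key applies the placement to the entire model.
    return {"": local_rank}


# Simulated environments, for illustration only.
print(device_map_for_process({}))                   # {'': 0}
print(device_map_for_process({"LOCAL_RANK": "2"}))  # {'': 2}
```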

Kenneth0901 commented 3 months ago

device_map={"": 0}
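
As a minimal sketch of this suggestion: the empty-string key in device_map refers to the whole model, so {"": 0} places every module on the first visible GPU. The hypothetical helper below adds a range check so that a bad index fails with a readable message. visible_count is an assumed parameter that would normally come from torch.cuda.device_count().

```python
def single_gpu_device_map(gpu_index, visible_count):
    """Return a device_map that puts the entire model on one GPU.

    gpu_index is the index as seen inside the process (after any
    CUDA_VISIBLE_DEVICES masking), not the physical GPU id.
    """
    if not 0 <= gpu_index < visible_count:
        raise ValueError(
            f"GPU index {gpu_index} is out of range; "
            f"only {visible_count} device(s) are visible"
        )
    return {"": gpu_index}


# With one visible GPU, index 0 is valid and index 1 is rejected.
print(single_gpu_device_map(0, visible_count=1))  # {'': 0}

# Intended use (assumes torch and transformers are installed):
# model = AutoModelForCausalLM.from_pretrained(
#     args.model_name_or_path,
#     device_map=single_gpu_device_map(0, torch.cuda.device_count()),
#     ...
# )
```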

Kenneth0901 commented 3 months ago

Learned something new. I'll go back and try DeepSpeed.