s-JoL / Open-Llama

Complete training code for an open-source, high-performance Llama model, covering the full pipeline from pre-training to RLHF.
https://huggingface.co/s-JoL/Open-Llama-V2
MIT License

Can instruction fine-tuning (instruct finetune) run on a 32 GB V100? #61

Closed · honglianglv closed this 1 year ago

honglianglv commented 1 year ago

Can instruction fine-tuning (instruct finetune) run on a 32 GB V100? I currently have four V100s with 32 GB of memory each. Is there any way to run instruction fine-tuning on these four cards? I tried stage 2 and still ran out of GPU memory.
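As rough context for why plain ZeRO stage 2 overflows a 32 GB card here: with bf16 weights/gradients and fp32 Adam states, a 7B model carries roughly 112 GB of model states before any activations. A back-of-envelope sketch (illustrative arithmetic only; activations, buffers, and fragmentation are ignored):

```python
# Rough per-GPU memory estimate for a 7B model under ZeRO-2,
# with and without CPU optimizer offload. Not a measurement.
params = 7e9
gpus = 4

weights = params * 2          # bf16 weights, replicated on every GPU under ZeRO-2
grads = params * 2            # bf16 gradients, sharded across GPUs by ZeRO-2
optim = params * (4 + 4 + 4)  # Adam: fp32 master weights + momentum + variance, sharded

per_gpu = weights + (grads + optim) / gpus
print(f"ZeRO-2, optimizer on GPU:   ~{per_gpu / 2**30:.1f} GiB per GPU")

per_gpu_offload = weights + grads / gpus  # optimizer states moved to CPU RAM
print(f"ZeRO-2 + optimizer offload: ~{per_gpu_offload / 2**30:.1f} GiB per GPU")
```

By this estimate, keeping the optimizer on GPU needs roughly 36 GiB per card, which already exceeds 32 GB, while offloading the optimizer to CPU drops model states to around 16 GiB per card.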

I also tried the ds_stage3 config file and hit the error below. Does anyone know what causes it? Many thanks. Launch command:

```
accelerate launch --config_file configs/accelerate_configs/ds_stage3.yaml train_lm.py --train_config configs/instruct_config.yaml --model_config configs/model_configs/7B.json
```

The error:

```
File "/home/fenbi/miniconda3/envs/mc-model/lib/python3.9/site-packages/transformers/models/open_llama/modeling_open_llama.py", line 385, in _init_weights
    module.weight.data[module.padding_idx].zero_()
IndexError: index 32000 is out of bounds for dimension 0 with size 0
```
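A plausible explanation (an assumption based on the traceback, not confirmed by the maintainers): with ZeRO stage 3 and `zero3_init_flag: true`, parameters are partitioned as the model is constructed, so `module.weight.data` is an empty size-0 shard on most ranks when `_init_weights` tries to zero the padding-token row at index 32000. Under ZeRO-3, such a write has to gather the parameter first, roughly like this sketch:

```python
# Sketch of guarding weight init under ZeRO-3 (names follow the traceback;
# the surrounding integration in transformers differs in detail).
import torch
import deepspeed

def init_padding_row(module):
    # Gather the full (unpartitioned) embedding weight; modifier_rank=0 means
    # rank 0's in-place edits are broadcast back to all shards on exit.
    with deepspeed.zero.GatheredParameters(module.weight, modifier_rank=0):
        if torch.distributed.get_rank() == 0:
            module.weight.data[module.padding_idx].zero_()
```

Setting `zero3_init_flag: false`, or staying on stage 2 as in the next comment, avoids partitioned initialization altogether.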

honglianglv commented 1 year ago

With the stage 2 configuration I got it running after a small tweak (only two lines of the config were modified):

```yaml
compute_environment: LOCAL_MACHINE
deepspeed_config:
  deepspeed_multinode_launcher: standard
  gradient_clipping: 1.0
  offload_optimizer_device: cpu
  offload_param_device: none
  zero3_init_flag: false
  zero_stage: 2
distributed_type: DEEPSPEED
fsdp_config: {}
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 3
rdzv_backend: static
same_network: true
use_cpu: cpu  # note: accelerate normally expects a boolean here (e.g. false)
```
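For anyone sizing a similar setup: DeepSpeed ships memory estimators that predict per-GPU and per-CPU model-state memory for each stage/offload combination before launching. A minimal sketch, assuming the checkpoint from the repo link above (any causal LM checkpoint works; loading a 7B model on CPU needs time and ample host RAM):

```python
# Estimate ZeRO-2 model-state memory (weights, gradients, optimizer states)
# for 4 GPUs on 1 node; prints a table covering offload on/off.
from transformers import AutoModelForCausalLM
from deepspeed.runtime.zero.stage_1_and_2 import estimate_zero2_model_states_mem_needs_all_live

model = AutoModelForCausalLM.from_pretrained("s-JoL/Open-Llama-V2")
estimate_zero2_model_states_mem_needs_all_live(model, num_gpus_per_node=4, num_nodes=1)
```

The stage 3 counterpart, `estimate_zero3_model_states_mem_needs_all_live` in `deepspeed.runtime.zero.stage3`, works the same way.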