Using a DeepSpeed original JSON config with bf16 raises `RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::BFloat16 != c10::Half` #3197
Tasks
- [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported `no_trainer` script in the examples folder of the transformers repo (such as `run_no_trainer_glue.py`)
Reproduction

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(base_model_path,
                                             quantization_config=bnb_config,
                                             torch_dtype=torch.bfloat16)
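```

For reference, `bnb_config` is not defined in the snippet above. A hypothetical definition consistent with the bfloat16 setup might look like the following (names and values are assumptions, not taken from the issue):

```python
import torch
from transformers import BitsAndBytesConfig

# Hypothetical reconstruction: the issue does not show how bnb_config is built.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```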
If I instead configure DeepSpeed through `accelerate config` (without a JSON file), then I can use bf16. That config is as follows:
```yaml
compute_environment: LOCAL_MACHINE
debug: true
deepspeed_config:
  deepspeed_multinode_launcher: standard
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: true
  zero3_save_16bit_model: false
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
enable_cpu_affinity: false
machine_rank: 0
main_process_ip: 192.168.252.20
main_process_port: 25253
main_training_function: main
num_machines: 2
num_processes: 2
mixed_precision: bf16
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: true
```
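With this setup, Accelerate manages the DeepSpeed configuration itself, so `mixed_precision: bf16` is translated into DeepSpeed's bf16 setting. Roughly, the generated DeepSpeed config contains the equivalent of this fragment (an illustrative sketch, not the full generated file):

```json
{
  "bf16": {
    "enabled": true
  }
}
```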
If I add `mixed_precision: bf16` to the accelerate config that uses `deepspeed_config_file` (pointing at my original DeepSpeed JSON file), like this:
```yaml
compute_environment: LOCAL_MACHINE
debug: true
deepspeed_config:
  deepspeed_config_file: /home/user/work/screenplays_sft/ds_zero3_cpu_offload.config
  zero3_init_flag: true
  deepspeed_multinode_launcher: standard
main_process_ip: 192.168.252.20
main_process_port: 25253
distributed_type: DEEPSPEED
downcast_bf16: true
mixed_precision: bf16
enable_cpu_affinity: false
machine_rank: 0
main_training_function: main
num_machines: 2
num_processes: 2
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```
then I get the error:

"ValueError: When using deepspeed_config_file, the following accelerate config variables will be ignored: ['gradient_accumulation_steps', 'gradient_clipping', 'zero_stage', 'offload_optimizer_device', 'offload_param_device', 'offload_param_nvme_path', 'offload_optimizer_nvme_path', 'zero3_save_16bit_model', 'mixed_precision']."

Could you please tell me the reason? Thank you very much.
Expected behavior
Training in bf16 when using an original DeepSpeed JSON config file.
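Note that the c10::Half in the original RuntimeError suggests the JSON file currently enables fp16 while the model weights are bfloat16, and the ValueError indicates that mixed precision must be configured inside the JSON file itself when `deepspeed_config_file` is used. Below is a hypothetical sketch of what `ds_zero3_cpu_offload.config` might look like with bf16 enabled; the actual file is not shown in this issue, so all values are assumptions:

```json
{
  "bf16": {
    "enabled": true
  },
  "fp16": {
    "enabled": false
  },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu" },
    "offload_param": { "device": "cpu" }
  },
  "train_micro_batch_size_per_gpu": "auto"
}
```

With bf16 enabled in the JSON file and fp16 disabled, the mat1/mat2 dtype mismatch should no longer occur.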