Closed kevinsummer219 closed 2 weeks ago
I can train with the command: `accelerate launch --config_file config.yaml`. I want to change it to: `mpirun ........ accelerate launch --config_file config.yaml`. How can I modify the command, or is there another solution? Thanks very much!
If you follow `accelerate config` and select CPU, it will give you an option to configure your config.yaml to call `mpirun` when doing `accelerate launch`. (When selecting multi-CPU we always do mpirun, IIRC.)
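For reference, a minimal sketch of that flow (the wizard's exact prompts vary across Accelerate versions, so treat the wording in the comments as an assumption):

```bash
# Interactive wizard: choose "multi-CPU" and answer yes when asked
# whether Accelerate should launch training via mpirun.
accelerate config --config_file config.yaml

# With a multi-CPU config set up this way, Accelerate wraps the run in mpirun.
accelerate launch --config_file config.yaml train.py
```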
Thanks for your reply! I will be using multi-GPU. Is there a solution for that? Thanks!
What's the config you're trying to use?
I use a single node and two nodes, thanks.

Single node config.yaml:

```yaml
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: MULTI_GPU
downcast_bf16: 'no'
enable_cpu_affinity: false
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```
Two node config.multi.yaml:

```yaml
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: MULTI_GPU
downcast_bf16: 'no'
enable_cpu_affinity: false
gpu_ids: all
machine_rank: 0
main_process_ip: xx.xx.xxx.xxx
main_process_port: 6000
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```
I am also using Accelerate with DeepSpeed via a ds_config.yaml, but these settings cannot be configured in the mpirun runtime environment either.

Single node ds_config.yaml:

```yaml
compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  deepspeed_config_file: /deepspeed_config/zs1_config.json
  zero3_init_flag: false
distributed_type: DEEPSPEED
downcast_bf16: 'no'
enable_cpu_affinity: false
machine_rank: 0
main_training_function: main
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```
My mpirun training command is:

```bash
mpirun --allow-run-as-root -np 8 -H xx.xx.xx.xx:8 \
  -x MASTER_ADDR=xx.xx.xx.xx -x MASTER_PORT=1234 -x PATH \
  -bind-to none -map-by slot -mca pml ob1 -mca btl ^openib \
  python train.py
```
However, I want to add the Accelerate config.yaml to the mpirun command. How can I modify it? E.g.:

```bash
mpirun --allow-run-as-root -np 8 -H xx.xx.xx.xx:8 \
  -x MASTER_ADDR=xx.xx.xx.xx -x MASTER_PORT=1234 -x PATH \
  -bind-to none -map-by slot -mca pml ob1 -mca btl ^openib \
  accelerate launch --config_file config.yaml train.py
```
Not sure on that one, as I'm not too familiar with mpirun. However, for the first one you can manually pass `mixed_precision` to the `Accelerator()`, and for the second you can manually pass an `accelerate.utils.DeepSpeedPlugin` to the `Accelerator()` as well.
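A minimal sketch of what that could look like in train.py (the DeepSpeed JSON path is the one from the config above; the keyword names match the public Accelerate API, but verify them against your installed version):

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# Multi-GPU case: pass mixed_precision directly instead of reading
# it from config.yaml.
accelerator = Accelerator(mixed_precision="fp16")

# DeepSpeed case (use instead of the line above): point the plugin at
# the same JSON file that deepspeed_config_file referenced.
ds_plugin = DeepSpeedPlugin(hf_ds_config="/deepspeed_config/zs1_config.json")
accelerator = Accelerator(deepspeed_plugin=ds_plugin)
```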
Thanks for your reply! That is to say, it is currently not possible to load the Accelerate config.yaml directly with mpirun, correct? Regarding manually passing the Accelerate config to the `Accelerator()`, can you share more information?
Can `distributed_type` and `num_processes` be passed to the `Accelerator()`? How are these parameters picked up and applied when submitting jobs with mpirun? Or do they not need to be set at all by default (apart from `mixed_precision` and `gradient_accumulation_steps`)? Thanks very much!
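For what it's worth, `distributed_type` and `num_processes` are not `Accelerator()` arguments; under a launcher they are normally inferred from the standard `torch.distributed` environment variables. A hedged sketch of bridging Open MPI's per-process variables to those names at the top of train.py (the `OMPI_*` names are Open MPI specific, and whether this alone is sufficient for a given setup is an assumption to verify):

```python
import os

# Map Open MPI's per-process variables onto the names torch.distributed
# (and hence Accelerate) looks for. MASTER_ADDR / MASTER_PORT are already
# exported via -x in the mpirun command above.
os.environ.setdefault("RANK", os.environ.get("OMPI_COMM_WORLD_RANK", "0"))
os.environ.setdefault("WORLD_SIZE", os.environ.get("OMPI_COMM_WORLD_SIZE", "1"))
os.environ.setdefault("LOCAL_RANK", os.environ.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"))

from accelerate import Accelerator

# With the env vars set, Accelerator() detects the distributed setup itself;
# options like mixed_precision still need to be passed explicitly.
accelerator = Accelerator(mixed_precision="fp16")
```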
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
Information
Tasks
A `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
Reproduction
Can you give some examples? Thanks very much.
Expected behavior
`mpirun` can load the Accelerate config.yaml.