Open hckj588ku opened 1 year ago
Here is my default config:
command_file: null
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: MULTI_GPU
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: all
machine_rank: 0
main_process_ip: 180.184.103.46
main_process_port: 19220
main_training_function: main
megatron_lm_config: {}
mixed_precision: fp16
num_machines: 2
num_processes: 2
rdzv_backend: static
same_network: false
tpu_name: null
tpu_zone: null
use_cpu: false
commands: null
Here is my command:
accelerate launch "${launchArgs[@]}" --num_cpu_threads_per_process=8 "./train_db.py" \
--enable_bucket \
--pretrained_model_name_or_path=$pretrained_model \
--train_data_dir=$train_data_dir \
--reg_data_dir=$reg_data_dir \
--dataset_config="./config.toml" \
--output_dir="./output" \
--output_name=$output_name \
--max_train_steps=$max_train_steps \
--logging_dir="./logs" \
--log_prefix=$output_name \
--resolution=$resolution \
--max_train_epochs=$max_train_epoches \
--learning_rate=$lr \
--lr_scheduler=$lr_scheduler \
--lr_warmup_steps=$lr_warmup_steps \
--lr_scheduler_num_cycles=$lr_restart_cycles \
--train_batch_size=$batch_size \
--save_every_n_epochs=$save_every_n_epochs \
--save_precision="float" \
--seed="1337" \
--cache_latents \
--gradient_checkpointing \
--prior_loss_weight=1 \
--max_token_length=225 \
--caption_extension=".txt" \
--save_model_as=$save_model_as \
--min_bucket_reso=$min_bucket_reso \
--max_bucket_reso=$max_bucket_reso \
--keep_tokens=$keep_tokens \
--xformers --shuffle_caption "${extArgs[@]}"
The default config will be placed at $HOME/.cache/huggingface/accelerate/default_config.yaml, and accelerate will load it. If you have the file, I'm not sure why the default config is not used.
However, I think you can specify the config file explicitly with the --config_file option for accelerate launch.
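For illustration, a minimal sketch of passing the config explicitly. It assumes the file is at the default location mentioned above; the variable names config_file and launch_cmd are made up for this example, and the training arguments are omitted. The command is only printed so it can be inspected before running.

```shell
# Assumed location: where `accelerate config` normally writes its file.
config_file="$HOME/.cache/huggingface/accelerate/default_config.yaml"

# Warn early if the file is not where we expect it.
if [ ! -f "$config_file" ]; then
  echo "warning: $config_file not found" >&2
fi

# Build the launch command with the config passed explicitly.
# Append the train_db.py arguments from the original command as needed.
launch_cmd="accelerate launch --config_file $config_file --num_cpu_threads_per_process=8 ./train_db.py"

# Print for inspection; run it (e.g. paste into the shell) once it looks right.
echo "$launch_cmd"
```

Running accelerate env should also print the default config that accelerate picks up, which may help confirm whether the file is being read at all.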
I would like to train a model using two or more machines. After setting up the default configuration file using accelerate config, it seems that when I call train_db.py, it is not actually using the configuration I have set.
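One thing that may be worth checking in a two-machine setup: the config above has machine_rank: 0, and each machine needs its own rank, so the same file cannot be used unchanged on both. Below is a rough sketch that overrides the rank on the command line instead. The values are copied from the config above, launch_prefix is a hypothetical helper that only prints the command prefix, and whether these overrides resolve the reported problem is an assumption, not something confirmed in this thread.

```shell
# Values copied from the config shown above.
main_ip="180.184.103.46"
main_port=19220
num_machines=2
num_processes=2   # total number of processes across all machines

# Hypothetical helper: print the launch prefix for the given machine rank.
launch_prefix() {
  echo "accelerate launch --multi_gpu --num_machines $num_machines --num_processes $num_processes --machine_rank $1 --main_process_ip $main_ip --main_process_port $main_port"
}

launch_prefix 0   # prefix to use on the main machine
launch_prefix 1   # prefix to use on the second machine
```

The rest of the command (train_db.py and its arguments) would follow the printed prefix on each machine.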