kohya-ss / sd-scripts

how to use default configuration #583

Open hckj588ku opened 1 year ago

hckj588ku commented 1 year ago

I would like to train a model using two or more machines. After setting up the default configuration file using accelerate config, it seems that when I call train_db.py, it is not actually using the configuration I have set.

hckj588ku commented 1 year ago

here is my default config

command_file: null
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: MULTI_GPU
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: all
machine_rank: 0
main_process_ip: 180.184.103.46
main_process_port: 19220
main_training_function: main
megatron_lm_config: {}
mixed_precision: fp16
num_machines: 2
num_processes: 2
rdzv_backend: static
same_network: false
tpu_name: null
tpu_zone: null
use_cpu: false
commands: null

here is my command

accelerate launch ${launchArgs[@]} --num_cpu_threads_per_process=8 "./train_db.py" \
  --enable_bucket \
  --pretrained_model_name_or_path=$pretrained_model \
  --train_data_dir=$train_data_dir \
  --reg_data_dir=$reg_data_dir \
  --dataset_config="./config.toml" \
  --output_dir="./output" \
  --output_name=$output_name \
  --max_train_steps=$max_train_steps \
  --logging_dir="./logs" \
  --log_prefix=$output_name \
  --resolution=$resolution \
  --max_train_epochs=$max_train_epoches \
  --learning_rate=$lr \
  --lr_scheduler=$lr_scheduler \
  --lr_warmup_steps=$lr_warmup_steps \
  --lr_scheduler_num_cycles=$lr_restart_cycles \
  --train_batch_size=$batch_size \
  --save_every_n_epochs=$save_every_n_epochs \
  --save_precision="float" \
  --seed="1337" \
  --cache_latents \
  --gradient_checkpointing \
  --prior_loss_weight=1 \
  --max_token_length=225 \
  --caption_extension=".txt" \
  --save_model_as=$save_model_as \
  --min_bucket_reso=$min_bucket_reso \
  --max_bucket_reso=$max_bucket_reso \
  --keep_tokens=$keep_tokens \
  --xformers --shuffle_caption ${extArgs[@]}

kohya-ss commented 1 year ago

The default config is placed at $HOME/.cache/huggingface/accelerate/default_config.yaml, and accelerate loads it from there. If you have that file, I'm not sure why the default config is not being used.

However, I think you can specify the config file explicitly with the --config_file option of accelerate launch.
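
As a rough sketch of that suggestion, the snippet below builds an accelerate launch command that names the config file explicitly and overrides the machine rank per node. It assumes the config sits at the default path mentioned above and that each machine runs the same script; MACHINE_RANK is a hypothetical environment variable introduced only for this example, and the training arguments from the original command are left out for brevity. The command is printed rather than executed so it can be checked first.

```shell
# Default location named in the reply above; adjust if your config lives elsewhere.
config_file="$HOME/.cache/huggingface/accelerate/default_config.yaml"

# Rank of this machine: 0 on the main node, 1 on the second one.
# MACHINE_RANK is a hypothetical variable used only in this sketch.
machine_rank="${MACHINE_RANK:-0}"

# Warn early if the file is missing.
if [ ! -f "$config_file" ]; then
  echo "warning: $config_file not found" >&2
fi

# Build the command as a string so it can be inspected before running.
# (A path containing spaces would need proper quoting instead.)
launch_cmd="accelerate launch --config_file=$config_file --machine_rank=$machine_rank --num_cpu_threads_per_process=8 ./train_db.py"

# Print the command; run it with: eval "$launch_cmd"
echo "$launch_cmd"
```

Running accelerate env on each machine should also print the default config that accelerate picks up, which may help confirm whether the file is being read at all.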