Open sashasubbbb opened 1 year ago
I think it might be accelerate issue. Do you have the same issue in fine_tune.py?
I think one of the current options is to use `--dataset_repeats` to make the epoch larger. Please try that for now.
Yeah, this solution worked, but only without `--cache_latents` on. When I use caching, it seems to ignore the `--dataset_repeats` flag. Is there any other way to repeat the dataset multiple times within a single epoch? Right now, because of these pauses between epochs, training time increases up to 2x.
Edit: figured it out. You can rename your dataset concept folder to `#_concept` and set epochs to 1; training will then be repeated # times without switching epochs.
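For anyone else trying this, here is a minimal sketch of the folder-rename trick. It assumes a DreamBooth-style layout where the numeric prefix of the folder name encodes the repeat count (the folder names `train_data` and `concept` are just examples):

```shell
# Hypothetical dataset layout: the loader reads the repeat count
# from the numeric prefix of each concept folder's name.
mkdir -p train_data/concept

# Rename "concept" to "10_concept" so each image is repeated
# 10 times per epoch instead of running 10 separate epochs.
mv train_data/concept train_data/10_concept

ls train_data
```

Then set the epoch count to 1; the repeats happen inside the single epoch, so no epoch-boundary pause occurs.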
The `--dataset_repeats` option only works for the DreamBooth method (without a metadata .json). If you use the option with a metadata .json, there might be a bug in the handling of `--cache_latents` or `--dataset_repeats`. I'm working on a refactoring, and the bug will be solved with it.
If the repeat `#` in the folder name works for you, that's good!
I've added the `--max_data_loader_n_workers` option in #72. A smaller number of workers might reduce the pausing between epochs (the default is 8).
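One plausible reason the pause scales with the worker count: at each epoch boundary the data loader tears down and re-spawns its worker processes, so the startup cost is paid again every epoch. A stdlib-only sketch of that effect (this is an illustration, not sd-scripts code; `load_item` and `run_epoch` are hypothetical names):

```python
import multiprocessing as mp
import time


def load_item(i):
    # Stand-in for a dataset __getitem__ call.
    return i * 2


def run_epoch(num_workers, n_items=8):
    """Simulate one epoch: spawn a fresh worker pool, load all items, tear it down.

    The pool creation/teardown models the per-epoch worker restart that a
    DataLoader without persistent workers performs; with more workers, more
    processes must be spawned at every epoch boundary.
    """
    start = time.time()
    with mp.Pool(processes=num_workers) as pool:
        items = pool.map(load_item, range(n_items))
    return time.time() - start, items


if __name__ == "__main__":
    for workers in (1, 8):
        elapsed, _ = run_epoch(workers)
        print(f"{workers} worker(s): epoch setup + load took {elapsed:.3f}s")
```

This is why fewer workers can shorten the epoch-boundary pause even though more workers speed up loading within an epoch; it's a trade-off to tune per machine.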
For some reason there is a large delay when epochs change, making training much slower. What could cause this? My settings:

```
accelerate launch --num_cpu_threads_per_process 10 train_network.py ^
  --pretrained_model_name_or_path=B:\AIimages\stable-diffusion-webui\models\Stable-diffusion\model.ckpt ^
  --train_data_dir=B:\AIimages\training\data ^
  --output_dir=B:\train\out\ ^
  --in_json=B:\AIimages\training\data\meta_lat.json ^
  --resolution=512,512 ^
  --prior_loss_weight=1.0 ^
  --train_batch_size=4 ^
  --learning_rate=1e-3 ^
  --max_train_steps=15000 ^
  --use_8bit_adam ^
  --xformers ^
  --gradient_checkpointing ^
  --mixed_precision=fp16 ^
  --save_every_n_epochs=10 ^
  --network_module=networks.lora ^
  --shuffle_caption ^
  --unet_lr=3e-4 ^
  --text_encoder_lr=3e-5 ^
  --lr_scheduler=constant ^
  --save_model_as=safetensors ^
  --seed=115
```