kohya-ss / sd-scripts


Low loss but bad results, what am I doing wrong? #163

Open · gigadeplex opened this issue 1 year ago

gigadeplex commented 1 year ago

After training TI for 1500 steps, I can get the loss down to about 0.05, much better than the previous 0.1. However, the results are still bad. Very bad. Here is what the training command and its input params look like:

/home/ubuntu/anaconda3/envs/pt13/bin/accelerate launch --num_cpu_threads_per_process 2 train_textual_inversion.py \
    --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
    --train_data_dir="/home/ubuntu/sd-scripts/training_images/train_person/" \
    --output_dir="./models" \
    --resolution=512 \
    --train_batch_size=1 \
    --learning_rate=1e-4 \
    --max_train_steps=1500 \
    --save_every_n_epochs=1 \
    --save_model_as="safetensors" \
    --clip_skip=2 \
    --seed=42 \
    --color_aug  \
    --use_8bit_adam \
    --lr_scheduler="cosine" \
    --use_object_template  \
    --token_string="qwerty" \
    --init_word="*" \
    --num_vectors_per_token=8

And here is the last recorded line of the training log:

steps: 100%|███████████████████████████████████████████████████████████████| 1500/1500 [09:24<00:00, 2.66it/s, loss=0.0542]

rockerBOO commented 1 year ago

With TI you want the embedding to stay flexible enough to work with other parts of the prompt, so hitting a very low loss is not ideal. An average loss of less than 0.3 is ideal; I generally hit around 0.15. Check out the TensorBoard integration to see the average loss more easily.
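
For reference, a minimal sketch of what that could look like, assuming the script accepts the --logging_dir option used by the training scripts in this repo (check --help for your version; the paths are just examples):

    # Re-run training with TensorBoard event files written to ./logs (assumed flag)
    accelerate launch --num_cpu_threads_per_process 2 train_textual_inversion.py \
        ...same arguments as in the first post... \
        --logging_dir="./logs"

    # In another terminal, point TensorBoard at that directory and watch the
    # smoothed/average loss curve rather than the per-step value
    tensorboard --logdir ./logs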

kohya-ss commented 1 year ago

A loss that is too low often indicates overfitting. It may be a good idea to try reducing the number of steps or lowering the learning rate.
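
A minimal sketch of how that might look, reusing the command from the first post (the specific numbers are only illustrative starting points to experiment with, not tested recommendations):

    # Roughly half the learning rate and half the steps of the original run (illustrative values)
    accelerate launch --num_cpu_threads_per_process 2 train_textual_inversion.py \
        ...other arguments unchanged... \
        --learning_rate=5e-5 \
        --max_train_steps=800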

AI-Casanova commented 1 year ago

It is also dependent on the steepness of the local minimum. You want to be in a fairly deep, wide hole, so you can wander around on fairly level ground, not sit at the bottom of a well.

(At least one paper demonstrates that too high a batch size can actually hurt in this regard.)