kohya-ss / sd-scripts

Apache License 2.0

How is num_epochs calculated? #366

Open specblades opened 1 year ago

specblades commented 1 year ago

I can't understand how num_epochs is calculated. For example, I use 16 images, 1 repeat, 5 epochs, a batch size of 1, and 16 gradient accumulation steps, and I get 40 epochs. Why is the number of batches per epoch 32? 😢 Please help me out.

kgonia commented 1 year ago

What value did you use for max_train_steps?

specblades commented 1 year ago

@kgonia There is no max_train_steps in the bmaltais repo. He doesn't know how it is calculated either, and refers to kohya's internal code.

kohya-ss commented 1 year ago

The number of batches per epoch is 32 in this case, because the number of images to train on is 32 (16 training images and 16 reg images).

However, the number of actual update steps per epoch is 2. Because the number of gradient accumulation steps is 16, the number of updates per epoch is 32 / 16 = 2.

Therefore, the total number of optimization steps will be 10, because it is calculated as max_train_epochs * update_steps...
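
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the numbers in this thread, not the actual sd-scripts code: the function name `estimate_steps` and its parameters are hypothetical, and rounding up with `math.ceil` is an assumption for cases where the counts do not divide evenly.

```python
import math


def estimate_steps(num_train_images, num_repeats, batch_size,
                   grad_accum_steps, max_train_epochs, use_reg_images=True):
    """Hypothetical helper mirroring the calculation described in this thread."""
    # Training samples seen per epoch: each image is repeated num_repeats times.
    train_samples = num_train_images * num_repeats
    # Reg images are balanced to the same count as the training samples.
    reg_samples = train_samples if use_reg_images else 0
    samples_per_epoch = train_samples + reg_samples

    # Assumption: partial batches and partial accumulation windows round up.
    batches_per_epoch = math.ceil(samples_per_epoch / batch_size)
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accum_steps)
    total_updates = max_train_epochs * updates_per_epoch
    return batches_per_epoch, updates_per_epoch, total_updates


# Values from the question: 16 images, 1 repeat, batch size 1,
# 16 gradient accumulation steps, 5 epochs, with reg images enabled.
batches, updates, total = estimate_steps(16, 1, 1, 16, 5)
print(batches, updates, total)
```

With the settings from the question this gives 32 batches per epoch, 2 updates per epoch, and 10 optimization steps in total, matching the figures quoted above.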

specblades commented 1 year ago

But I uploaded 28 reg images. Does that mean the maximum number of reg images used equals the number of training images?

kohya-ss commented 1 year ago

Yes, the reg images are repeated until their count matches the number of training images, to balance the two.

specblades commented 1 year ago

Will only the first N images (N = number of training images) from the reg folder be used? Or will all the images be used sequentially if there are more than N?

kohya-ss commented 1 year ago

The first N images (N = number of training images * number of repeats) are used from the reg folder, so please set the number of repeats for the training dataset according to the number of reg images.
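
As a rough sketch of the balancing rule described in this thread (take the first N reg images, and repeat them when there are fewer than N), the selection could look like the following. The helper `select_reg_images` is hypothetical and not part of sd-scripts, and the ordering of the files in the reg folder is assumed to be whatever order the list is passed in.

```python
from itertools import cycle, islice


def select_reg_images(reg_images, num_train_images, num_repeats):
    """Hypothetical helper: pick N reg images, N = train images * repeats."""
    n = num_train_images * num_repeats
    if not reg_images:
        return []
    # cycle() wraps around when there are fewer than n reg images;
    # islice() cuts the sequence off after the first n entries.
    return list(islice(cycle(reg_images), n))


# 28 reg images, as in the question (file names are placeholders).
reg = [f"reg_{i:02d}.png" for i in range(28)]

# 16 training images * 1 repeat: only the first 16 reg images are used.
print(len(select_reg_images(reg, 16, 1)))

# 16 training images * 2 repeats: all 28 are used, then the first 4 again.
print(len(select_reg_images(reg, 16, 2)))
```

Under these assumptions, using all 28 reg images at least once requires N to be at least 28, which with 16 training images means at least 2 repeats.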