kohya-ss / sd-scripts

Apache License 2.0

Has SD3 implemented different shifts for different resolutions? #1762

Open gesen2egee opened 2 weeks ago

gesen2egee commented 2 weeks ago

In the SD3 report, Section 5.3.2, "Resolution-dependent shifting of timestep schedules," seems to suggest adjusting the timestep shift (SHIFT) based on resolution (like flux_shift).

At 1024×1024, the shift factor (α) is set to 3. However, looking at the code, the shift seems to be fixed at 3 for all resolutions?

kohya-ss commented 1 week ago

I don't fully understand that part of the SD3 paper, but your suggestion seems correct. However, I'm not good at math, so I don't know where the 3.0 comes from. If α = sqrt(m/n), then α = 3 means the resolution (H*W) ratio m/n should be 9.

In other words, if the training resolution is H*W, how do you think α should be calculated?
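
One possible reading, as a minimal sketch rather than what sd-scripts actually does: if α = sqrt(m/n) and α = 3 at 1024×1024, the implied base pixel count is n = 1024²/9, so α for a training resolution H*W would be 3 * sqrt(H*W / 1024²). The function names and the `base_*` parameters below are hypothetical, and anchoring on "α = 3 at 1024×1024" is an assumption taken from this thread. The timestep mapping uses the form t' = α·t / (1 + (α − 1)·t).

```python
import math


def resolution_shift(height, width, base_height=1024, base_width=1024, base_shift=3.0):
    """Hypothetical helper: scale the shift with the square root of the pixel-count ratio.

    Assumes alpha = sqrt(m / n), anchored so that the base resolution
    (1024x1024 by default) yields base_shift (3.0 by default).
    """
    pixel_ratio = (height * width) / (base_height * base_width)
    return base_shift * math.sqrt(pixel_ratio)


def shift_timestep(t, alpha):
    """Map a timestep t in [0, 1] with shift alpha: alpha*t / (1 + (alpha - 1)*t)."""
    return alpha * t / (1.0 + (alpha - 1.0) * t)


if __name__ == "__main__":
    for h, w in [(512, 512), (768, 768), (1024, 1024), (2048, 2048)]:
        alpha = resolution_shift(h, w)
        print(f"{h}x{w}: alpha={alpha:.3f}, t=0.5 -> {shift_timestep(0.5, alpha):.3f}")
```

Under these assumptions, halving each side halves α (512×512 gives 1.5) and doubling each side doubles it (2048×2048 gives 6.0), while α = 1 leaves the timesteps unchanged.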