Open gesen2egee opened 2 weeks ago
I don't fully understand that part of the SD3 paper, but your suggestion seems correct. However, I'm not good at math, so I don't know where the 3.0 comes from. If sqrt(m/n) is the value of α, then the resolution (H*W) ratio m/n should be 9.
In other words, if the training resolution is H*W, how do you think α should be calculated?
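To make the arithmetic behind that question explicit, here is a worked version. It assumes, as supposed above, that α = sqrt(m/n), where m is the pixel count of the training resolution and n is the pixel count of some base resolution. The base resolution is not given in this thread, so the value of n below is only what α = 3 would imply, not a confirmed setting.

```latex
\alpha = \sqrt{\frac{m}{n}}
\quad\Longrightarrow\quad
\frac{m}{n} = \alpha^{2} = 3^{2} = 9

% With m = 1024 \times 1024, the implied base pixel count would be
n = \frac{1024^{2}}{9} \approx 341.3^{2}

% For a training resolution H \times W and an assumed base pixel count n:
\alpha(H, W) = \sqrt{\frac{H \cdot W}{n}}
```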
In the SD3 report, section 5.3.2, 'Resolution-dependent shifting of timestep schedules,' it seems to suggest adjusting the timestep shift (SHIFT) based on resolution. (Like flux_shift)
At 1024×1024, the shift factor (alpha) is set to 3. However, looking at the code, it seems that the shift is fixed at 3 for all resolutions?
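For reference, a minimal Python sketch of what a resolution-dependent shift could look like. It is not the repository's code. The function names and the `base_pixels` parameter are hypothetical, and the shift formula t_m = α·t_n / (1 + (α − 1)·t_n) with α = sqrt(m/n) is my reading of section 5.3.2 of the SD3 report, so please check it against the paper.

```python
import math


def resolution_shift(height, width, base_pixels):
    """Hypothetical helper: alpha = sqrt(m / n).

    m = height * width is the training resolution in pixels.
    n = base_pixels is an assumed base resolution in pixels
    (not specified in this thread).
    """
    return math.sqrt((height * width) / base_pixels)


def shift_timestep(t, alpha):
    """Map a timestep t in [0, 1] to its shifted value.

    Uses t_m = alpha * t / (1 + (alpha - 1) * t).
    alpha = 1 leaves t unchanged; alpha > 1 pushes t toward 1.
    """
    return alpha * t / (1.0 + (alpha - 1.0) * t)


if __name__ == "__main__":
    # Base pixel count implied by alpha = 3 at 1024x1024 (an assumption).
    base = 1024 * 1024 / 9
    for h, w in [(512, 512), (768, 768), (1024, 1024)]:
        alpha = resolution_shift(h, w, base)
        print(h, w, round(alpha, 3), round(shift_timestep(0.5, alpha), 3))
```

With a fixed shift of 3, `shift_timestep(t, 3.0)` would be applied at every resolution. With the sketch above, alpha would instead scale with sqrt(H*W).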