Closed heart-du closed 1 week ago
The paper says that RMSNorm can be applied to Q and K to stabilize training runs. They observed that the instability caused by not normalizing Q and K did not appear throughout the network but only in part of it, at the last transformer blocks. Maybe for this reason, it wasn't added. AFAIU, this is optional rather than compulsory. See the paper for details. Additionally, Phil Wang also preferred qk_rmsnorm = False by default. Cc: @DN6 @yiyixuxu
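For anyone reading along, here is a minimal sketch of what optional Q/K RMS normalization does to attention logits. It assumes a toy plain-Python attention and omits the learned gain that RMSNorm layers usually carry. The names rms_norm and attention_logits and the qk_rmsnorm flag are illustrative for this sketch only; this is not the diffusers implementation or its API.

```python
import math


def rms_norm(x, eps=1e-6):
    """Rescale a vector so its root-mean-square is about 1 (no learned gain)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def attention_logits(q, k, qk_rmsnorm=False):
    """Scaled dot-product logits for lists of query and key vectors.

    qk_rmsnorm is a hypothetical flag used only in this sketch.
    """
    if qk_rmsnorm:
        # Normalize each query and key vector before the dot product.
        q = [rms_norm(v) for v in q]
        k = [rms_norm(v) for v in k]
    scale = 1.0 / math.sqrt(len(q[0]))
    return [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]


# Deliberately large-magnitude toy values.
q = [[100.0, -200.0, 50.0, 25.0]]
k = [[300.0, 100.0, -150.0, 80.0]]

raw = attention_logits(q, k)                      # grows with activation magnitude
normed = attention_logits(q, k, qk_rmsnorm=True)  # bounded by sqrt(head_dim)
```

With the flag on, each logit is bounded in magnitude by sqrt(head_dim) no matter how large the activations get, which is the intuition behind using it as a stabilizer. With the flag off, the logits scale with the raw activations.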
Btw, the code you shared is not the latest version; see the current one in the repo.
Thanks for addressing this, @tolgacangoz! I think this can now be marked closed, but feel free to re-open, @heart-du, if something still needs addressing.
Describe the bug
There is no qk_norm in SD3Transformer2DModel. Is that right?
Reproduction
1.
Logs
No response
System Info
29.2
Who can help?
dukunpeng