All methods are training-free and rely only on modifying the text embeddings or attention maps.
To install the sd-webui-incantations extension, follow these steps:
1. Ensure you have the latest Automatic1111 stable-diffusion-webui installed (version ≥ 1.9.3).
2. Open the "Extensions" tab and navigate to the "Install from URL" section.
3. Paste the repository URL into the "URL for extension's git repository" field:
   https://github.com/v0xie/sd-webui-incantations.git
4. Press the Install button and wait a few seconds for the extension to finish installing.
5. Restart the Web UI: completely restart your Stable Diffusion Web UI to load the new extension.
Increases output quality by blurring the self-attention in the middle block layers, with minimal added inference time. It is recommended to fix the CFG scale at 3.0 and control the effect with the Blur Sigma value; increase CFG if the effect is insufficient.
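As a rough picture of what the Blur Sigma value controls, the Python sketch below applies a separable Gaussian blur to a small 2D map that stands in for one channel of a middle-block self-attention query, then forms a guided prediction that moves away from the blurred-attention prediction. `blur_2d` and `seg_guidance` are hypothetical helpers written for illustration, not the extension's code, and the guidance form `eps + gamma * (eps - eps_blurred)` is our reading of Smoothed Energy Guidance that should be checked against the paper.

```python
import math

def gaussian_kernel_1d(sigma: float, radius: int) -> list[float]:
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_2d(grid: list[list[float]], sigma: float) -> list[list[float]]:
    """Separable Gaussian blur with clamped edges; sigma <= 0 returns an unchanged copy."""
    if sigma <= 0:
        return [row[:] for row in grid]
    radius = max(1, math.ceil(3 * sigma))
    kernel = gaussian_kernel_1d(sigma, radius)
    h, w = len(grid), len(grid[0])

    def clamp(i: int, n: int) -> int:
        return min(max(i, 0), n - 1)

    offsets = range(-radius, radius + 1)
    # Horizontal pass, then a vertical pass over the intermediate result
    tmp = [[sum(kernel[k + radius] * grid[y][clamp(x + k, w)] for k in offsets)
            for x in range(w)] for y in range(h)]
    return [[sum(kernel[k + radius] * tmp[clamp(y + k, h)][x] for k in offsets)
             for x in range(w)] for y in range(h)]

def seg_guidance(eps: list[float], eps_blurred: list[float], gamma: float) -> list[float]:
    """Push the prediction away from the blurred-attention prediction (assumed form)."""
    return [e + gamma * (e - eb) for e, eb in zip(eps, eps_blurred)]

# A single bright value gets spread over its neighbours; larger sigma spreads it further
impulse = [[0.0] * 9 for _ in range(9)]
impulse[4][4] = 1.0
blurred = blur_2d(impulse, sigma=1.0)
```

A larger sigma smooths the attention more strongly, so the blurred prediction drifts further from the sharp one and the guidance term grows accordingly.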
SD XL
Unconditional
Prompt: "a family of teddy bears having a barbecue in their backyard"
https://arxiv.org/abs/2404.05384
Dynamically rescales CFG guidance for each semantic region to a uniform level to improve image/text alignment.
Very computationally expensive: a batch size of 4 at 1024x1024 will max out a 24 GB RTX 4090.
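The per-region rescaling can be illustrated with a minimal sketch. It assumes a semantic segmentation is already available as one integer label per element (S-CFG derives its regions from attention maps, which is omitted here), measures each region's guidance as its mean absolute value, and takes the average over regions as the shared target level; that choice of target is an assumption. `region_rescaled_cfg` is a hypothetical name, and `min_rate` / `max_rate` play the role of the Min Rate and Max Rate settings by clamping each region's scaling factor.

```python
from collections import defaultdict

def region_rescaled_cfg(eps_uncond, eps_cond, labels, cfg_scale, min_rate=0.5, max_rate=10.0):
    """Rescale CFG guidance per region toward a shared magnitude (sketch)."""
    delta = [c - u for c, u in zip(eps_cond, eps_uncond)]
    # Mean absolute guidance inside each region
    totals, counts = defaultdict(float), defaultdict(int)
    for d, r in zip(delta, labels):
        totals[r] += abs(d)
        counts[r] += 1
    magnitude = {r: totals[r] / counts[r] for r in totals}
    # Shared target level: average of the region magnitudes (assumption)
    target = sum(magnitude.values()) / len(magnitude)
    rate = {}
    for r, m in magnitude.items():
        raw = target / m if m > 0 else 1.0
        rate[r] = min(max(raw, min_rate), max_rate)  # clamp like Min Rate / Max Rate
    return [u + cfg_scale * rate[r] * d for u, d, r in zip(eps_uncond, delta, labels)]

# Two regions: the weakly guided one is boosted, the strongly guided one is damped
print(region_rescaled_cfg([0.0] * 4, [1.0, 1.0, 3.0, 3.0], [0, 0, 1, 1], cfg_scale=1.0))
```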
Prompt: "A cute puppy on the moon", Min Rate: 0.5, Max Rate: 10.0
https://arxiv.org/abs/2403.17377
An alternative/complementary method to CFG (Classifier-Free Guidance) that increases sampling quality.
Implemented a new feature called "Saliency-Adaptive Noise Fusion" derived from "High-fidelity Person-centric Subject-to-Image Synthesis".
This feature adaptively combines the guidance from PAG and CFG, improving image quality, especially at higher guidance scales.
Check out the paper authors' project repository here: https://github.com/CodeGoat24/Face-diffuser
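The sketch below contrasts a plain sum of CFG and PAG guidance with a simplified saliency-based fusion. It assumes the saliency of each guidance term at an element is simply its magnitude and picks, per element, whichever guided prediction is more salient; the Face-diffuser paper builds smoothed, normalized saliency maps, so this hard selection is a simplification rather than the extension's implementation. `pag_cfg_fused` is a hypothetical helper, and noise predictions are flattened to plain lists for brevity.

```python
def pag_cfg_fused(eps_uncond, eps_cond, eps_perturbed, cfg_scale, pag_scale, adaptive=True):
    """Combine CFG and PAG guidance, either summed or selected per element (sketch)."""
    fused = []
    for u, c, p in zip(eps_uncond, eps_cond, eps_perturbed):
        cfg_delta = cfg_scale * (c - u)  # CFG guidance term
        pag_delta = pag_scale * (c - p)  # PAG guidance term (perturbed self-attention)
        if not adaptive:
            # Plain combination: both guidance terms added to the unconditional prediction
            fused.append(u + cfg_delta + pag_delta)
        elif abs(cfg_delta) >= abs(pag_delta):
            # CFG is the more salient guidance here: keep the CFG-guided prediction
            fused.append(u + cfg_delta)
        else:
            # PAG is the more salient guidance here: keep the PAG-guided prediction
            fused.append(c + pag_delta)
    return fused

print(pag_cfg_fused([0.0, 0.9], [1.0, 1.0], [0.5, 0.0], cfg_scale=2.0, pag_scale=1.0))
```

Selecting per element, rather than summing, keeps the two guidance terms from stacking on top of each other, which is one intuition for why an adaptive fusion helps at high guidance scales.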
Prompt: "a puppy and a kitten on the moon"
SD 1.5
SD XL
https://arxiv.org/abs/2404.07724 and https://arxiv.org/abs/2404.13040
Restricts CFG to a specified noise interval, which allows high CFG levels (>15) without drastically altering the composition.
Adds controllable CFG schedules. For Clamp-Linear, use (c=2.0) for SD1.5 and (c=4.0) for SDXL. For PCS, use (s=1.0) for SD1.5 and (s=0.1) for SDXL.
To use the CFG Scheduler, PAG Active must be set to True. The PAG scale can be set to 0.
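The interval and schedule options both come down to choosing a guidance scale per step. In the sketch below, `t` runs from 1.0 (start of sampling, pure noise) to 0.0 (end), and `sigma` is the current noise level. The Clamp-Linear and PCS formulas are assumptions based on a reading of the schedulers paper (a linear ramp floored at `c`, and a power-cosine ramp shaped by `s`), not the extension's code, and all function names are hypothetical.

```python
import math

def interval_scale(cfg_scale: float, sigma: float, sigma_lo: float, sigma_hi: float) -> float:
    """Apply CFG only for sigma in (sigma_lo, sigma_hi]; elsewhere use 1.0 (no guidance)."""
    return cfg_scale if sigma_lo < sigma <= sigma_hi else 1.0

def clamp_linear_scale(cfg_scale: float, t: float, c: float) -> float:
    """Assumed form: guidance ramps up linearly as t goes 1 -> 0, never dropping below c."""
    return max(c, 2.0 * cfg_scale * (1.0 - t))

def pcs_scale(cfg_scale: float, t: float, s: float) -> float:
    """Assumed power-cosine form: 0 at t=1, rising to cfg_scale at t=0; s bends the curve."""
    return cfg_scale * 0.5 * (1.0 - math.cos(math.pi * (1.0 - t) ** s))

def guided(eps_uncond: float, eps_cond: float, scale: float) -> float:
    """Standard CFG combination with a per-step scale."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Example: a few steps of an SD1.5-style Clamp-Linear schedule (c=2.0) at CFG 7
for t in (1.0, 0.5, 0.0):
    print(t, clamp_linear_scale(7.0, t, c=2.0))
```

Outside the interval the scale falls back to 1.0, which leaves only the conditional prediction, so a high CFG value only affects the chosen noise range.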
Prompt: "A pointillist painting of a raccoon looking at the sea."
Prompt: "An epic lithograph of a handsome salaryman carefully pouring coffee from a cup into an overflowing carafe, 4K, directed by Wong Kar Wai"
The algorithms previously implemented for T2I-Zero were incorrect. They should now be much more stable. See the previous results in the 'images' folder for an informal comparison between the old and new versions.
Implements Corrections by Similarities and Cross-Token Non-Maximum Suppression from https://arxiv.org/abs/2310.07419
Also implements some methods from "Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models" https://arxiv.org/abs/2403.06381
Reduces the influence of each token on distant or conceptually unrelated tokens.
Attempts to reduce the mixing of features between unrelated concepts.
May error out when image dimensions are not a multiple of 64.
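Cross-Token Non-Maximum Suppression can be sketched as a per-pixel winner-take-all over token attention maps: at each spatial position only the token with the highest attention keeps its value. This minimal version skips the Gaussian smoothing the paper applies before suppression and works on plain nested lists; `cross_token_nms` is a hypothetical name, not a function from the extension.

```python
def cross_token_nms(attn: list[list[float]]) -> list[list[float]]:
    """attn[t][p] is token t's attention at pixel p; keep only the strongest token per pixel."""
    n_tokens, n_pixels = len(attn), len(attn[0])
    suppressed = [[0.0] * n_pixels for _ in range(n_tokens)]
    for p in range(n_pixels):
        # Token with maximal attention at this pixel (the first one wins on ties)
        winner = max(range(n_tokens), key=lambda t: attn[t][p])
        suppressed[winner][p] = attn[winner][p]
    return suppressed

print(cross_token_nms([[0.6, 0.1], [0.4, 0.9]]))
# [[0.6, 0.0], [0.0, 0.9]]
```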
Prompt: "A photo of a lion and a grizzly bear and a tiger in the woods"
SD XL
An incomplete implementation of a "prompt-upsampling" method from https://arxiv.org/abs/2401.06345
Generates an image following the prompt, then uses CLIP text/image similarity to extend the prompt and generate a new image.
For example, if your prompt is "a blue dog", the delimiter is "BREAK", the word replacement is "-", and the similarity of the word "blue" to the generated image is below gamma, then the new prompt will be "a blue dog BREAK a - dog".
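The prompt-rewriting rule in the example above can be written as a small function. It assumes the per-word CLIP similarities have already been computed (that scoring step is not shown) and are passed in as a dictionary; `build_upsampled_prompt` is a hypothetical helper, and words without a score are left unchanged.

```python
def build_upsampled_prompt(prompt: str, similarities: dict[str, float], gamma: float,
                           delimiter: str = "BREAK", replacement: str = "-") -> str:
    """Append a rewritten copy of the prompt in which low-similarity words are replaced."""
    rewritten = [
        replacement if word in similarities and similarities[word] < gamma else word
        for word in prompt.split()
    ]
    # Original prompt, then the delimiter, then the rewritten copy
    return f"{prompt} {delimiter} {' '.join(rewritten)}"

print(build_upsampled_prompt("a blue dog", {"blue": 0.12, "dog": 0.31}, gamma=0.2))
# a blue dog BREAK a - dog
```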
A WIP implementation of the "prompt optimization" methods is available in branch "s4a-dev2".
SD XL
Improve Stable Diffusion Prompt Following & Image Quality Significantly With Incantations Extension
Characteristic Guidance: Awesome enhancements for sampling at high CFG levels https://github.com/scraed/CharacteristicGuidanceWebUI
A1111-SD-WebUI-DTG: Awesome prompt upsampling method for booru trained anime models https://github.com/KohakuBlueleaf/z-a1111-sd-webui-dtg
CADS: Diversify your generated images https://github.com/v0xie/sd-webui-cads
Semantic Guidance: https://github.com/v0xie/sd-webui-semantic-guidance
Agent Attention: Faster image generation and improved image quality with Agent Attention https://github.com/v0xie/sd-webui-agentattention
The authors of the papers for their methods:
@misc{yu2024seek, title={Seek for Incantations: Towards Accurate Text-to-Image Diffusion Synthesis through Prompt Engineering}, author={Chang Yu and Junran Peng and Xiangyu Zhu and Zhaoxiang Zhang and Qi Tian and Zhen Lei}, year={2024}, eprint={2401.06345}, archivePrefix={arXiv}, primaryClass={cs.CV} }
@misc{tunanyan2023multiconcept, title={Multi-Concept T2I-Zero: Tweaking Only The Text Embeddings and Nothing Else}, author={Hazarapet Tunanyan and Dejia Xu and Shant Navasardyan and Zhangyang Wang and Humphrey Shi}, year={2023}, eprint={2310.07419}, archivePrefix={arXiv}, primaryClass={cs.CV} }
@misc{ahn2024selfrectifying, title={Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance}, author={Donghoon Ahn and Hyoungwon Cho and Jaewon Min and Wooseok Jang and Jungwoo Kim and SeonHwa Kim and Hyun Hee Park and Kyong Hwan Jin and Seungryong Kim}, year={2024}, eprint={2403.17377}, archivePrefix={arXiv}, primaryClass={cs.CV} }
@misc{zhang2024enhancing, title={Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models}, author={Yang Zhang and Teoh Tze Tzun and Lim Wei Hern and Tiviatis Sim and Kenji Kawaguchi}, year={2024}, eprint={2403.06381}, archivePrefix={arXiv}, primaryClass={cs.CV} }
@misc{kynkäänniemi2024applying, title={Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models}, author={Tuomas Kynkäänniemi and Miika Aittala and Tero Karras and Samuli Laine and Timo Aila and Jaakko Lehtinen}, year={2024}, eprint={2404.07724}, archivePrefix={arXiv}, primaryClass={cs.CV} }
@misc{wang2024analysis, title={Analysis of Classifier-Free Guidance Weight Schedulers}, author={Xi Wang and Nicolas Dufour and Nefeli Andreou and Marie-Paule Cani and Victoria Fernandez Abrevaya and David Picard and Vicky Kalogeiton}, year={2024}, eprint={2404.13040}, archivePrefix={arXiv}, primaryClass={cs.CV} }
@misc{shen2024rethinking, title={Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance}, author={Dazhong Shen and Guanglu Song and Zeyue Xue and Fu-Yun Wang and Yu Liu}, year={2024}, eprint={2404.05384}, archivePrefix={arXiv}, primaryClass={cs.CV} }
@misc{wang2024highfidelity, title={High-fidelity Person-centric Subject-to-Image Synthesis}, author={Yibin Wang and Weizhong Zhang and Jianwei Zheng and Cheng Jin}, year={2024}, eprint={2311.10329}, archivePrefix={arXiv}, primaryClass={cs.CV} }
@misc{hong2024smoothedenergyguidanceguiding, title={Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy Curvature of Attention}, author={Susung Hong}, year={2024}, eprint={2408.00760}, archivePrefix={arXiv}, primaryClass={cs.CV} }