vladmandic / automatic

SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
https://github.com/vladmandic/automatic

[Feature]: Enhance inpaint with full noise when desired #3270

Open aleksusklim opened 3 months ago

aleksusklim commented 3 months ago

Issue Description

I've just installed SD.Next and cannot find options for inpainting mask behavior: under the mask section I see only Blur, Padding, Mode: masked/inverted and Inpaint area: full/masked.

No matter what "theme" I choose! Where are those settings, and why can't I see them? It does not look like the Readme docs to me:

image image

I tried both dev and master branches.

Version Platform Description

Windows 10 x64 21H2, Chrome 118, NVidia RTX 3060, Intel Core i7-12700K, Python 3.10.10, SD.Next 2024-03-13

Relevant log output

No response

Backend

Original

Branch

Master

Model

SD-XL

Acknowledgements

brknsoul commented 3 months ago

Do you mean these options, taken from A1111? image

Those aren't really required, as inpainting is typically a method to change the original image, so "original" is always set. For more extreme changes from the image you can set the denoising strength higher, or set it to 1.0 to ignore the source.
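For reference, with the diffusers backend the equivalent knob is the img2img pipeline's strength argument. A minimal sketch, with the model id and file names as placeholder assumptions:

```python
# Minimal diffusers img2img sketch; `strength` is the "denoise" knob above.
# Model id and file names are placeholder assumptions, not SD.Next defaults.
import torch
from PIL import Image
from diffusers import StableDiffusionXLImg2ImgPipeline

pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("input.png").convert("RGB")

# Higher strength = more noise added to the init image before denoising;
# strength=1.0 is the "ignore the source" setting debated below.
result = pipe(prompt="a man at a soccer stadium", image=init_image, strength=1.0)
result.images[0].save("output.png")
```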

vladmandic commented 3 months ago

btw, you said you tried dev and master, but then also said that your sdnext is from march??

aleksusklim commented 3 months ago

you said you tried dev and master, but then also said that your sdnext is from march??

Hmm, strange, sorry. Maybe I made a mistake somewhere; I'll double-check what exact commit I'm on. I will attach the full log from webui.bat --debug.

Those aren't really required

So are they there or not?

set it to 1.0 to ignore the source.

Absolutely false! Try this:

  1. Get an image on a white background, like this:

image

  2. Mask the whole image (just draw the mask over every pixel). Then inpaint as "original" at maximal denoising with a prompt suggesting a vivid background ("a man at a soccer stadium" in my case):

image

  3. Repeat with "latent noise" and you'd get what's expected:

image

  4. FYI, this is what "fill" would do (producing a completely white canvas to draw onto):

image

  5. P.S. Please do not recommend "just do img2img", because this is what I'll get:

image

vladmandic commented 3 months ago

which EXACT things are you missing? you listed what you see in sdnext, not what's missing. and don't compare to a1111, this is no longer a fork of a1111 and the implementation is quite different. so if you state clearly what functionality is required, then we can move forward. from what i gather, you want latent noise fill?

aleksusklim commented 3 months ago

Yes, thanks, here are new logs:

C:\NN\SD\SDNext\automatic>git checkout dev
Switched to a new branch 'dev'
branch 'dev' set up to track 'origin/dev'.

C:\NN\SD\SDNext\automatic>git reset --hard origin/dev
HEAD is now at a430acbb Merge pull request #3283 from vladmandic/master

C:\NN\SD\SDNext\automatic>git pull origin dev
From https://github.com/vladmandic/automatic
 * branch              dev        -> FETCH_HEAD
Already up to date.

C:\NN\SD\SDNext\automatic>git status
On branch dev
Your branch is up to date with 'origin/dev'.

nothing to commit, working tree clean

C:\NN\SD\SDNext\automatic>webui.bat --debug
Using VENV: C:\NN\SD\SDNext\automatic\venv
22:25:52-992820 INFO     Starting SD.Next
22:25:52-994819 INFO     Logger: file="C:\NN\SD\SDNext\automatic\sdnext.log" level=DEBUG size=65 mode=create
22:25:52-995820 INFO     Python version=3.10.10 platform=Windows bin="C:\NN\SD\SDNext\automatic\venv\Scripts\python.exe"
                         venv="C:\NN\SD\SDNext\automatic\venv"
22:25:53-067766 INFO     Version: app=sd.next updated=2024-06-24 hash=a430acbb branch=dev
                         url=https://github.com/vladmandic/automatic/tree/dev ui=dev
22:25:53-155858 DEBUG    Branch sync failed: sdnext=dev ui=dev
22:25:53-769832 INFO     Latest published version: 65823a401613aebea58184aba5f7d5edaf2fef06 2024-06-24T05:41:10Z
22:25:53-774906 INFO     Platform: arch=AMD64 cpu=Intel64 Family 6 Model 151 Stepping 2, GenuineIntel system=Windows
                         release=Windows-10-10.0.19044-SP0 python=3.10.10
22:25:53-776906 DEBUG    Setting environment tuning
22:25:53-777906 INFO     HF cache folder: C:\Users\User\.cache\huggingface\hub
22:25:53-777906 DEBUG    Torch allocator: "garbage_collection_threshold:0.80,max_split_size_mb:512"
22:25:53-778906 DEBUG    Torch overrides: cuda=False rocm=False ipex=False diml=False openvino=False
22:25:53-780906 DEBUG    Torch allowed: cuda=True rocm=True ipex=True diml=True openvino=True
22:25:53-783620 INFO     nVidia CUDA toolkit detected: nvidia-smi present
22:25:53-831032 INFO     Verifying requirements
22:25:53-833109 WARNING  Package version mismatch: diffusers 0.29.0 required 0.29.1
22:25:53-834109 INFO     Install: package="diffusers==0.29.1"
22:25:53-835109 DEBUG    Running: pip="install --upgrade diffusers==0.29.1 "
22:25:56-693481 WARNING  Package version mismatch: urllib3 1.26.18 required 1.26.19
22:25:56-695482 INFO     Install: package="urllib3==1.26.19"
22:25:56-696483 DEBUG    Running: pip="install --upgrade urllib3==1.26.19 "
22:25:58-446480 INFO     Verifying packages
22:25:58-464480 DEBUG    Repository update time: Mon Jun 24 18:14:08 2024
22:25:58-465480 INFO     Startup: standard
22:25:58-466479 INFO     Verifying submodules
22:26:00-910476 DEBUG    Git detached head detected: folder="extensions-builtin/sd-extension-chainner" reattach=main
22:26:00-912476 DEBUG    Submodule: extensions-builtin/sd-extension-chainner / main
22:26:00-973489 DEBUG    Git detached head detected: folder="extensions-builtin/sd-extension-system-info" reattach=main
22:26:00-974476 DEBUG    Submodule: extensions-builtin/sd-extension-system-info / main
22:26:01-031476 DEBUG    Git detached head detected: folder="extensions-builtin/sd-webui-agent-scheduler" reattach=main
22:26:01-032476 DEBUG    Submodule: extensions-builtin/sd-webui-agent-scheduler / main
22:26:01-095476 DEBUG    Git detached head detected: folder="extensions-builtin/sdnext-modernui" reattach=main
22:26:01-097476 DEBUG    Submodule: extensions-builtin/sdnext-modernui / main
22:26:01-157476 DEBUG    Git detached head detected: folder="extensions-builtin/stable-diffusion-webui-rembg" reattach=master
22:26:01-158476 DEBUG    Submodule: extensions-builtin/stable-diffusion-webui-rembg / master
22:26:01-216476 DEBUG    Git detached head detected: folder="modules/k-diffusion" reattach=master
22:26:01-217476 DEBUG    Submodule: modules/k-diffusion / master
22:26:01-272477 DEBUG    Git detached head detected: folder="wiki" reattach=master
22:26:01-274477 DEBUG    Submodule: wiki / master
22:26:01-299478 DEBUG    Register paths
22:26:01-328478 DEBUG    Installed packages: 186
22:26:01-330478 DEBUG    Extensions all: ['Lora', 'sd-extension-chainner', 'sd-extension-system-info',
                         'sd-webui-agent-scheduler', 'sdnext-modernui', 'stable-diffusion-webui-rembg']
22:26:01-427479 DEBUG    Running extension installer:
                         C:\NN\SD\SDNext\automatic\extensions-builtin\sd-webui-agent-scheduler\install.py
22:26:01-608477 DEBUG    Running extension installer:
                         C:\NN\SD\SDNext\automatic\extensions-builtin\stable-diffusion-webui-rembg\install.py
22:26:01-759477 DEBUG    Extensions all: []
22:26:01-761477 INFO     Extensions enabled: ['Lora', 'sd-extension-chainner', 'sd-extension-system-info',
                         'sd-webui-agent-scheduler', 'sdnext-modernui', 'stable-diffusion-webui-rembg']
22:26:01-762478 INFO     Verifying requirements
22:26:01-764477 DEBUG    Setup complete without errors: 1719249962
22:26:01-767478 DEBUG    Extension preload: {'extensions-builtin': 0.0, 'extensions': 0.0}
22:26:01-768477 DEBUG    Starting module: <module 'webui' from 'C:\\NN\\SD\\SDNext\\automatic\\webui.py'>
22:26:01-770477 INFO     Command line args: ['--debug'] debug=True
22:26:01-772476 DEBUG    Env flags: []
22:26:08-134257 INFO     Load packages: {'torch': '2.3.1+cu121', 'diffusers': '0.29.1', 'gradio': '3.43.2'}
22:26:08-679257 DEBUG    Reading failed: config.json [Errno 2] No such file or directory: 'config.json'
22:26:08-833257 INFO     VRAM: Detected=12.0 GB Optimization=none
22:26:08-835257 DEBUG    Created default config: config.json
22:26:08-837257 INFO     Engine: backend=Backend.DIFFUSERS compute=cuda device=cuda attention="Scaled-Dot-Product" mode=no_grad
22:26:08-838258 DEBUG    Save: file="config.json" json=14 bytes=589 time=0.001
22:26:08-899256 INFO     Device: device=NVIDIA GeForce RTX 3060 n=1 arch=sm_90 cap=(8, 6) cuda=12.1 cudnn=8907 driver=546.01
22:26:08-901257 DEBUG    Migrated styles: file=styles.csv folder=models\styles
22:26:08-909257 DEBUG    Load styles: folder="models\styles" items=288 time=0.01
22:26:08-911257 DEBUG    Read: file="html\reference.json" json=43 bytes=24684 time=0.000
22:26:09-327256 DEBUG    ONNX: version=1.18.0 provider=CUDAExecutionProvider, available=['AzureExecutionProvider',
                         'CPUExecutionProvider']
22:26:09-417256 DEBUG    Importing LDM
22:26:09-452256 DEBUG    Entering start sequence
22:26:09-454256 INFO     Create: folder="models\ONNX"
22:26:09-456258 INFO     Create: folder="models\Diffusers"
22:26:09-457256 INFO     Create: folder="models\VAE"
22:26:09-458256 INFO     Create: folder="models\UNET"
22:26:09-459256 INFO     Create: folder="models\Lora"
22:26:09-460256 INFO     Create: folder="models\embeddings"
22:26:09-461256 INFO     Create: folder="models\hypernetworks"
22:26:09-462256 INFO     Create: folder="outputs\text"
22:26:09-463256 INFO     Create: folder="outputs\image"
22:26:09-463256 INFO     Create: folder="outputs\control"
22:26:09-464256 INFO     Create: folder="outputs\extras"
22:26:09-465256 INFO     Create: folder="outputs\init-images"
22:26:09-468256 INFO     Create: folder="outputs\grids"
22:26:09-469256 INFO     Create: folder="outputs\save"
22:26:09-470256 INFO     Create: folder="outputs\video"
22:26:09-471256 INFO     Create: folder="models\wildcards"
22:26:09-472256 DEBUG    Initializing
22:26:09-495256 INFO     Available VAEs: path="models\VAE" items=0
22:26:09-497256 DEBUG    Available UNets: path="models\UNET" items=0
22:26:09-499256 INFO     Disabled extensions: ['sdnext-modernui']
22:26:09-504256 DEBUG    Scanning diffusers cache: folder=models\Diffusers items=0 time=0.00
22:26:09-506256 INFO     Available models: path="models\Stable-diffusion" items=1 time=0.01
22:26:09-773255 DEBUG    Load extensions
22:26:09-865255 INFO     Extension: script='extensions-builtin\Lora\scripts\lora_script.py' 22:26:09-859255
                         INFO     LoRA networks: available=0 folders=2
22:26:10-558254 INFO     Extension: script='extensions-builtin\sd-webui-agent-scheduler\scripts\task_scheduler.py' Using sqlite
                         file: extensions-builtin\sd-webui-agent-scheduler\task_scheduler.sqlite3
22:26:10-579254 DEBUG    Extensions init time: 0.80 sd-extension-chainner=0.08 sd-webui-agent-scheduler=0.61
22:26:10-616255 DEBUG    Read: file="html/upscalers.json" json=4 bytes=2640 time=0.000
22:26:10-618254 INFO     Upscaler create: folder="models\chaiNNer"
22:26:10-619254 DEBUG    Read: file="extensions-builtin\sd-extension-chainner\models.json" json=24 bytes=2693 time=0.000
22:26:10-621254 DEBUG    chaiNNer models: path="models\chaiNNer" defined=24 discovered=0 downloaded=0
22:26:10-622254 INFO     Upscaler create: folder="models\RealESRGAN"
22:26:10-624254 DEBUG    Load upscalers: total=52 downloaded=0 user=0 time=0.04 ['None', 'Lanczos', 'Nearest', 'ChaiNNer',
                         'ESRGAN', 'LDSR', 'RealESRGAN', 'SCUNet', 'SD', 'SwinIR']
22:26:10-633254 DEBUG    Load styles: folder="models\styles" items=288 time=0.01
22:26:10-639254 DEBUG    Creating UI
22:26:10-640255 DEBUG    UI themes available: type=Standard themes=12
22:26:10-642254 INFO     UI theme: type=Standard name="black-teal"
22:26:10-645254 DEBUG    UI theme: css="C:\NN\SD\SDNext\automatic\javascript\black-teal.css" base="sdnext.css" user="None"
22:26:10-649254 DEBUG    UI initialize: txt2img
22:26:10-667254 DEBUG    Networks: page='model' items=43 subfolders=2 tab=txt2img folders=['models\\Stable-diffusion',
                         'models\\Diffusers', 'models\\Reference'] list=0.00 thumb=0.00 desc=0.00 info=0.00 workers=4
                         sort=Default
22:26:10-671254 DEBUG    Networks: page='lora' items=0 subfolders=0 tab=txt2img folders=['models\\Lora', 'models\\LyCORIS']
                         list=0.00 thumb=0.00 desc=0.00 info=0.00 workers=4 sort=Default
22:26:10-684254 DEBUG    Networks: page='style' items=288 subfolders=1 tab=txt2img folders=['models\\styles', 'html'] list=0.01
                         thumb=0.00 desc=0.00 info=0.00 workers=4 sort=Default
22:26:10-689254 DEBUG    Networks: page='embedding' items=0 subfolders=0 tab=txt2img folders=['models\\embeddings'] list=0.00
                         thumb=0.00 desc=0.00 info=0.00 workers=4 sort=Default
22:26:10-692254 DEBUG    Networks: page='vae' items=0 subfolders=0 tab=txt2img folders=['models\\VAE'] list=0.00 thumb=0.00
                         desc=0.00 info=0.00 workers=4 sort=Default
22:26:10-756254 DEBUG    UI initialize: img2img
22:26:10-837254 DEBUG    UI initialize: control models=models\control
22:26:11-185254 DEBUG    Reading failed: ui-config.json [Errno 2] No such file or directory: 'ui-config.json'
22:26:11-256254 DEBUG    UI themes available: type=Standard themes=12
22:26:11-352253 DEBUG    Reading failed: C:\NN\SD\SDNext\automatic\html\extensions.json [Errno 2] No such file or directory:
                         'C:\\NN\\SD\\SDNext\\automatic\\html\\extensions.json'
22:26:11-354253 INFO     Extension list is empty: refresh required
22:26:12-006253 DEBUG    Extension list: processed=6 installed=6 enabled=5 disabled=1 visible=6 hidden=0
22:26:12-069252 DEBUG    Save: file="ui-config.json" json=0 bytes=2 time=0.001
22:26:12-106252 DEBUG    Root paths: ['C:\\NN\\SD\\SDNext\\automatic']
22:26:12-176254 INFO     Local URL: http://127.0.0.1:7860/
22:26:12-178252 DEBUG    Gradio functions: registered=1704
22:26:12-179252 DEBUG    FastAPI middleware: ['Middleware', 'Middleware']
22:26:12-182252 DEBUG    Creating API
22:26:12-296252 INFO     [AgentScheduler] Task queue is empty
22:26:12-298252 INFO     [AgentScheduler] Registering APIs
22:26:12-380252 DEBUG    Scripts setup: ['IP Adapters:0.017', 'AnimateDiff:0.008', 'X/Y/Z Grid:0.009', 'Face:0.01',
                         'Image-to-Video:0.005']
22:26:12-382252 DEBUG    Save: file="metadata.json" json=1 bytes=320 time=0.000
22:26:12-384252 INFO     Model metadata saved: file="metadata.json" items=1 time=0.00
22:26:12-385252 DEBUG    Torch mode: deterministic=False
22:26:12-469317 DEBUG    Desired Torch parameters: dtype=FP16 no-half=False no-half-vae=False upscast=False
22:26:12-471317 INFO     Setting Torch parameters: device=cuda dtype=torch.float16 vae=torch.float16 unet=torch.float16
                         context=no_grad fp16=True bf16=None optimization=Scaled-Dot-Product
22:26:12-474317 DEBUG    Model requested: fn=<lambda>
22:26:12-475326 INFO     Selecting first available checkpoint
22:26:12-477317 DEBUG    Script callback init time: task_scheduler.py:app_started=0.09
22:26:12-478316 DEBUG    Save: file="config.json" json=28 bytes=1133 time=0.001
22:26:12-479316 INFO     Startup time: 10.70 torch=4.88 gradio=0.98 diffusers=0.49 libraries=1.28 extensions=0.80
                         face-restore=0.27 ui-en=0.13 ui-control=0.07 ui-models=0.18 ui-settings=0.16 ui-extensions=0.69
                         launch=0.11 api=0.06 app-started=0.14 checkpoint=0.10
22:26:40-335932 INFO     MOTD: N/A
22:26:40-792932 DEBUG    UI themes available: type=Standard themes=12
22:26:40-885932 INFO     Browser session: user=None client=127.0.0.1 agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64)
                         AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36

Basically, after that:

image

– So, is that correct? …Wait, initially I thought that I was missing some options from https://github.com/vladmandic/automatic/blob/master/html/screenshot-mask.jpg, but now I realized they are for "Control" mode and not for "Image" mode. I see those settings, but again – the "masked content" dropdown is not there.

this is no longer a fork of a1111 and implementation is quite different. from what i gather, you want latent noise fill?

So you no longer have the masked content options? (I mean, I've searched older Issues and clearly saw references to "masked content" at least existing… But if you changed the engine, then it might well have been left out, is that so?)

By the way, in the generation parameters now: … Mask blur: 0 | Mask alpha: 1 | Mask invert: 0 | Mask content: 1 | Mask area: 0 | Mask padding: 0 … – What is "Mask content" here?

aleksusklim commented 3 months ago

Also, if you wanna know how I ended up here in the first place: it's because I couldn't reproduce Differential Diffusion: Giving Each Pixel Its Strength in WebUI Forge, neither by trying their Soft Inpainting nor with ControlNet Inpaint using the "Fooocus Inpaint" model (inpaint_v26.fooocus.patch). None of this gave me what I want (or at least what I think I want).

But then I found this comment: https://github.com/exx8/differential-diffusion/issues/9#issuecomment-1961910112

Can you make it into a WEBUI plug-in […] ↓ And magnificent @vladmandic has already released it in

So, umm… where? Was it lost too?

Because with everything at default:

image – Nope, nothing like in https://differential-diffusion.github.io/

What am I missing?

vladmandic commented 3 months ago

(I mean, I've searched older Issues and clearly saw references to "masked content" to be at least existing… But if you kinda changed the engine then it might be as well left out, is that so?)

again, please just write what you want to accomplish in simple terms, don't state what is here or missing or compare. no need for logs or screenshots - explain.

regarding differential diffusion - it's fully implemented in sdnext, did you actually enable it? image because it uses a different field to upload a mask (or it can auto-create one), not what i see on your screenshot.

aleksusklim commented 3 months ago

differential diffusion - its fully implemented in sdnext

Ah, it's in the scripts list! Ooh, I got it. Well, I've tried it and it definitely works:

CinematicRedmond-blue water pool

(Init image and mask are from my previous screenshot.) I see this is a completely separate pipeline. It works rather slowly, and also uses more than 12 GB of VRAM, so it swaps to shared memory. But anyway, it is working: it gives different "strength" to the mask according to its color value, with black meaning "new stuff" and white meaning "original stuff", blended neatly in gray.
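A simplified sketch of that per-pixel strength mechanism as I understand it (hypothetical helper, not the actual differential-diffusion or SD.Next code):

```python
# Hedged sketch: at each sampling step, pixels whose mask value says "keep
# original" are overwritten with the original latents re-noised to the
# current timestep, so black mask areas change freely from the start.
import torch

def apply_differential_mask(x_t: torch.Tensor, original_latents: torch.Tensor,
                            noise: torch.Tensor, mask: torch.Tensor,
                            alpha_bar_t: torch.Tensor, progress: float) -> torch.Tensor:
    """x_t: current latents; mask in [0, 1], 1.0 = keep original.
    progress runs from 0.0 to 1.0 over the sampling schedule."""
    # Re-noise the original latents to the current noise level.
    renoised = alpha_bar_t.sqrt() * original_latents + (1 - alpha_bar_t).sqrt() * noise
    # Bright (white) pixels stay pinned to the original for longer.
    keep = (mask >= progress).to(x_t.dtype)
    return keep * renoised + (1.0 - keep) * x_t
```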

But what if I try a completely black mask? (Init image is from my older post here.)

CinematicRedmond-a man at soccer stadium

Alas, this is the same as pure img2img with maximal denoising… And not what "masked content = latent noise" would do.

please just write what do you want to accomplish in simple terms

I think now I know what I am looking for. Not sure whether it is implemented anywhere, maybe you know? So, basically…

strength (`float`, *optional*, defaults to 0.3):
    Conceptually, indicates how much to transform the reference `image`. Must be between 0 and 1. `image`
    will be used as a starting point, adding more noise to it the larger the `strength`. The number of
    denoising steps depends on the amount of noise initially added. When `strength` is 1, added noise will
    be maximum and the denoising process will run for the full number of iterations specified in
    `num_inference_steps`. A value of 1, therefore, essentially ignores `image`. Note that in the case of
    `denoising_start` being declared as an integer, the value of `strength` will be ignored.

When strength is 1, added noise will be maximum […] A value of 1, therefore, essentially ignores image

– This is a lie. Strength 1.0 for img2img is NOT equal to pure txt2img, as one might think. I presume you cannot just swap places between the image and the noise: the scheduler's formula adds a certain amount of noise to the original at each timestep, not the other way around. At first I was thinking of something like latents = image*(1-W) + noise*W, where W is a new denoising strength that at 0 would give the image and at 1 would return pure noise; but I doubt the denoising process is that simple to change.
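To illustrate with diffusers' DDPMScheduler (a hedged sketch, not SD.Next internals): even when strength maps to the last timestep, the image keeps a small nonzero coefficient, which is why the source still leaks through.

```python
# Hedged illustration: at the highest timestep the init image still gets a
# small nonzero coefficient, so img2img at strength 1.0 is not txt2img.
import torch
from diffusers import DDPMScheduler

scheduler = DDPMScheduler(num_train_timesteps=1000)

image_latents = torch.randn(1, 4, 64, 64)   # stand-in for VAE-encoded init image
noise = torch.randn_like(image_latents)

strength = 1.0
t_start = int(scheduler.config.num_train_timesteps * strength) - 1  # 999
noisy = scheduler.add_noise(image_latents, noise, torch.tensor([t_start]))

alpha_bar = scheduler.alphas_cumprod[t_start]
print(f"image coefficient sqrt(alpha_bar_{t_start}) = {alpha_bar.sqrt():.5f}")  # tiny, but > 0
```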

Basically, what I want:

Actually, "Masked content" option would not help me here either, because in A1111 it is binary: use it with low denoising strength and it would spit RGB noise right on your image (since it filled everything with a pure noise and you asked it to "change this slightly").

Now! Taking into account that img2img pays a lot of attention to the original image, I propose a new feature that will essentially make a continuous transition between "masked content = original" and "masked content = latent noise". (FYI, "masked content = fill" can be simulated externally by running something like lama-cleaner against the object to erase it; while "masked content = latent nothing" needs further testing, to see whether it is essentially the same as just using a pure gray image or not.)

I propose a feature "noise mask for img2img". It works like this:

  1. It is an image loading / drawing component, just like the inpainting mask (or like that "differential diffusion" mask).
  2. By default it is either treated as completely white, or can automatically inherit the main inpainting mask (completely black if not present / not in inpainting mode) – this should be a choice, replacing that "masked content" option.
  3. There is also a "noise mask seed" field, which at "-1" is random, and at other negative values would inherit the main seed value (after it is resolved to a positive number).
  4. The user should paint black/dark the areas that they want to be RANDOM ("latent noise"), and leave white/bright the areas that should be based on the provided img2img image ("original").
  5. Accordingly, if this mask is inherited from inpainting – whatever is masked is white, and everything else is treated as black, for "latent noise" mode.
  6. Before the main processing, the input image is converted to latents and mixed with fresh noise by roughly this formula (a code sketch follows right after):

latents = M*latents + (1-M)*create_noise_like(latents, mask_seed) – where M is the spatial mask in the 0.0-1.0 range, downscaled to latent resolution.
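A minimal sketch of step 6 in code (every name here is hypothetical, just pinning down the formula):

```python
# Hedged sketch of the proposed noise-mask blend; all names are hypothetical.
import torch
import torch.nn.functional as F

def blend_latents_with_noise(latents: torch.Tensor, mask: torch.Tensor,
                             mask_seed: int) -> torch.Tensor:
    """latents: (B, C, h, w) VAE latents of the init image.
    mask: (H, W) grayscale noise mask, 1.0 = keep original, 0.0 = pure noise."""
    # Downscale the spatial mask to latent resolution (step 6).
    m = F.interpolate(mask[None, None].float(), size=latents.shape[-2:],
                      mode="bilinear", align_corners=False).to(latents.dtype)
    # Deterministic noise from the "noise mask seed" (step 3).
    gen = torch.Generator(device=latents.device).manual_seed(mask_seed)
    noise = torch.randn(latents.shape, generator=gen,
                        device=latents.device, dtype=latents.dtype)
    # latents = M*latents + (1-M)*noise
    return m * latents + (1.0 - m) * noise
```

The blended latents would then go through the normal img2img noising and denoising.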

This way it would be possible to:

The only question is the denoising strength, which threatens to be unusable at anything except 0.99, but that is easier to test than to assume.

What do you think? Is there any existing mechanism anywhere that can directly blend the image with noise by a mask? Maybe I could write an extension for A1111 myself…

If you think "latent noise" is a valuable option and you don't have it now – then I encourage you to implement it spatially, as I explained; it would be more powerful than what A1111 has now!

vladmandic commented 3 months ago

ok, now we're getting somewhere. first, thanks for confirming differential diffusion works as intended. second, your use case is valid. and no, i don't see an obvious way to accomplish it today with existing tools. just how to implement it so it's smooth/clear, that's another story - but it's something i can think about.

i'm converting this issue into a feature request.