vladmandic / automatic

SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
https://github.com/vladmandic/automatic
GNU Affero General Public License v3.0

[Issue]: API unexpected output using masked img2img #2615

Closed. DevilsCrypto closed this issue 9 months ago

DevilsCrypto commented 9 months ago

Issue Description

After some tests, I get blurry or grey output from the API, while the webui with the same settings works fine. I have tried debugging this issue but haven't found a cause. Both the API and the webui use the same pipeline; the only difference I can see is that the webui inserts 'extra_generation_params':

 'extra_generation_params': {'Mask alpha': 0,
                             'Mask area': 1,
                             'Mask blur': 0,
                             'Mask content': 0,
                             'Mask invert': 0,
                             'Mask padding': 32},

I am using the same sampler, steps, model, etc. in both the webui and the API, with exactly the same image and mask. The output still differs, although I would expect the API and the webui to give equivalent results when the same settings are used.

It would be great if I could get some help finding what's causing this issue.

What I am using as a test: Model: Realistic Vision 5.1 Inpainting https://civitai.com/models/4201?modelVersionId=130090

Prompt:

masterpiece, best quality, beautiful mountains , 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3 

Negative prompt:

(deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime), text, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck, UnrealisticDream

I've attached the params for easier copying of the settings: params.txt

API inputs for /sdapi/v1/img2img:

        {
            "prompt": inpaint_prompt,
            "negative_prompt": negative_prompt,
            "steps": "30",
            "sampler": "Euler a",
            "sampler_name": "Euler a",
            "resize_mode": 0,
            "scale_by": 1,
            "image_cfg_scale": 1.5,
            "cfg_scale": 6,
            "inpaint_full_res": 1,
            "inpaint_full_res_padding": 32,
            "mask_blur": 0,
            "mask": encoded_mask,
            "init_images": [encoded_image],
            "save_images": True,
        }
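
For reference, a minimal sketch of how such a request could be assembled so that the settings visible in the webui are passed explicitly instead of being left to API defaults, with numeric values sent as numbers. The helper names (`encode_file`, `build_payload`, `post_img2img`) are hypothetical. The keys `denoising_strength`, `inpainting_fill` and `inpainting_mask_invert` are assumptions taken from the parameter names SD.Next prints in its img2img debug log line later in this thread; whether the API accepts them under these names should be checked against the server's API schema.

```python
import base64
import json
import urllib.request


def encode_file(path):
    """Return the file contents as a base64 string (no data-URI prefix)."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


def build_payload(prompt, negative_prompt, encoded_image, encoded_mask):
    """Build an img2img inpaint payload mirroring the webui settings."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": 30,  # sent as an integer here
        "sampler_name": "Euler a",
        "resize_mode": 0,
        "scale_by": 1,
        "image_cfg_scale": 1.5,
        "cfg_scale": 6,
        "inpaint_full_res": 1,
        "inpaint_full_res_padding": 32,
        "mask_blur": 0,
        # Assumed keys, mirroring values from the webui debug log line:
        "denoising_strength": 0.75,
        "inpainting_fill": 0,
        "inpainting_mask_invert": 0,
        "mask": encoded_mask,
        "init_images": [encoded_image],
        "save_images": True,
    }


def post_img2img(base_url, payload):
    """POST the payload to /sdapi/v1/img2img and return the decoded JSON."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/sdapi/v1/img2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping the payload in one function makes it easy to print or log exactly what was sent when comparing against the webui run.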

Here are the example images:

Original image: original

Mask: mask

API output: api_result

Webui output: webui_result

Version Platform Description

06:16:31-652185 INFO     Starting SD.Next                                                                                                                                                                
06:16:31-655013 INFO     Logger: file="/home/ubuntu/automatic/sdnext.log" level=DEBUG size=64 mode=create                                                                                         
06:16:31-655964 INFO     Python 3.10.13 on Linux                                                                                                                                                         
06:16:31-663138 INFO     Version: app=sd.next updated=2023-12-04 hash=93f35ccf url=https://github.com/vladmandic/automatic.git/tree/master                                                               
06:16:31-897816 INFO     Platform: arch=x86_64 cpu=x86_64 system=Linux release=5.15.0-84-generic python=3.10.13                                                                                          
06:16:31-898838 DEBUG    Setting environment tuning                                                                                                                                                      
06:16:31-899588 DEBUG    Cache folder: /home/ubuntu/.cache/huggingface/hub                                                                                                                               
06:16:31-900278 DEBUG    Torch overrides: cuda=False rocm=False ipex=False diml=False openvino=False                                                                                                     
06:16:31-901109 DEBUG    Torch allowed: cuda=True rocm=True ipex=True diml=True openvino=True                                                                                                            
06:16:31-902260 INFO     nVidia CUDA toolkit detected: nvidia-smi present                                                                                                                                                                                                                                        
06:16:31-911104 DEBUG    Repository update time: Mon Dec  4 18:31:52 2023                                                                                                                                
06:16:31-911883 INFO     Startup: standard                                         

Relevant log output

No response

Backend

Original

Branch

Master

Model

SD 1.5

vladmandic commented 9 months ago

can you run webui --debug and post the single log line for both the ui and the api example (using the same parameters)? the line in question should look like this:

07:57:24-545026 DEBUG img2img: id_task=task(2u2c7d1g3bty5f3)|mode=2|prompt=|negative_prompt=|prompt_styles=['mine/Girl in Lace']|init_img=None|sketch=None|init_img_with_mask={'image': <PIL.Image.Image image mode=RGBA size=1024x1024 at 0x7F5E28614E50>, 'mask': <PIL.Image.Image image mode=RGB size=1024x1024 at 0x7F5E2873FB10>}|inpaint_color_sketch=None|inpaint_color_sketch_orig=None|init_img_inpaint=None|init_mask_inpaint=None|steps=20|sampler_index=13|latent_index=13|mask_blur=4|mask_alpha=0|inpainting_fill=1|full_quality=True|restore_faces=False|tiling=False|n_iter=1|batch_size=1|cfg_scale=6|image_cfg_scale=1.5|clip_skip=1|denoising_strength=0.75|seed=-1.0|subseed-1.0|subseed_strength=0|seed_resize_from_h=0|seed_resize_from_w=0|selected_scale_tab=0|height=512|width=512|scale_by=1|resize_mode=1|inpaint_full_res=1|inpaint_full_res_padding=32|inpainting_mask_invert=0|img2img_batch_files=None|img2img_batch_input_dir=|img2img_batch_output_dir=|img2img_batch_inpaint_mask_dir=|override_settings_texts=[]

DevilsCrypto commented 9 months ago

Sure, but the strange thing is that when using the API, the img2img log line is not printed at all.

Output with webui:

12:59:52-933694 DEBUG    img2img: id_task=task(oly2yijytw3b9w7)|mode=4|prompt=masterpiece, best quality, beautiful mountains , 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3       
                         |negative_prompt=(deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime), text, cropped, out of frame, worst quality, low quality,   
                         jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy,   
                         bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, 
                         long neck,                                                                                                                                                                      
                         UnrealisticDream|prompt_styles=[]|init_img=None|sketch=None|init_img_with_mask=None|inpaint_color_sketch=None|inpaint_color_sketch_orig=None|init_img_inpaint=<PIL.Image.Image  
                         image mode=RGB size=1000x667 at 0x7F940ED8CE50>|init_mask_inpaint=<PIL.Image.Image image mode=RGB size=1000x667 at                                                              
                         0x7F940ED8F220>|steps=30|sampler_index=3|latent_index=None|mask_blur=0|mask_alpha=0|inpainting_fill=0|full_quality=True|restore_faces=False|tiling=False|n_iter=1|batch_size=1|c
                         fg_scale=6|image_cfg_scale=1.5|clip_skip=1|denoising_strength=0.75|seed=-1.0|subseed-1.0|subseed_strength=0|seed_resize_from_h=0|seed_resize_from_w=0|selected_scale_tab=0|heigh
                         t=512|width=512|scale_by=1|resize_mode=0|inpaint_full_res=1|inpaint_full_res_padding=32|inpainting_mask_invert=0|img2img_batch_files=None|img2img_batch_input_dir=|img2img_batch
                         _output_dir=|img2img_batch_inpaint_mask_dir=|override_settings_texts=[]  
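
Since the webui line above is a flat list of `key=value` pairs separated by `|`, it can be turned into a dictionary and compared key by key with the API payload. A minimal sketch, assuming the wrapped console lines have first been joined back into one string and that no value contains a `|`. `parse_debug_line` and `diff_settings` are hypothetical helper names, not part of SD.Next.

```python
def parse_debug_line(line):
    """Split 'img2img: k1=v1|k2=v2|...' into a dict of strings."""
    # Drop everything up to and including the 'img2img:' tag, if present.
    _, _, body = line.partition("img2img:")
    if not body:
        body = line
    settings = {}
    for field in body.strip().split("|"):
        key, sep, value = field.partition("=")
        if not sep:
            # Fields such as 'subseed-1.0' carry no '='; keep them verbatim.
            settings[field.strip()] = None
            continue
        settings[key.strip()] = value.strip()
    return settings


def diff_settings(ui_settings, api_payload, keys):
    """Return {key: (ui_value, api_value)} for keys that differ or are missing."""
    differences = {}
    for key in keys:
        ui_value = ui_settings.get(key)
        api_value = api_payload.get(key)
        # Compare as strings because the log only contains text.
        if ui_value is None or api_value is None or str(ui_value) != str(api_value):
            differences[key] = (ui_value, api_value)
    return differences


# Shortened example line using values from the webui log above.
line = "DEBUG img2img: steps=30|mask_blur=0|denoising_strength=0.75|subseed-1.0|inpainting_fill=0"
ui = parse_debug_line(line)
payload = {"steps": "30", "mask_blur": 0}
result = diff_settings(ui, payload, ["steps", "mask_blur", "denoising_strength"])
print(result)  # {'denoising_strength': ('0.75', None)}
```

A key that the webui sets but the payload omits shows up with `None` on the API side, which is exactly the kind of setting that silently falls back to a server default.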

Output when using the API:

3:04:18-510525 DEBUG    Sampler: sampler="Euler a" config={'scheduler': 'default', 'brownian_noise': False}                                                                                             
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 23/23 [00:01<00:00, 12.96it/s]
13:04:20-610839 DEBUG    Saving: image="outputs/image/00286-realistic_vision_5_1_inpaint-masterpiece best quality RAW photo of beautiful mountain.jpg" type=JPEG size=1000x667                                 
13:04:20-616141 INFO     Processed: images=1 time=2.11 its=14.25 memory={'ram': {'used': 9.61, 'total': 196.57}, 'gpu': {'used': 7.3, 'total': 21.99}, 'retries': 0, 'oom': 0}

DevilsCrypto commented 9 months ago

It may be worth noting that it works on some images, but with the mountain image I can make it fail consistently. Could something in the pipeline cause the inpainted result to be discarded for certain outputs?

vladmandic commented 9 months ago

there is no difference in processing or postprocessing, but there is a difference in preprocessing - you can see that even the log message is bypassed when going through the api.

i've made some changes in the dev branch.

i've tried to reproduce and i'm consistently getting high-quality output using your input images - see the example below.

try switching to the dev branch and reproducing there.

if you can still reproduce the problem, post here and i'll reopen the issue.

example:

cli/simple-img2img.py --init ~/downloads/orig.jpg --prompt "winter wonderland" --mask ~/downloads/mask.jpg
2023-12-11 09:56:54,399 INFO: img2img: Namespace(init='/home/vlado/downloads/orig.jpg', mask='/home/vlado/downloads/mask.jpg', prompt='winter wonderland', negative='', steps=20, seed=-1, sampler='Euler a', model=None)
2023-12-11 09:56:56,218 INFO: received image: size=(1000, 664) file=/tmp/simple-img2img.jpg time=1.81 info="{"prompt": "winter wonderland", "all_prompts": ["winter wonderland"], "negative_prompt": "", "all_negative_prompts": [""], "seed": 1008154751, "all_seeds": [1008154751], "subseed": 3336190964, "all_subseeds": [3336190964], "subseed_strength": 0, "width": 1000, "height": 664, "sampler_name": "Euler a", "cfg_scale": 7.0, "steps": 20, "batch_size": 1, "restore_faces": false, "face_restoration_model": null, "sd_model_hash": "ec6f68ea63", "seed_resize_from_w": -1, "seed_resize_from_h": -1, "denoising_strength": 0.75, "extra_generation_params": {"Sampler Eta": 0}, "index_of_first_image": 0, "infotexts": ["winter wonderland\nSteps: 20, Seed: 1008154751, Sampler: Euler a, CFG scale: 7.0, Size: 1000x664, Parser: Full parser, Model: lyriel-v16, Model hash: ec6f68ea63, Seed resize from: -1x-1, Backend: Original, App: SD.Next, Version: 91a8746, Operations: inpaint, Init image size: 1000x664, Init image hash: ed0a8c4f, Resize scale: 1.0, Mask blur: 4, Denoising strength: 0.75, Resize mode: None, Sampler Eta: 0"], "styles": [], "job_timestamp": "20231211095654", "clip_skip": 1}"

simple-img2img
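
The `info` string in the log above records the settings the server actually applied (for example `Mask blur: 4` and `Denoising strength: 0.75` in this run), so it can be used to check what an API call really ran with. A minimal sketch, assuming `info` is a JSON string shaped like the one logged above and that no value in the parameter line itself contains `, `. `applied_settings` is a hypothetical helper name.

```python
import json


def applied_settings(info_json):
    """Extract the 'Key: value' pairs from the first infotext entry."""
    info = json.loads(info_json)
    infotext = info["infotexts"][0]
    # The last line of the infotext holds the comma-separated parameters.
    params_line = infotext.split("\n")[-1]
    settings = {}
    for item in params_line.split(", "):
        key, sep, value = item.partition(": ")
        if sep:
            settings[key] = value
    return settings


# Shortened sample built from the values in the log above.
sample = json.dumps({
    "infotexts": [
        "winter wonderland\nSteps: 20, Sampler: Euler a, CFG scale: 7.0, "
        "Size: 1000x664, Mask blur: 4, Denoising strength: 0.75"
    ]
})
settings = applied_settings(sample)
print(settings["Mask blur"], settings["Denoising strength"])
```

Comparing these values between a webui run and an API run shows which defaults the API filled in for fields the request left out.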

DevilsCrypto commented 9 months ago

Thanks for the help. It looks like it's caused by one of the API inputs. When I set only the fields used by the CLI script, it works as expected. For some settings I had to toggle them on and off in the webui to see which API values they correspond to, but in the end it's working now. I will do some more testing to find out which input variable caused the issue and let you know.
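
The approach described here, starting from the inputs the CLI script sets and re-adding the remaining fields until the output breaks, can be automated. A minimal sketch: `find_breaking_keys` and the `is_bad` callback are hypothetical. In practice `is_bad` would send the payload to the API and judge the returned image (for example by manual inspection), which is not shown.

```python
def find_breaking_keys(base_payload, extra_fields, is_bad):
    """Add each extra field to the known-good base on its own and
    return the keys whose addition makes is_bad(payload) true."""
    breaking = []
    for key, value in extra_fields.items():
        candidate = dict(base_payload)  # copy so the base stays untouched
        candidate[key] = value
        if is_bad(candidate):
            breaking.append(key)
    return breaking


base = {"prompt": "winter wonderland", "steps": 20}
extras = {"image_cfg_scale": 1.5, "scale_by": 1, "inpaint_full_res": 1}


def fake_is_bad(payload):
    # Stand-in check for demonstration only; it does not reflect the real cause.
    return "scale_by" in payload


print(find_breaking_keys(base, extras, fake_is_bad))  # ['scale_by']
```

Testing one field at a time only finds fields that break the output on their own; a problem caused by a combination of fields would need pairs to be tested as well.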