metercai / SimpleSDXL

Enhanced version of Fooocus for SDXL, more suitable for Chinese and Cloud
GNU General Public License v3.0

[Bug]: Fooocus Enhance Style with RealvisXLv40_v40Bakedvae.safetensors Creates Junk Images #81

Open DavidDragonsage opened 1 day ago

DavidDragonsage commented 1 day ago

What happened?

Generating an image with the RealvisXLv40_v40Bakedvae.safetensors base model and the Fooocus Enhance style in "win2_0916" produces a highly distorted image. The same image is produced correctly in "win2_0820".

This distortion occurs with all images I have attempted to create in win2_960 using the RealvisXLv40_v40Bakedvae base model and the Fooocus Enhance style. Image generation will be normal if the Fooocus Semi Realistic style is substituted for the Fooocus Enhance style.

I did not test this failure with other XL base models, but RealvisXLv40_v40Bakedvae has never caused this sort of problem before, and it is one of my favourite models.

Steps to reproduce the problem

1) Start SimpleSDXL win2_960
2) Find an image made with win2_820, created using the RealvisXLv40_v40Bakedvae base model and the Fooocus Enhance style
3) Load the image into the Metadata window and apply
4) Generate
5) Compare the images

If you check the image logs the parameters will be identical. However, the image created with win2_960 will be a junk image instead of a good image.

What should have happened?

The images created using both win2_960 and win2_820 should be identical. The first attached image was created in win2_820 and the second attachment was created in win2_960.

What browsers do you use to access Fooocus?

Mozilla Firefox

Where are you running Fooocus?

Locally

What operating system are you using?

Windows 10

Console logs

[Fooocus Model Management] Moving model(s) has taken 0.57 seconds
reciver prompt:full body long shot: a Norwegian mature woman (ambles:1.2) through the autumn with a (ambiguous:1.2) expression, 35mm lens, natural lighting, clearly defined facial features, sharp background, deep depth of field, (rim lighting:1.4)
[Fooocus] GPU memory: max_reserved=5.473GB, max_allocated=5.132GB, reserved=2.104GB, free=8.829GB, free_torch=0.121GB, free_total=8.950GB, gpu_total=12.000GB, torch_total=2.104GB
[TaskEngine] Task_class:Fooocus, Task_name:default, Task_method:text2image
[TaskEngine] Enable Fooocus backend.
[Comfyd] Comfyd freeing!
[Parameters] Adaptive CFG = 7
[Parameters] CLIP Skip = 2
[Parameters] Sharpness = 2
[Parameters] ControlNet Softness = 0.25
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] Seed = 7492231340810660130
[Parameters] CFG = 7
[Fooocus] Loading control models ...
[Parameters] Sampler = dpmpp_2m - karras
[Parameters] Steps = 30 - 18
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Wildcards] Copmile text in prompt to arrays: full body long shot: a Norwegian mature woman (ambles:1.2) through the autumn with a (ambiguous:1.2) expression, 35mm lens, natural lighting, clearly defined facial features, sharp background, deep depth of field, (rim lighting:1.4) -> arrays:[], mult:0
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] full body long shot: a Norwegian mature woman (ambles:1.2) through the autumn with a (ambiguous:1.2) expression, 35mm lens, natural lighting, clearly defined facial features, sharp background, deep depth of field, (rim lighting:1.4), highly detailed, elegant, delicate, intricate, epic composition, cinematic light
[Fooocus] Encoding positive #1 ...
[Fooocus Model Management] Moving model(s) has taken 0.13 seconds
[Fooocus] Encoding negative #1 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (1152, 896)
Preparation time: 1.10 seconds
Using karras scheduler.
[Fooocus] GPU memory: max_reserved=2.105GB, max_allocated=1.990GB, reserved=1.975GB, free=8.958GB, free_torch=0.241GB, free_total=9.199GB, gpu_total=12.000GB, torch_total=1.975GB
[Fooocus] Preparing Fooocus task 1/1 ...
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 2.10 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:23<00:00,  1.29it/s]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 0.17 seconds
[Fooocus] Saving image 1/1 to system ...
Image generated with private log at: E:\stable-diffusion-webui\outputs\2024-09-19\log.html
Generating and saving time: 26.65 seconds
[Fooocus] GPU memory: max_reserved=5.471GB, max_allocated=5.131GB, reserved=3.344GB, free=7.589GB, free_torch=3.175GB, free_total=10.763GB, gpu_total=12.000GB, torch_total=3.344GB
[Enhance] Skipping, preconditions aren't met
Processing time (total): 26.65 seconds
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
Total time: 27.79 seconds
[Gallery] Refresh_output_catalog: loaded 433 images_catalogs.
[Gallery] Parse_html_log: loaded 29 image_infos of 24-09-19.
[Gallery] Refresh_images_catalog: loaded 29 image_items of 24-09-19.
[Gallery] Parse_html_log: loaded 29 image_infos of 24-09-19.
[Fooocus Model Management] Moving model(s) has taken 0.55 seconds
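To compare the settings printed in the console log with those stored in an image log, the `[Parameters]` lines can be pulled out mechanically. This is a minimal sketch, assuming the `[Parameters] name = value` layout seen in the log above; `parse_parameters` is a hypothetical helper and not part of SimpleSDXL or Fooocus.

```python
def parse_parameters(console_log: str) -> dict:
    """Collect "[Parameters] name = value" lines into a dict of strings."""
    prefix = "[Parameters]"
    params = {}
    for line in console_log.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        # Split on the first "=" only, so values such as "1.5 : 0.8 : 0.3"
        # or "dpmpp_2m - karras" are kept whole.
        name, sep, value = line[len(prefix):].partition("=")
        if sep:  # lines without "=" (e.g. "Initial Latent shape: ...") are skipped
            params[name.strip()] = value.strip()
    return params


# A few lines copied from the console log above.
sample = """\
[Comfyd] Comfyd freeing!
[Parameters] Adaptive CFG = 7
[Parameters] CLIP Skip = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] Seed = 7492231340810660130
[Parameters] Sampler = dpmpp_2m - karras
[Parameters] Initial Latent shape: Image Space (1152, 896)
"""

params = parse_parameters(sample)
for name, value in params.items():
    print(f"{name}: {value}")
```

All values are kept as strings, so they can be compared directly with what another build prints for the same run.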

Additional information

It occurs to me that this anomaly may be related to the "[Bug]: Restore Detail to Flux1 Dev Image Generations #77" issue, in that it appears that the image generation algorithms have changed in some way between "win2_0820" and "win2_0916".

This is the image log. The logs are identical except for the version description:

{
  "prompt": "full body long shot: a Norwegian mature woman (ambles:1.2) through the autumn with a (ambiguous:1.2) expression, 35mm lens, natural lighting, clearly defined facial features, sharp background, deep depth of field, (rim lighting:1.4)",
  "negative_prompt": "blurry background, bokeh, (NSFW, naked, nude, nudity:1.8)",
  "prompt_expansion": "full body long shot: a Norwegian mature woman (ambles:1.2) through the autumn with a (ambiguous:1.2) expression, 35mm lens, natural lighting, clearly defined facial features, sharp background, deep depth of field, (rim lighting:1.4), highly detailed, elegant, delicate, intricate, epic composition, cinematic light",
  "styles": "['Fooocus V2', 'Fooocus Enhance']",
  "performance": "Speed",
  "steps": 30,
  "resolution": "(896, 1152)",
  "guidance_scale": 7,
  "sharpness": 2,
  "adm_guidance": "(1.5, 0.8, 0.3)",
  "base_model": "RealvisXLv40_v40Bakedvae.safetensors",
  "refiner_model": "None",
  "refiner_switch": 0.6,
  "clip_skip": 2,
  "sampler": "dpmpp_2m",
  "scheduler": "karras",
  "vae": "Default (model)",
  "seed": "7492231340810660130",
  "lora_combined_1": "sd_xl_offset_example-lora_1.0.safetensors : 0.5",
  "backend_engine": "SDXL-Fooocus",
  "metadata_scheme": "fooocus",
  "version": "Fooocus v2.5.5 SimpleSDXL_v20240731.baa9f01"
}
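The claim that the two image logs differ only in the version field can be checked mechanically. This is a minimal sketch, assuming each log has been saved as a JSON object like the one above. The function name `diff_logs` is hypothetical, and the sample logs are shortened, with placeholder version strings used purely for illustration.

```python
import json


def diff_logs(log_a: dict, log_b: dict, ignore=("version",)) -> dict:
    """Return {key: (value_a, value_b)} for every key whose values differ.

    Keys listed in `ignore` are skipped; a key missing from one log is
    reported with None on that side.
    """
    differences = {}
    for key in sorted(set(log_a) | set(log_b)):
        if key in ignore:
            continue
        value_a, value_b = log_a.get(key), log_b.get(key)
        if value_a != value_b:
            differences[key] = (value_a, value_b)
    return differences


# Shortened, illustrative logs: a few fields from the log above, with
# placeholder version strings (not the real build identifiers).
old_log = json.loads(
    '{"base_model": "RealvisXLv40_v40Bakedvae.safetensors", '
    '"seed": "7492231340810660130", "steps": 30, '
    '"version": "build A (placeholder)"}'
)
new_log = json.loads(
    '{"base_model": "RealvisXLv40_v40Bakedvae.safetensors", '
    '"seed": "7492231340810660130", "steps": 30, '
    '"version": "build B (placeholder)"}'
)

print(diff_logs(old_log, new_log))             # empty dict: parameters match
print(diff_logs(old_log, new_log, ignore=()))  # only "version" is reported
```

An empty result with the default `ignore` supports the conclusion that the change in output comes from the build itself and not from the generation parameters.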

2024-09-19_17-36-54_1218

2024-09-19_17-53-40_1218

metercai commented 23 hours ago

"metadata_scheme": "fooocus": the 'fooocus' metadata scheme does not support newer models such as Flux, Kolors, and SD3m, so the metadata scheme should be 'simple' instead of 'fooocus'.

For better compatibility, the 'fooocus' option will be removed in the future, and only the 'simple' and 'a1111' options will be retained. For SDXL models, 'simple' is compatible with 'fooocus'.
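As an illustration, saved metadata could be checked for the scheme that is being phased out. This is only a sketch, assuming the metadata is available as a dict like the image log quoted earlier in this issue; `check_metadata_scheme` and the two constants are hypothetical names, not part of SimpleSDXL.

```python
# Options named in the comment above; the grouping is an assumption of this sketch.
RETAINED_SCHEMES = ("simple", "a1111")
DEPRECATED_SCHEMES = ("fooocus",)


def check_metadata_scheme(metadata: dict) -> str:
    """Classify the "metadata_scheme" value of an image log."""
    scheme = metadata.get("metadata_scheme")
    if scheme in RETAINED_SCHEMES:
        return "ok"
    if scheme in DEPRECATED_SCHEMES:
        return "deprecated: switch to 'simple'"
    return "unknown"


print(check_metadata_scheme({"metadata_scheme": "fooocus"}))
print(check_metadata_scheme({"metadata_scheme": "simple"}))
```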

metercai commented 19 hours ago

It has been resolved in the dev version and will be pushed to the official version soon. This was generated using the dev version: 2024-09-21_11-48-06_1772

DavidDragonsage commented 18 hours ago

Great news - thank you! 🙂