metercai / SimpleSDXL

Enhanced version of Fooocus for SDXL, more suitable for Chinese and Cloud
GNU General Public License v3.0

[Bug]: Restore Detail to Flux1 Dev Image Generations #77

Closed DavidDragonsage closed 2 months ago

DavidDragonsage commented 2 months ago

What happened?

In win_0916, images generated with Flux1 Dev have less detail than those generated with win_820. For any given image, the logs show that all listed parameters are identical between the old and new releases of SimpleSDXL, yet the results differ. Please see the example images.

win2_0820 2024-09-11_19-38-39_6828

win_0916 2024-09-16_23-48-17_1914

Steps to reproduce the problem

1) Start SimpleSDXL win_0916.
2) Find a Flux1 Dev image made using win_820.
3) Load the image into the Metadata window and apply.
4) Generate.
5) Compare the images.

If you check the image logs, the parameters will be identical. The images should also be identical, but they are not.

What should have happened?

If you load the image metadata from an image made with win_820 and regenerate it in win_0916, the results should be identical. For Flux1 Dev images they are not. Sometimes the difference is minor and sometimes it is very obvious. Typically the image made with win_820 contains more detail and follows the prompt more closely, although this is not always the case. Sometimes the lack of detail makes the win_0916 image appear less focused, especially when detail is lacking around the eyes.

Another way to express this is to say that the images from the newer release tend to be simpler.
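To put a number on "the images should be identical", one option is a mean absolute per-channel difference between the two renders. The following is only a minimal sketch: the function name `mean_abs_diff` is hypothetical, and it assumes both images have already been decoded (with whatever image library you prefer) into equal-length sequences of pixel tuples.

```python
def mean_abs_diff(pixels_a, pixels_b):
    """Mean absolute per-channel difference between two pixel sequences.

    Each argument is a sequence of channel tuples, e.g. [(r, g, b), ...].
    Returns 0.0 when the images are pixel-for-pixel identical.
    """
    if len(pixels_a) != len(pixels_b):
        raise ValueError("images must have the same number of pixels")
    total = 0
    count = 0
    for pixel_a, pixel_b in zip(pixels_a, pixels_b):
        for channel_a, channel_b in zip(pixel_a, pixel_b):
            total += abs(channel_a - channel_b)
            count += 1
    return total / count if count else 0.0
```

A result of exactly 0.0 would mean the regenerated image matches the original; any larger value confirms a difference but does not say whether it is visually significant.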

What browsers do you use to access Fooocus?

Mozilla Firefox

Where are you running Fooocus?

Locally

What operating system are you using?

Windows 10

Console logs

[Comfyd] Starting Comfyd server!

[Topbar] Reset_context: preset=default-->Flux, theme=dark, lang=en
Loaded preset: F:\SimpleAI\SimpleSDXL2_win_0916\SimpleSDXL\presets\Flux.json
[Comfyd] Comfyd freeing!
[Topbar] Reset_context: preset=Flux-->FluxDevD, theme=dark, lang=en
Loaded preset: F:\SimpleAI\SimpleSDXL2_win_0916\SimpleSDXL\presets\FluxDevD.json
[Comfyd] Comfyd freeing!
Using xformers attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using xformers attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
Base model loaded: E:\stable-diffusion-webui\models\Stable-diffusion\juggernautXL_juggXIByRundiffusion.safetensors
VAE loaded: None
Request to load LoRAs [('sd_xl_offset_example-lora_1.0.safetensors', 0.1)] for model [E:\stable-diffusion-webui\models\Stable-diffusion\juggernautXL_juggXIByRundiffusion.safetensors].
Loaded LoRA [E:\stable-diffusion-webui\models\Lora\sd_xl_offset_example-lora_1.0.safetensors] for UNet [E:\stable-diffusion-webui\models\Stable-diffusion\juggernautXL_juggXIByRundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 0.42 seconds
Started worker with PID 6996
App started successful. Use the app with http://192.168.1.69:8186/ or 192.168.1.69:8186
[Fooocus] GPU memory: max_reserved=2.080GB, max_allocated=1.973GB, reserved=2.080GB, free=8.903GB, free_torch=0.107GB, free_total=9.011GB, gpu_total=12.000GB, torch_total=2.080GB
[ToolBox] Reset_params_from_image: -->Flux.1 params from the image with embedded parameters.
reciver prompt:full body long shot: a Kosovar adult woman (strides:1.2) through the autumn with a (benign:1.2) expression, 35mm lens, natural lighting, clearly defined facial features, sharp background, deep depth of field, (rim lighting:1.4)
[Fooocus] GPU memory: max_reserved=2.080GB, max_allocated=1.973GB, reserved=2.080GB, free=8.903GB, free_torch=0.107GB, free_total=9.011GB, gpu_total=12.000GB, torch_total=2.080GB
[TaskEngine] Task_class:Flux, Task_name:Flux, Task_method:flux_base
[TaskEngine] Enable Comfyd backend.
[Comfyd] Comfyd is active!
[Parameters] Adaptive CFG = 7
[Parameters] CLIP Skip = 2
[Parameters] Sharpness = 2
[Parameters] ControlNet Softness = 0.25
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] Seed = 6415921003060860441
[Parameters] CFG = 3.5
[Fooocus] Loading control models ...
[Parameters] Sampler = euler - simple
[Parameters] Steps = 20 - 30
[Fooocus] Initializing ...
[Fooocus] Processing prompts ...
[Wildcards] Copmile text in prompt to arrays: full body long shot: a Kosovar adult woman (strides:1.2) through the autumn with a (benign:1.2) expression, 35mm lens, natural lighting, clearly defined facial features, sharp background, deep depth of field, (rim lighting:1.4) -> arrays:[], mult:0
[Fooocus] Preparing Fooocus text #1 ...
F:\SimpleAI\SimpleSDXL2_win_0916\python_embeded\lib\site-packages\transformers\models\gpt2\modeling_gpt2.py:650: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
[Prompt Expansion] full body long shot: a Kosovar adult woman (strides:1.2) through the autumn with a (benign:1.2) expression, 35mm lens, natural lighting, clearly defined facial features, sharp background, deep depth of field, (rim lighting:1.4), elegant, highly detailed, rich colors, ambient light, dynamic
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (1152, 896)
Preparation time: 2.94 seconds
Using simple scheduler.
[Fooocus] GPU memory: max_reserved=2.105GB, max_allocated=1.990GB, reserved=0.020GB, free=10.933GB, free_torch=0.012GB, free_total=10.944GB, gpu_total=12.000GB, torch_total=0.020GB
[Fooocus] Preparing Flux task 1/1 ...
[ComfyClient] Ready ComfyTask to process: workflow=flux_base_nf4
    prompt = full body long shot: a Kosovar adult woman (strides:1.2) through the autumn with a (benign:1.2) expression, 35mm lens, natural lighting, clearly defined facial features, sharp background, deep depth of field, (rim lighting:1.4)
    negative_prompt = (worst quality, low quality, normal quality, lowres, low details, oversaturated, undersaturated, overexposed, underexposed, grayscale, bw, bad photo, bad photography, bad art), (watermark, signature, text font, username, error, logo, words, letters, digits, autograph, trademark, name), (blur, blurry, grainy), morbid, ugly, asymmetrical, mutated malformed, mutilated, poorly lit, bad shadow, draft, cropped, out of frame, cut off, censored, jpeg artifacts, out of focus, glitch, duplicate, (airbrushed, cartoon, anime, semi-realistic, cgi, render, blender, digital art, manga, amateur), (3D ,3D Game, 3D Game Scene, 3D Character), (bad hands, bad anatomy, bad body, bad face, bad teeth, bad arms, bad legs, deformities)
    width = 896
    height = 1152
    base_model = flux1-dev-bnb-nf4-v2.safetensors
    sampler = euler
    scheduler = simple
    cfg_scale = 3.5
    steps = 20
    denoise = 1.0
    seed = 6415921003060860441
[Comfyd] got prompt
[ComfyClient] Request and get ComfyTask_id:8370fb46-2c3f-44e1-afde-4e2ca3be9a88
[Comfyd] GPU memory: max_reserved=0.000GB, max_allocated=0.000GB, reserved=0.000GB, free=10.983GB, free_torch=0.000GB, free_total=10.983GB, gpu_total=12.000GB, torch_total=0.000GB
[Comfyd] WARNING: SaveImageWebsocket.IS_CHANGED() missing 1 required positional argument: 's'
[Comfyd] model weight dtype torch.bfloat16, manual cast: None
[Comfyd] model_type FLUX
[Comfyd] Requested to load FluxClipModel_
[Comfyd] Loading 1 new model
[Comfyd] loaded completely 0.0 4777.53759765625 True
F:\SimpleAI\SimpleSDXL2_win_0916\SimpleSDXL\comfy\comfy\ldm\modules\attention.py:408: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
  out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
[Comfyd] Requested to load Flux
[Comfyd] Loading 1 new model
[Comfyd] loaded completely 0.0 6388.649485588074 True
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [01:07<00:00,  3.38s/it]
[Comfyd] Requested to load AutoencodingEngine
[Comfyd] Loading 1 new model
[Comfyd] loaded completely 0.0 159.87335777282715 True
[Comfyd] GPU memory: max_reserved=6.969GB, max_allocated=6.840GB, reserved=0.031GB, free=10.899GB, free_torch=0.023GB, free_total=10.923GB, gpu_total=12.000GB, torch_total=0.031GB
[Comfyd] Prompt executed in 203.81 seconds
[ComfyClient] The ComfyTask:8370fb46-2c3f-44e1-afde-4e2ca3be9a88 has finished: 1
[Fooocus] Saving image 1/1 to system ...
Image generated with private log at: E:\stable-diffusion-webui\outputs\2024-09-18\log.html
Generating and saving time: 205.01 seconds
[Fooocus] GPU memory: max_reserved=0.020GB, max_allocated=0.008GB, reserved=0.020GB, free=10.933GB, free_torch=0.012GB, free_total=10.944GB, gpu_total=12.000GB, torch_total=0.020GB
[Enhance] Skipping, preconditions aren't met
Processing time (total): 205.89 seconds
[Comfyd] Task finished !
Total time: 209.03 seconds
[Gallery] Refresh_output_catalog: loaded 420 images_catalogs.
[Gallery] Parse_html_log: loaded 1 image_infos of 24-09-18.
[Gallery] Refresh_images_catalog: loaded 1 image_items of 24-09-18.
[Gallery] Parse_html_log: loaded 1 image_infos of 24-09-18.
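For anyone who wants to confirm that two runs really used the same settings, the parameter lines in a console log like the one above can be extracted and compared. This is a rough sketch, not part of SimpleSDXL: the helper names `extract_params` and `diff_params` are hypothetical, and the patterns assume the `[Parameters] key = value` and indented `key = value` line shapes seen in this log.

```python
import re

# Lines such as "[Parameters] CFG = 3.5".
PARAM_RE = re.compile(r"^\[Parameters\]\s*(?P<key>.+?)\s*=\s*(?P<value>.*)$")
# Indented "key = value" lines printed under "[ComfyClient] Ready ComfyTask".
TASK_RE = re.compile(r"^\s+(?P<key>\w+)\s*=\s*(?P<value>.*)$")


def extract_params(log_text):
    """Collect parameter names and values from console log text."""
    params = {}
    in_task = False  # True while reading the indented ComfyTask block
    for line in log_text.splitlines():
        if line.startswith("[ComfyClient] Ready ComfyTask"):
            in_task = True
            continue
        task_match = TASK_RE.match(line) if in_task else None
        if task_match:
            params[task_match.group("key")] = task_match.group("value").strip()
            continue
        in_task = False  # any other line ends the ComfyTask block
        param_match = PARAM_RE.match(line)
        if param_match:
            params[param_match.group("key")] = param_match.group("value").strip()
    return params


def diff_params(old, new):
    """Return {key: (old_value, new_value)} for every parameter that differs."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}
```

An empty result from `diff_params` for two logs would support the observation that the logged parameters match, which leaves the backend itself as the place where the runs diverge.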

Additional information

The attached console log is for an image that I regenerated just now, chosen completely at random from those I made with win_820. The first attached image is from win_820 and the second is from win_0916.

The differences between these two are subtle. But notice that in the win_820 image the subject is wearing gloves, which is appropriate to the prompt and to the heavy clothing she is wearing: "full body long shot: a Kosovar adult woman (strides:1.2) through the autumn with a (benign:1.2) expression, 35mm lens, natural lighting, clearly defined facial features, sharp background, deep depth of field, (rim lighting:1.4)"

In the win_0916 image she has lost her gloves. Also, the person who was following her and causing a blurred reflection is gone.

For more information and image logs, see also "Apparent Loss of Detail in Flux1 Dev from win_820 to win_0916 #76" in the Discussion section.

2024-09-12_21-43-34_7990

2024-09-18_09-59-51_1304

metercai commented 2 months ago

This difference is caused by the Comfyd upgrade; it is not a bug.

DavidDragonsage commented 2 months ago

I suspected that might be the case. This is an impairment in quality, although the degree of impairment is quite variable.

Is it practical to roll back the Comfy upgrade, or are there other issues that make the upgrade necessary?