metercai / SimpleSDXL

Enhanced version of Fooocus for SDXL, more suitable for Chinese and Cloud
GNU General Public License v3.0

[Bug]: SimpleSDXL Does Not Produce A1111 Metadata in Comfy Mode #79

Closed DavidDragonsage closed 2 months ago

DavidDragonsage commented 2 months ago

What happened?

When generating an image in Comfy mode with the A1111 Metadata option enabled in Advanced Tools, the run fails. Sampling completes, but a KeyError: 'vae' is raised while the finished image is being saved, so no image is written.

The A1111 Metadata option works fine in Fooocus mode.

Steps to reproduce the problem

1) Switch to a Comfy preset such as "Flux".
2) Select the A1111 Metadata option from the bottom of the Advanced Tools pane.
3) Enter a prompt.
4) Generate the image.
5) The image fails to generate and an error sequence appears in the console.

What should have happened?

SimpleSDXL should have produced an image containing A1111 metadata.

What browsers do you use to access Fooocus?

Mozilla Firefox

Where are you running Fooocus?

Locally

What operating system are you using?

Windows 10

Console logs

[Comfyd] Starting Comfyd server!

[Topbar] Reset_context: preset=default-->Flux, theme=dark, lang=en
Loaded preset: F:\SimpleAI\SimpleSDXL2_win_0916\SimpleSDXL\presets\Flux.json
[Comfyd] Comfyd freeing!
[Topbar] Reset_context: preset=Flux-->FluxDevD, theme=dark, lang=en
Loaded preset: F:\SimpleAI\SimpleSDXL2_win_0916\SimpleSDXL\presets\FluxDevD.json
[Comfyd] Comfyd freeing!
Using xformers attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using xformers attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
Base model loaded: E:\stable-diffusion-webui\models\Stable-diffusion\juggernautXL_juggXIByRundiffusion.safetensors
VAE loaded: None
Request to load LoRAs [('sd_xl_offset_example-lora_1.0.safetensors', 0.1)] for model [E:\stable-diffusion-webui\models\Stable-diffusion\juggernautXL_juggXIByRundiffusion.safetensors].
Loaded LoRA [E:\stable-diffusion-webui\models\Lora\sd_xl_offset_example-lora_1.0.safetensors] for UNet [E:\stable-diffusion-webui\models\Stable-diffusion\juggernautXL_juggXIByRundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 0.42 seconds
Started worker with PID 6996
App started successful. Use the app with http://192.168.1.69:8186/ or 192.168.1.69:8186
[Fooocus] GPU memory: max_reserved=2.080GB, max_allocated=1.973GB, reserved=2.080GB, free=8.903GB, free_torch=0.107GB, free_total=9.011GB, gpu_total=12.000GB, torch_total=2.080GB
[ToolBox] Reset_params_from_image: -->Flux.1 params from the image with embedded parameters.
reciver prompt:full body long shot: a Kosovar adult woman (strides:1.2) through the autumn with a (benign:1.2) expression, 35mm lens, natural lighting, clearly defined facial features, sharp background, deep depth of field, (rim lighting:1.4)
[Fooocus] GPU memory: max_reserved=2.080GB, max_allocated=1.973GB, reserved=2.080GB, free=8.903GB, free_torch=0.107GB, free_total=9.011GB, gpu_total=12.000GB, torch_total=2.080GB
[TaskEngine] Task_class:Flux, Task_name:Flux, Task_method:flux_base
[TaskEngine] Enable Comfyd backend.
[Comfyd] Comfyd is active!
[Parameters] Adaptive CFG = 7
[Parameters] CLIP Skip = 2
[Parameters] Sharpness = 2
[Parameters] ControlNet Softness = 0.25
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] Seed = 6415921003060860441
[Parameters] CFG = 3.5
[Fooocus] Loading control models ...
[Parameters] Sampler = euler - simple
[Parameters] Steps = 20 - 30
[Fooocus] Initializing ...
[Fooocus] Processing prompts ...
[Wildcards] Copmile text in prompt to arrays: full body long shot: a Kosovar adult woman (strides:1.2) through the autumn with a (benign:1.2) expression, 35mm lens, natural lighting, clearly defined facial features, sharp background, deep depth of field, (rim lighting:1.4) -> arrays:[], mult:0
[Fooocus] Preparing Fooocus text #1 ...
F:\SimpleAI\SimpleSDXL2_win_0916\python_embeded\lib\site-packages\transformers\models\gpt2\modeling_gpt2.py:650: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
[Prompt Expansion] full body long shot: a Kosovar adult woman (strides:1.2) through the autumn with a (benign:1.2) expression, 35mm lens, natural lighting, clearly defined facial features, sharp background, deep depth of field, (rim lighting:1.4), elegant, highly detailed, rich colors, ambient light, dynamic
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (1152, 896)
Preparation time: 2.94 seconds
Using simple scheduler.
[Fooocus] GPU memory: max_reserved=2.105GB, max_allocated=1.990GB, reserved=0.020GB, free=10.933GB, free_torch=0.012GB, free_total=10.944GB, gpu_total=12.000GB, torch_total=0.020GB
[Fooocus] Preparing Flux task 1/1 ...
[ComfyClient] Ready ComfyTask to process: workflow=flux_base_nf4
    prompt = full body long shot: a Kosovar adult woman (strides:1.2) through the autumn with a (benign:1.2) expression, 35mm lens, natural lighting, clearly defined facial features, sharp background, deep depth of field, (rim lighting:1.4)
    negative_prompt = (worst quality, low quality, normal quality, lowres, low details, oversaturated, undersaturated, overexposed, underexposed, grayscale, bw, bad photo, bad photography, bad art), (watermark, signature, text font, username, error, logo, words, letters, digits, autograph, trademark, name), (blur, blurry, grainy), morbid, ugly, asymmetrical, mutated malformed, mutilated, poorly lit, bad shadow, draft, cropped, out of frame, cut off, censored, jpeg artifacts, out of focus, glitch, duplicate, (airbrushed, cartoon, anime, semi-realistic, cgi, render, blender, digital art, manga, amateur), (3D ,3D Game, 3D Game Scene, 3D Character), (bad hands, bad anatomy, bad body, bad face, bad teeth, bad arms, bad legs, deformities)
    width = 896
    height = 1152
    base_model = flux1-dev-bnb-nf4-v2.safetensors
    sampler = euler
    scheduler = simple
    cfg_scale = 3.5
    steps = 20
    denoise = 1.0
    seed = 6415921003060860441
[Comfyd] got prompt
[ComfyClient] Request and get ComfyTask_id:8370fb46-2c3f-44e1-afde-4e2ca3be9a88
[Comfyd] GPU memory: max_reserved=0.000GB, max_allocated=0.000GB, reserved=0.000GB, free=10.983GB, free_torch=0.000GB, free_total=10.983GB, gpu_total=12.000GB, torch_total=0.000GB
[Comfyd] WARNING: SaveImageWebsocket.IS_CHANGED() missing 1 required positional argument: 's'
[Comfyd] model weight dtype torch.bfloat16, manual cast: None
[Comfyd] model_type FLUX
[Comfyd] Requested to load FluxClipModel_
[Comfyd] Loading 1 new model
[Comfyd] loaded completely 0.0 4777.53759765625 True
F:\SimpleAI\SimpleSDXL2_win_0916\SimpleSDXL\comfy\comfy\ldm\modules\attention.py:408: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
  out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
[Comfyd] Requested to load Flux
[Comfyd] Loading 1 new model
[Comfyd] loaded completely 0.0 6388.649485588074 True
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [01:07<00:00,  3.38s/it]
[Comfyd] Requested to load AutoencodingEngine
[Comfyd] Loading 1 new model
[Comfyd] loaded completely 0.0 159.87335777282715 True
[Comfyd] GPU memory: max_reserved=6.969GB, max_allocated=6.840GB, reserved=0.031GB, free=10.899GB, free_torch=0.023GB, free_total=10.923GB, gpu_total=12.000GB, torch_total=0.031GB
[Comfyd] Prompt executed in 203.81 seconds
[ComfyClient] The ComfyTask:8370fb46-2c3f-44e1-afde-4e2ca3be9a88 has finished: 1
[Fooocus] Saving image 1/1 to system ...
Image generated with private log at: E:\stable-diffusion-webui\outputs\2024-09-18\log.html
Generating and saving time: 205.01 seconds
[Fooocus] GPU memory: max_reserved=0.020GB, max_allocated=0.008GB, reserved=0.020GB, free=10.933GB, free_torch=0.012GB, free_total=10.944GB, gpu_total=12.000GB, torch_total=0.020GB
[Enhance] Skipping, preconditions aren't met
Processing time (total): 205.89 seconds
[Comfyd] Task finished !
Total time: 209.03 seconds
[Gallery] Refresh_output_catalog: loaded 420 images_catalogs.
[Gallery] Parse_html_log: loaded 1 image_infos of 24-09-18.
[Gallery] Refresh_images_catalog: loaded 1 image_items of 24-09-18.
[Gallery] Parse_html_log: loaded 1 image_infos of 24-09-18.
reciver prompt:full body long shot: a Kosovar adult woman (strides:1.2) through the autumn with a (benign:1.2) expression, 35mm lens, natural lighting, clearly defined facial features, sharp background, deep depth of field, (rim lighting:1.4)
[Fooocus] GPU memory: max_reserved=0.020GB, max_allocated=0.008GB, reserved=0.020GB, free=10.933GB, free_torch=0.012GB, free_total=10.944GB, gpu_total=12.000GB, torch_total=0.020GB
[TaskEngine] Task_class:Flux, Task_name:Flux, Task_method:flux_base
[TaskEngine] Enable Comfyd backend.
[Comfyd] Comfyd is active!
[Parameters] Adaptive CFG = 7
[Parameters] CLIP Skip = 2
[Parameters] Sharpness = 2
[Parameters] ControlNet Softness = 0.25
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] Seed = 5441180189181032878
[Parameters] CFG = 3.5
[Fooocus] Loading control models ...
[Parameters] Sampler = euler - simple
[Parameters] Steps = 20 - 30
[Fooocus] Initializing ...
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
[Fooocus] Processing prompts ...
[Wildcards] Copmile text in prompt to arrays: full body long shot: a Kosovar adult woman (strides:1.2) through the autumn with a (benign:1.2) expression, 35mm lens, natural lighting, clearly defined facial features, sharp background, deep depth of field, (rim lighting:1.4) -> arrays:[], mult:0
[Fooocus] Preparing Fooocus text #1 ...
Fooocus Expansion loaded by itself.
Requested to load GPT2LMHeadModel
Loading 1 new model
[Prompt Expansion] full body long shot: a Kosovar adult woman (strides:1.2) through the autumn with a (benign:1.2) expression, 35mm lens, natural lighting, clearly defined facial features, sharp background, deep depth of field, (rim lighting:1.4), elegant, highly detailed, rich colors, romantic, epic, stunning
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (1152, 896)
Preparation time: 2.07 seconds
Using simple scheduler.
[Fooocus] GPU memory: max_reserved=0.287GB, max_allocated=0.268GB, reserved=0.020GB, free=10.933GB, free_torch=0.012GB, free_total=10.944GB, gpu_total=12.000GB, torch_total=0.020GB
[Fooocus] Preparing Flux task 1/1 ...
[ComfyClient] Ready ComfyTask to process: workflow=flux_base_nf4
    prompt = full body long shot: a Kosovar adult woman (strides:1.2) through the autumn with a (benign:1.2) expression, 35mm lens, natural lighting, clearly defined facial features, sharp background, deep depth of field, (rim lighting:1.4)
    negative_prompt = (worst quality, low quality, normal quality, lowres, low details, oversaturated, undersaturated, overexposed, underexposed, grayscale, bw, bad photo, bad photography, bad art), (watermark, signature, text font, username, error, logo, words, letters, digits, autograph, trademark, name), (blur, blurry, grainy), morbid, ugly, asymmetrical, mutated malformed, mutilated, poorly lit, bad shadow, draft, cropped, out of frame, cut off, censored, jpeg artifacts, out of focus, glitch, duplicate, (airbrushed, cartoon, anime, semi-realistic, cgi, render, blender, digital art, manga, amateur), (3D ,3D Game, 3D Game Scene, 3D Character), (bad hands, bad anatomy, bad body, bad face, bad teeth, bad arms, bad legs, deformities)
    width = 896
    height = 1152
    base_model = flux1-dev-bnb-nf4-v2.safetensors
    sampler = euler
    scheduler = simple
    cfg_scale = 3.5
    steps = 20
    denoise = 1.0
    seed = 5441180189181032878
[Comfyd] got prompt
[ComfyClient] Request and get ComfyTask_id:189f93d4-fa5a-4b0d-b0cf-842dd99fdd46
[Comfyd] GPU memory: max_reserved=0.031GB, max_allocated=0.008GB, reserved=0.031GB, free=10.899GB, free_torch=0.023GB, free_total=10.923GB, gpu_total=12.000GB, torch_total=0.031GB
[Comfyd] WARNING: SaveImageWebsocket.IS_CHANGED() missing 1 required positional argument: 's'
[Comfyd] Requested to load Flux
[Comfyd] Loading 1 new model
[Comfyd] loaded completely 0.0 6388.649485588074 True
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [01:10<00:00,  3.55s/it]
[Comfyd] Requested to load AutoencodingEngine
[Comfyd] Loading 1 new model
[Comfyd] loaded completely 0.0 159.87335777282715 True
[Comfyd] GPU memory: max_reserved=6.969GB, max_allocated=6.840GB, reserved=0.031GB, free=10.899GB, free_torch=0.023GB, free_total=10.923GB, gpu_total=12.000GB, torch_total=0.031GB
[Comfyd] Prompt executed in 77.57 seconds
[ComfyClient] The ComfyTask:189f93d4-fa5a-4b0d-b0cf-842dd99fdd46 has finished: 1
[Fooocus] Saving image 1/1 to system ...
Traceback (most recent call last):
  File "F:\SimpleAI\SimpleSDXL2_win_0916\SimpleSDXL\modules\async_worker.py", line 1641, in worker
    handler(task)
  File "F:\SimpleAI\SimpleSDXL2_win_0916\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "F:\SimpleAI\SimpleSDXL2_win_0916\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "F:\SimpleAI\SimpleSDXL2_win_0916\SimpleSDXL\modules\async_worker.py", line 1447, in handler
    imgs, img_paths, current_progress = process_task(all_steps, async_task, callback_function, controlnet_canny_path,
  File "F:\SimpleAI\SimpleSDXL2_win_0916\SimpleSDXL\modules\async_worker.py", line 412, in process_task
    img_paths = save_and_log(async_task, height, imgs, task, use_expansion, width, loras, persist_image)
  File "F:\SimpleAI\SimpleSDXL2_win_0916\SimpleSDXL\modules\async_worker.py", line 488, in save_and_log
    img_paths.append(log(x, d, metadata_parser, async_task.output_format, task, persist_image))
  File "F:\SimpleAI\SimpleSDXL2_win_0916\SimpleSDXL\modules\private_logger.py", line 33, in log
    parsed_parameters = metadata_parser.to_string(metadata.copy()) if metadata_parser is not None else ''
  File "F:\SimpleAI\SimpleSDXL2_win_0916\SimpleSDXL\modules\meta_parser.py", line 607, in to_string
    self.fooocus_to_a1111['vae']: Path(data['vae']).stem,
KeyError: 'vae'
Total time: 80.15 seconds
[Gallery] Refresh_output_catalog: loaded 420 images_catalogs.
[Gallery] Parse_html_log: loaded 1 image_infos of 24-09-18.
[Gallery] Refresh_images_catalog: loaded 1 image_items of 24-09-18.
[Gallery] Parse_html_log: loaded 1 image_infos of 24-09-18.
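
The traceback ends at `Path(data['vae']).stem` in `meta_parser.py`, which suggests the metadata dict built for the Flux task had no `vae` entry. The following is a minimal sketch of that failure mode and of one tolerant alternative. The mapping `FOOOCUS_TO_A1111`, both function names, and the sample metadata are hypothetical stand-ins, not the project's code, and this is not necessarily how the maintainer fixed it.

```python
from pathlib import Path

# Hypothetical stand-in for the parser's Fooocus-to-A1111 key mapping.
FOOOCUS_TO_A1111 = {"vae": "VAE", "steps": "Steps"}


def vae_field_strict(data: dict) -> dict:
    # Mirrors the failing line: plain indexing raises KeyError
    # when the metadata has no 'vae' entry.
    return {FOOOCUS_TO_A1111["vae"]: Path(data["vae"]).stem}


def vae_field_tolerant(data: dict) -> dict:
    # Emit the VAE field only when the metadata actually carries one.
    vae = data.get("vae")
    if not vae:
        return {}
    return {FOOOCUS_TO_A1111["vae"]: Path(vae).stem}


# Assumed shape of Comfy-mode metadata: no 'vae' key at all.
comfy_metadata = {"steps": 20, "sampler": "euler"}

try:
    vae_field_strict(comfy_metadata)
except KeyError as err:
    print(f"strict lookup failed: KeyError {err}")

print(vae_field_tolerant(comfy_metadata))
```

The tolerant variant simply omits the field, which keeps the rest of the metadata string intact when a backend does not report a VAE.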

Additional information

No response

metercai commented 2 months ago

So far only the simple metadata scheme has been fixed; it is now available.

DavidDragonsage commented 2 months ago

Understood. I know some people count on the A1111 option for posting images to Civitai, with a particular interest in posting Flux images. I will let the folks know this is a known bug.

metercai commented 2 months ago

Like #81, this was a bug in the schema. The fix has been released. Restart to upgrade, then verify it.

DavidDragonsage commented 2 months ago

I can confirm that this bug is now fixed!

I successfully posted both Comfy and Fooocus images containing A1111 metadata to Civitai with no problems.
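
For anyone who wants to check a saved image locally before uploading, here is a rough sketch that lists the text chunks of a PNG using only the standard library. It assumes the image was saved as PNG and that the A1111-style string sits in an uncompressed `tEXt` chunk under the keyword `parameters`, which is the A1111 convention. Compressed (`zTXt`) and international (`iTXt`) chunks are skipped, and the function name is made up for this example.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data: bytes) -> dict:
    """Return the tEXt chunks of a PNG as {keyword: text}."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks = {}
    pos = len(PNG_SIGNATURE)
    # Each chunk is: 4-byte length, 4-byte type, body, 4-byte CRC.
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            # Keyword and text are separated by a single NUL byte.
            keyword, _, text = body.partition(b"\x00")
            chunks[keyword.decode("latin-1")] = text.decode("latin-1")
        if ctype == b"IEND":
            break
        pos += 12 + length  # skip header, body, and CRC (CRC is not verified)
    return chunks
```

Typical use would be `read_png_text_chunks(Path("image.png").read_bytes()).get("parameters")`, with `Path` imported from `pathlib` and `image.png` replaced by a real output file. A result of `None` means no such chunk was found.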