vladmandic / automatic

SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
https://github.com/vladmandic/automatic
GNU Affero General Public License v3.0

[Issue]: img2img Flux nf4 not working #3453

Closed: SAC020 closed this issue 2 weeks ago

SAC020 commented 2 weeks ago

Issue Description

Basic setup:

Precision type is BF16, but I get the same error using FP16

(screenshot attached)

Version Platform Description

PS C:\ai\automatic> .\webui.bat --debug
Using VENV: C:\ai\automatic\venv
17:06:05-115166 INFO Starting SD.Next
17:06:05-118143 INFO Logger: file="C:\ai\automatic\sdnext.log" level=DEBUG size=65 mode=create
17:06:05-119631 INFO Python: version=3.11.9 platform=Windows bin="C:\ai\automatic\venv\Scripts\python.exe" venv="C:\ai\automatic\venv"
17:06:05-269889 INFO Version: app=sd.next updated=2024-09-24 hash=c00bcde3 branch=dev url=https://github.com/vladmandic/automatic/tree/dev ui=dev
17:06:05-928611 INFO Repository latest available e7ec07f9783701629ca1411ad82aec87232501b9 2024-09-13T16:51:56Z
17:06:05-937044 INFO Platform: arch=AMD64 cpu=Intel64 Family 6 Model 165 Stepping 5, GenuineIntel system=Windows release=Windows-10-10.0.22631-SP0 python=3.11.9

Relevant log output

PS C:\ai\automatic> .\webui.bat --debug
Using VENV: C:\ai\automatic\venv
17:06:05-115166 INFO     Starting SD.Next
17:06:05-118143 INFO     Logger: file="C:\ai\automatic\sdnext.log" level=DEBUG size=65 mode=create
17:06:05-119631 INFO     Python: version=3.11.9 platform=Windows bin="C:\ai\automatic\venv\Scripts\python.exe"
                         venv="C:\ai\automatic\venv"
17:06:05-269889 INFO     Version: app=sd.next updated=2024-09-24 hash=c00bcde3 branch=dev
                         url=https://github.com/vladmandic/automatic/tree/dev ui=dev
17:06:05-928611 INFO     Repository latest available e7ec07f9783701629ca1411ad82aec87232501b9 2024-09-13T16:51:56Z
17:06:05-937044 INFO     Platform: arch=AMD64 cpu=Intel64 Family 6 Model 165 Stepping 5, GenuineIntel system=Windows
                         release=Windows-10-10.0.22631-SP0 python=3.11.9
17:06:05-938532 DEBUG    Setting environment tuning
17:06:05-939524 DEBUG    Torch allocator: "garbage_collection_threshold:0.80,max_split_size_mb:512"
17:06:05-948947 DEBUG    Torch overrides: cuda=False rocm=False ipex=False diml=False openvino=False
17:06:05-949940 DEBUG    Torch allowed: cuda=True rocm=True ipex=True diml=True openvino=True
17:06:05-958371 INFO     CUDA: nVidia toolkit detected
17:06:06-035556 WARNING  Modified files: ['models/Reference/playgroundai--playground-v2-1024px-aesthetic.jpg']
17:06:06-115411 INFO     Verifying requirements
17:06:06-119191 INFO     Verifying packages
17:06:06-148757 DEBUG    Timestamp repository update time: Wed Sep 25 03:36:27 2024
17:06:06-150246 INFO     Startup: standard
17:06:06-150742 INFO     Verifying submodules
17:06:08-886813 DEBUG    Git detached head detected: folder="extensions-builtin/sd-extension-chainner" reattach=main
17:06:08-888300 DEBUG    Git submodule: extensions-builtin/sd-extension-chainner / main
17:06:08-977651 DEBUG    Git detached head detected: folder="extensions-builtin/sd-extension-system-info" reattach=main
17:06:08-978643 DEBUG    Git submodule: extensions-builtin/sd-extension-system-info / main
17:06:09-073379 DEBUG    Git detached head detected: folder="extensions-builtin/sd-webui-agent-scheduler" reattach=main
17:06:09-074371 DEBUG    Git submodule: extensions-builtin/sd-webui-agent-scheduler / main
17:06:09-217219 DEBUG    Git detached head detected: folder="extensions-builtin/sdnext-modernui" reattach=dev
17:06:09-218211 DEBUG    Git submodule: extensions-builtin/sdnext-modernui / dev
17:06:09-340231 DEBUG    Git detached head detected: folder="extensions-builtin/stable-diffusion-webui-rembg"
                         reattach=master
17:06:09-341224 DEBUG    Git submodule: extensions-builtin/stable-diffusion-webui-rembg / master
17:06:09-436952 DEBUG    Git detached head detected: folder="modules/k-diffusion" reattach=master
17:06:09-437943 DEBUG    Git submodule: modules/k-diffusion / master
17:06:09-527719 DEBUG    Git detached head detected: folder="wiki" reattach=master
17:06:09-528711 DEBUG    Git submodule: wiki / master
17:06:09-591738 DEBUG    Register paths
17:06:09-671596 DEBUG    Installed packages: 217
17:06:09-673596 DEBUG    Extensions all: ['Lora', 'sd-extension-chainner', 'sd-extension-system-info',
                         'sd-webui-agent-scheduler', 'sd-webui-controlnet', 'sdnext-modernui',
                         'stable-diffusion-webui-images-browser', 'stable-diffusion-webui-rembg']
17:06:09-830314 DEBUG    Extension installer: C:\ai\automatic\extensions-builtin\sd-extension-system-info\install.py
17:06:09-983579 DEBUG    Extension installer: C:\ai\automatic\extensions-builtin\sd-webui-agent-scheduler\install.py
17:06:12-355931 DEBUG    Extension installer: C:\ai\automatic\extensions-builtin\sd-webui-controlnet\install.py
17:06:19-700677 DEBUG    Extension installer:
                         C:\ai\automatic\extensions-builtin\stable-diffusion-webui-images-browser\install.py
17:06:22-026754 DEBUG    Extension installer: C:\ai\automatic\extensions-builtin\stable-diffusion-webui-rembg\install.py
17:06:29-882814 DEBUG    Extensions all: []
17:06:29-883808 INFO     Extensions enabled: ['Lora', 'sd-extension-chainner', 'sd-extension-system-info',
                         'sd-webui-agent-scheduler', 'sd-webui-controlnet', 'sdnext-modernui',
                         'stable-diffusion-webui-images-browser', 'stable-diffusion-webui-rembg']
17:06:29-885295 INFO     Verifying requirements
17:06:29-886287 DEBUG    Setup complete without errors: 1727273190
17:06:29-892400 DEBUG    Extension preload: {'extensions-builtin': 0.0, 'extensions': 0.0}
17:06:29-893860 DEBUG    Starting module: <module 'webui' from 'C:\\ai\\automatic\\webui.py'>
17:06:29-894852 INFO     Command line args: ['--debug'] debug=True
17:06:29-895844 DEBUG    Env flags: []
17:06:35-448268 INFO     Load packages: {'torch': '2.4.0+cu124', 'diffusers': '0.31.0.dev0', 'gradio': '3.43.2',
                         'transformers': '4.44.2', 'accelerate': '0.34.2'}
17:06:35-851142 DEBUG    Huggingface cache: folder="C:\Users\sebas\.cache\huggingface\hub"
17:06:35-953318 INFO     GPU detect: memory=23.99 optimization=none
17:06:35-955593 DEBUG    Read: file="config.json" json=38 bytes=1590 time=0.000
17:06:35-956585 DEBUG    Unknown settings: ['cross_attention_options']
17:06:35-958074 INFO     Engine: backend=Backend.DIFFUSERS compute=cuda device=cuda attention="Scaled-Dot-Product"
                         mode=no_grad
17:06:36-013024 INFO     Device: device=NVIDIA GeForce RTX 4090 n=1 arch=sm_90 capability=(8, 9) cuda=12.4 cudnn=90100
                         driver=561.09
17:06:36-014512 DEBUG    Read: file="html\reference.json" json=52 bytes=29118 time=0.000
17:06:36-409126 DEBUG    ONNX: version=1.19.2 provider=CUDAExecutionProvider, available=['AzureExecutionProvider',
                         'CPUExecutionProvider']
17:06:36-589172 DEBUG    Importing LDM
17:06:36-616453 DEBUG    Entering start sequence
17:06:36-619428 DEBUG    Initializing
17:06:36-648197 INFO     Available VAEs: path="models\VAE" items=0
17:06:36-650182 INFO     Available UNets: path="models\UNET" items=0
17:06:36-651172 INFO     Available TEs: path="models\Text-encoder" items=0
17:06:36-652660 INFO     Disabled extensions: ['sd-webui-controlnet', 'sdnext-modernui']
17:06:36-654644 DEBUG    Read: file="cache.json" json=2 bytes=10952 time=0.000
17:06:36-663572 DEBUG    Read: file="metadata.json" json=677 bytes=2414766 time=0.007
17:06:36-668532 DEBUG    Scanning diffusers cache: folder="models\Diffusers" items=3 time=0.00
17:06:36-669524 INFO     Available Models: path="models\Stable-diffusion" items=15 time=0.02
17:06:36-875365 DEBUG    Load extensions
17:06:36-939843 INFO     Extension: script='extensions-builtin\Lora\scripts\lora_script.py'
17:06:36-936371 INFO     Available LoRAs: items=125 folders=3
17:06:37-320160 INFO     Extension: script='extensions-builtin\sd-webui-agent-scheduler\scripts\task_scheduler.py' Using
                         sqlite file: extensions-builtin\sd-webui-agent-scheduler\task_scheduler.sqlite3
17:06:37-509491 DEBUG    Extensions init time: 0.63 sd-webui-agent-scheduler=0.34
                         stable-diffusion-webui-images-browser=0.18
17:06:37-521396 DEBUG    Read: file="html/upscalers.json" json=4 bytes=2672 time=0.000
17:06:37-523130 DEBUG    Read: file="extensions-builtin\sd-extension-chainner\models.json" json=24 bytes=2719 time=0.000
17:06:37-525100 DEBUG    chaiNNer models: path="models\chaiNNer" defined=24 discovered=0 downloaded=8
17:06:37-527086 DEBUG    Upscaler type=ESRGAN folder="models\ESRGAN" model="1x-ITF-SkinDiffDetail-Lite-v1"
                         path="models\ESRGAN\1x-ITF-SkinDiffDetail-Lite-v1.pth"
17:06:37-528573 DEBUG    Upscaler type=ESRGAN folder="models\ESRGAN" model="4xNMKDSuperscale_4xNMKDSuperscale"
                         path="models\ESRGAN\4xNMKDSuperscale_4xNMKDSuperscale.pth"
17:06:37-529565 DEBUG    Upscaler type=ESRGAN folder="models\ESRGAN" model="4x_NMKD-Siax_200k"
                         path="models\ESRGAN\4x_NMKD-Siax_200k.pth"
17:06:37-532540 INFO     Available Upscalers: items=56 downloaded=11 user=3 time=0.02 types=['None', 'Lanczos',
                         'Nearest', 'ChaiNNer', 'AuraSR', 'ESRGAN', 'LDSR', 'RealESRGAN', 'SCUNet', 'SD', 'SwinIR']
17:06:37-549648 INFO     Available Styles: folder="models\styles" items=288 time=0.02
17:06:37-552130 DEBUG    Creating UI
17:06:37-553122 DEBUG    UI themes available: type=Standard themes=12
17:06:37-554610 INFO     UI theme: type=Standard name="black-teal"
17:06:37-562078 DEBUG    UI theme: css="C:\ai\automatic\javascript\black-teal.css" base="sdnext.css" user="None"
17:06:37-564530 DEBUG    UI initialize: txt2img
17:06:37-638930 DEBUG    Networks: page='model' items=66 subfolders=2 tab=txt2img folders=['models\\Stable-diffusion',
                         'models\\Diffusers', 'models\\Reference'] list=0.05 thumb=0.01 desc=0.01 info=0.00 workers=4
                         sort=Default
17:06:37-651826 DEBUG    Networks: page='lora' items=125 subfolders=0 tab=txt2img folders=['models\\Lora',
                         'models\\LyCORIS'] list=0.06 thumb=0.01 desc=0.03 info=0.01 workers=4 sort=Default
17:06:37-684002 DEBUG    Networks: page='style' items=288 subfolders=1 tab=txt2img folders=['models\\styles', 'html']
                         list=0.05 thumb=0.00 desc=0.00 info=0.00 workers=4 sort=Default
17:06:37-689954 DEBUG    Networks: page='embedding' items=13 subfolders=0 tab=txt2img folders=['models\\embeddings']
                         list=0.03 thumb=0.01 desc=0.00 info=0.00 workers=4 sort=Default
17:06:37-691937 DEBUG    Networks: page='vae' items=0 subfolders=0 tab=txt2img folders=['models\\VAE'] list=0.00
                         thumb=0.00 desc=0.00 info=0.00 workers=4 sort=Default
17:06:37-797089 DEBUG    UI initialize: img2img
17:06:38-091712 DEBUG    UI initialize: control models=models\control
17:06:38-395760 DEBUG    Read: file="ui-config.json" json=0 bytes=2 time=0.000
17:06:38-505103 DEBUG    UI themes available: type=Standard themes=12
17:06:39-075007 DEBUG    Reading failed: C:\ai\automatic\html\extensions.json [Errno 2] No such file or directory:
                         'C:\\ai\\automatic\\html\\extensions.json'
17:06:39-075999 INFO     Extension list is empty: refresh required
17:06:39-691312 DEBUG    Extension list: processed=8 installed=8 enabled=6 disabled=2 visible=8 hidden=0
17:06:40-051905 DEBUG    Root paths: ['C:\\ai\\automatic']
17:06:40-137579 INFO     Local URL: http://127.0.0.1:7860/
17:06:40-138571 DEBUG    Gradio functions: registered=2449
17:06:40-140555 DEBUG    FastAPI middleware: ['Middleware', 'Middleware']
17:06:40-143531 DEBUG    Creating API
17:06:40-318619 INFO     [AgentScheduler] Task queue is empty
17:06:40-319840 INFO     [AgentScheduler] Registering APIs
17:06:40-445532 DEBUG    Scripts setup: ['IP Adapters:0.031', 'XYZ Grid:0.016', 'Face:0.016', 'AnimateDiff:0.006',
                         'CogVideoX:0.008', 'LUT Color grading:0.006', 'X/Y/Z Grid:0.012', 'Image-to-Video:0.006',
                         'Stable Video Diffusion:0.005']
17:06:40-448635 DEBUG    Model metadata: file="metadata.json" no changes
17:06:40-449627 DEBUG    Torch mode: deterministic=False
17:06:40-481858 INFO     Torch override VAE dtype: no-half set
17:06:40-483346 DEBUG    Desired Torch parameters: dtype=BF16 no-half=False no-half-vae=True upscast=False
17:06:40-484337 INFO     Setting Torch parameters: device=cuda dtype=torch.bfloat16 vae=torch.float32
                         unet=torch.bfloat16 context=no_grad fp16=True bf16=True optimization=Scaled-Dot-Product
17:06:40-486351 DEBUG    Model requested: fn=<lambda>
17:06:40-487314 INFO     Load model: select="Diffusers\sayakpaul/flux.1-dev-nf4 [b054bc66ae]"
17:06:40-488306 DEBUG    Load model: existing=False
                         target=models\Diffusers\models--sayakpaul--flux.1-dev-nf4\snapshots\b054bc66ae1097b811848c3739e
                         cd673a864bda1 info=None
17:06:40-489794 DEBUG    Load model:
                         path="models\Diffusers\models--sayakpaul--flux.1-dev-nf4\snapshots\b054bc66ae1097b811848c3739ec
                         d673a864bda1"
17:06:40-490786 INFO     Autodetect model: detect="FLUX" class=FluxPipeline
                         file="models\Diffusers\models--sayakpaul--flux.1-dev-nf4\snapshots\b054bc66ae1097b811848c3739ec
                         d673a864bda1" size=0MB
17:06:40-493266 DEBUG    Load model: type=FLUX model="Diffusers\sayakpaul/flux.1-dev-nf4"
                         repo="sayakpaul/flux.1-dev-nf4" unet="None" t5="None" vae="None" quant=nf4 offload=model
                         dtype=torch.bfloat16
17:06:40-494258 DEBUG    HF login: no token provided
17:06:43-744194 DEBUG    GC: utilization={'gpu': 34, 'ram': 11, 'threshold': 80} gc={'collected': 137, 'saved': 0.0}
                         before={'gpu': 8.12, 'ram': 7.22} after={'gpu': 8.12, 'ram': 7.22, 'retries': 0, 'oom': 0}
                         device=cuda fn=load_flux_nf4 time=0.21
17:06:44-252097 DEBUG    Load model: type=FLUX preloaded=['transformer']
Diffusers  4.55it/s █████████████ 100% 2/2 00:00 00:00 Loading checkpoint shards
Diffusers  6.86it/s ████████ 100% 7/7 00:01 00:00 Loading pipeline components...
17:06:46-578391 INFO     Load network: type=embeddings loaded=0 skipped=13 time=0.01
17:06:46-714790 DEBUG    Setting model: component=VAE no-half=True
17:06:46-715783 DEBUG    Setting model: component=VAE slicing=True
17:06:46-717270 DEBUG    Setting model: component=VAE tiling=True
17:06:46-717766 DEBUG    Setting model: attention="Scaled-Dot-Product"
17:06:46-735127 DEBUG    Setting model: offload=model
17:06:48-931284 DEBUG    GC: utilization={'gpu': 7, 'ram': 12, 'threshold': 80} gc={'collected': 231, 'saved': 0.0}
                         before={'gpu': 1.64, 'ram': 7.59} after={'gpu': 1.64, 'ram': 7.59, 'retries': 0, 'oom': 0}
                         device=cuda fn=load_diffuser time=0.22
17:06:48-933800 INFO     Load model: time=8.22 load=6.07 options=0.16 move=1.98 native=1024 memory={'ram': {'used':
                         7.59, 'total': 63.92}, 'gpu': {'used': 1.64, 'total': 23.99}, 'retries': 0, 'oom': 0}
17:06:48-936279 DEBUG    Script callback init time: image_browser.py:ui_tabs=0.42 system-info.py:app_started=0.07
                         task_scheduler.py:app_started=0.14
17:06:48-937767 INFO     Startup time: 19.04 torch=3.74 accelerate=0.05 gradio=0.98 diffusers=0.31 libraries=1.61
                         extensions=0.63 face-restore=0.20 ui-networks=0.24 ui-txt2img=0.08 ui-img2img=0.26
                         ui-control=0.14 ui-settings=0.25 ui-extensions=1.08 ui-defaults=0.29 launch=0.15 api=0.10
                         app-started=0.21 checkpoint=8.49
17:06:48-939254 DEBUG    Save: file="config.json" json=38 bytes=1540 time=0.003
17:06:48-942231 DEBUG    Unused settings: ['cross_attention_options']
17:07:14-332659 INFO     MOTD: N/A
17:07:18-112156 DEBUG    UI themes available: type=Standard themes=12
17:07:18-392344 INFO     Browser session: user=None client=127.0.0.1 agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64)
                         AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36 Edg/128.0.0.0
17:07:40-937688 DEBUG    Pipeline class change: original=FluxPipeline target=FluxImg2ImgPipeline device=cpu fn=init
17:07:40-963479 DEBUG    Image resize: input=<PIL.Image.Image image mode=RGB size=1536x1536 at 0x162736956D0> width=1536
                         height=1536 mode="Fixed" upscaler="None" context="Add with forward" type=image
                         result=<PIL.Image.Image image mode=RGB size=1536x1536 at 0x16273814E50> time=0.00
                         fn=C:\ai\automatic\modules\processing_class.py:init
17:07:41-211497 INFO     Base: class=FluxImg2ImgPipeline
17:07:41-212985 DEBUG    Sampler default FlowMatchEulerDiscreteScheduler: {'num_train_timesteps': 1000, 'shift': 3.0,
                         'use_dynamic_shifting': True, 'base_shift': 0.5, 'max_shift': 1.15, 'base_image_seq_len': 256,
                         'max_image_seq_len': 4096}
17:07:41-232330 DEBUG    Torch generator: device=cuda seeds=[1280845479]
17:07:41-236794 DEBUG    Diffuser pipeline: FluxImg2ImgPipeline task=DiffusersTaskType.IMAGE_2_IMAGE batch=1/1x1
                         set={'prompt': 1, 'guidance_scale': 6, 'num_inference_steps': 41, 'output_type': 'latent',
                         'image': [<PIL.Image.Image image mode=RGB size=1536x1536 at 0x1625B3C4410>], 'strength': 0.5,
                         'width': 1536, 'height': 1536, 'parser': 'Fixed attention'}
17:07:45-760315 ERROR    Processing: args={'prompt': ['woman'], 'guidance_scale': 6, 'generator': [<torch._C.Generator
                         object at 0x000001625AF00A90>], 'callback_on_step_end': <function diffusers_callback at
                         0x000001626DB880E0>, 'callback_on_step_end_tensor_inputs': ['latents'], 'num_inference_steps':
                         41, 'output_type': 'latent', 'image': [<PIL.Image.Image image mode=RGB size=1536x1536 at
                         0x1625B3C4410>], 'strength': 0.5, 'width': 1536, 'height': 1536} Input type (struct
                         c10::BFloat16) and bias type (float) should be the same
17:07:45-763291 ERROR    Processing: RuntimeError
╭───────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────╮
│ C:\ai\automatic\modules\processing_diffusers.py:94 in process_base                                                   │
│                                                                                                                      │
│    93 │   │   else:                                                                                                  │
│ ❱  94 │   │   │   output = shared.sd_model(**base_args)                                                              │
│    95 │   │   if isinstance(output, dict):                                                                           │
│                                                                                                                      │
│ C:\ai\automatic\venv\Lib\site-packages\torch\utils\_contextlib.py:116 in decorate_context                            │
│                                                                                                                      │
│   115 │   │   with ctx_factory():                                                                                    │
│ ❱ 116 │   │   │   return func(*args, **kwargs)                                                                       │
│   117                                                                                                                │
│                                                                                                                      │
│ C:\ai\automatic\venv\Lib\site-packages\diffusers\pipelines\flux\pipeline_flux_img2img.py:761 in __call__             │
│                                                                                                                      │
│   760 │   │                                                                                                          │
│ ❱ 761 │   │   latents, latent_image_ids = self.prepare_latents(                                                      │
│   762 │   │   │   init_image,                                                                                        │
│                                                                                                                      │
│ C:\ai\automatic\venv\Lib\site-packages\diffusers\pipelines\flux\pipeline_flux_img2img.py:539 in prepare_latents      │
│                                                                                                                      │
│   538 │   │   image = image.to(device=device, dtype=dtype)                                                           │
│ ❱ 539 │   │   image_latents = self._encode_vae_image(image=image, generator=generator)                               │
│   540 │   │   if batch_size > image_latents.shape[0] and batch_size % image_latents.shape[0] == 0:                   │
│                                                                                                                      │
│ C:\ai\automatic\venv\Lib\site-packages\diffusers\pipelines\flux\pipeline_flux_img2img.py:395 in _encode_vae_image    │
│                                                                                                                      │
│   394 │   │   if isinstance(generator, list):                                                                        │
│ ❱ 395 │   │   │   image_latents = [                                                                                  │
│   396 │   │   │   │   retrieve_latents(self.vae.encode(image[i : i + 1]), generator=generator[i])                    │
│                                                                                                                      │
│                                               ... 6 frames hidden ...                                                │
│                                                                                                                      │
│ C:\ai\automatic\venv\Lib\site-packages\diffusers\models\autoencoders\vae.py:143 in forward                           │
│                                                                                                                      │
│   142 │   │                                                                                                          │
│ ❱ 143 │   │   sample = self.conv_in(sample)                                                                          │
│   144                                                                                                                │
│                                                                                                                      │
│ C:\ai\automatic\venv\Lib\site-packages\torch\nn\modules\module.py:1553 in _wrapped_call_impl                         │
│                                                                                                                      │
│   1552 │   │   else:                                                                                                 │
│ ❱ 1553 │   │   │   return self._call_impl(*args, **kwargs)                                                           │
│   1554                                                                                                               │
│                                                                                                                      │
│ C:\ai\automatic\venv\Lib\site-packages\torch\nn\modules\module.py:1562 in _call_impl                                 │
│                                                                                                                      │
│   1561 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                                       │
│ ❱ 1562 │   │   │   return forward_call(*args, **kwargs)                                                              │
│   1563                                                                                                               │
│                                                                                                                      │
│ C:\ai\automatic\venv\Lib\site-packages\torch\nn\modules\conv.py:458 in forward                                       │
│                                                                                                                      │
│    457 │   def forward(self, input: Tensor) -> Tensor:                                                               │
│ ❱  458 │   │   return self._conv_forward(input, self.weight, self.bias)                                              │
│    459                                                                                                               │
│                                                                                                                      │
│ C:\ai\automatic\venv\Lib\site-packages\torch\nn\modules\conv.py:454 in _conv_forward                                 │
│                                                                                                                      │
│    453 │   │   │   │   │   │   │   _pair(0), self.dilation, self.groups)                                             │
│ ❱  454 │   │   return F.conv2d(input, weight, bias, self.stride,                                                     │
│    455 │   │   │   │   │   │   self.padding, self.dilation, self.groups)                                             │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Input type (struct c10::BFloat16) and bias type (float) should be the same
17:07:46-126380 DEBUG    Pipeline class change: original=FluxImg2ImgPipeline target=FluxPipeline device=cuda:0
                         fn=process_images_inner
17:07:46-137292 INFO     Processed: images=0 its=0.00 time=5.19 timers={'init': 0.03, 'prepare': 0.25, 'args': 0.02,
                         'process': 4.88} memory={'ram': {'used': 8.14, 'total': 63.92}, 'gpu': {'used': 10.9, 'total':
                         23.99}, 'retries': 0, 'oom': 0}
17:08:00-050721 DEBUG    Server: alive=True jobs=1 requests=41 uptime=84 memory=8.14/63.92 backend=Backend.DIFFUSERS
                         state=idle

Backend: Diffusers
UI: Standard
Branch: Dev
Model: Other

vladmandic commented 2 weeks ago

if you're using bf16, do not use no-half VAE. keeping the issue open since this does need auto-handling in the code.

update: fixed in latest dev. still, don't use no-half with bf16. no-half is intended only to fix fp16. if you can use bf16, do so - it's always preferred, and run it natively without any upcasting.
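The traceback boils down to the VAE being kept in fp32 (the no-half-VAE path) while the pipeline hands it bf16 tensors. A minimal sketch of the mismatch and the dtype-matching fix, using only stock PyTorch (illustrative only, not SD.Next's actual code):

```python
import torch

# A conv layer left in float32 (as no-half VAE does to the VAE)
# receiving a bf16 input tensor - the same mismatch as in the traceback.
conv = torch.nn.Conv2d(3, 8, kernel_size=3)  # weights/bias default to float32
x = torch.randn(1, 3, 64, 64, dtype=torch.bfloat16)

try:
    conv(x)
except RuntimeError as e:
    print(e)  # "Input type ... and bias type ... should be the same"

# The auto-handling fix amounts to matching dtypes before the call,
# e.g. casting the input to the module's own dtype:
out = conv(x.to(conv.weight.dtype))
print(out.dtype)  # torch.float32
```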

SAC020 commented 2 weeks ago

no-half is intended only to fix fp16. if you can use bf16, do so - it's always preferred, and run it natively without any upcasting.

Thank you.

What do you mean by "do it natively without any upcasting"? Which settings to use / not use?

Is bf16 preferable for SDXL as well, or just Flux? And why is it preferable? (speed, quality, VRAM...?)

Flux doesn't seem to work with fp16 at all, it throws errors similar to the above.

vladmandic commented 2 weeks ago

bf16 is preferable over fp16 nearly always, not just for flux. the only reason it's not the default for everyone is that it's only supported on rtx3000 and newer GPUs; it doesn't exist on older GPUs or on GPUs from other vendors.

as for why, it's a longer story. fp16 has a chance to overflow on math operations, which is why the upcast (and no-half) options exist: operations with a higher chance of overflow are executed in fp32 instead, at the cost of double the memory and half the speed.

bf16 pretty much eliminates the risk of overflow at the price of slightly lower precision.

none of this is flux specific.
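The range-vs-precision trade-off described above is easy to see directly in PyTorch (stock torch, runs on CPU):

```python
import torch

# fp16's largest finite value is 65504, so anything beyond it overflows to inf.
# bf16 keeps fp32's 8-bit exponent (range up to ~3.4e38), trading mantissa bits for it.
x = torch.tensor(60000.0)

print((x * 2).half())      # 120000 exceeds 65504 -> inf
print((x * 2).bfloat16())  # finite, just rounded to bf16 precision

# The precision cost: bf16 has only 7 mantissa bits (~2-3 decimal digits).
print(torch.tensor(1.001, dtype=torch.float16))   # still distinguishable from 1.0
print(torch.tensor(1.001, dtype=torch.bfloat16))  # rounds to 1.0
```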

SAC020 commented 2 weeks ago

Thank you for the explanation

So these settings should be ok / preferable (provided rtx3000+):

(screenshot attached)