[Issue]: processing.py:732: RuntimeWarning: invalid value encountered in cast x_sample = x_sample.astype(np.uint8)

StratholmeBurns commented 10 months ago

Issue Description

When running any prompt with any model and the --safe flag, generation breaks at some point and produces only a black image. It runs for a few steps, then the preview turns black and the final image is black as well. After restarting the server it works again for 1 or 2 generations, then it breaks again.

Error is always : processing.py:732: RuntimeWarning: invalid value encountered in cast x_sample = x_sample.astype(np.uint8)
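
For readers unfamiliar with this warning, here is a minimal sketch of how it can arise. It assumes the image is a float array scaled to 0..255 right before the cast; `to_uint8_image` and `has_invalid_values` are hypothetical helpers for illustration, not code from processing.py. As far as I know, NumPy 1.24 and later emit this RuntimeWarning when a NaN or infinite value is cast to an integer type.

```python
import numpy as np


def to_uint8_image(x_sample: np.ndarray) -> np.ndarray:
    """Scale a float image in [0, 1] to 0..255 and cast it to uint8."""
    x_sample = 255.0 * np.clip(x_sample, 0.0, 1.0)
    return x_sample.astype(np.uint8)


def has_invalid_values(x_sample: np.ndarray) -> bool:
    """Return True if the array contains NaN or +/-inf."""
    return not bool(np.isfinite(x_sample).all())


good = np.array([[0.0, 0.5, 1.0]], dtype=np.float32)
bad = np.array([[0.0, np.nan, 1.0]], dtype=np.float32)

print(has_invalid_values(good))  # clean input
print(has_invalid_values(bad))   # NaN present; casting this is what triggers the warning
print(to_uint8_image(good))
```

NaN survives both the clip and the multiplication, so it only becomes visible at the cast, where the resulting integer value is undefined. That would be consistent with the black output described above.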

Fiddling with settings didn't help. I tried most of them: no-half, fp16 and fp32, as well as options such as Use fixed UNet precision, Disable NaN check in produced images/latent spaces, and Attempt VAE roll back when produced NaN values (experimental).

Version Platform Description

Win10, Firefox, AMD Radeon RX 7900 XT, AMD Ryzen 7 5800X 8-Core Processor

Startup Log

17:37:25-257715 INFO     Python 3.10.6 on Windows
17:37:25-328206 INFO     Version: 79c01311 Tue Aug 15 12:25:08 2023 +0000
17:37:25-525229 DEBUG    Setting environment tuning
17:37:25-526730 DEBUG    Torch overrides: cuda=False rocm=False ipex=False diml=True
17:37:25-528230 DEBUG    Torch allowed: cuda=False rocm=False ipex=False diml=True
17:37:25-529730 INFO     Using DirectML Backend
17:37:25-654713 INFO     Verifying requirements
17:37:25-671216 INFO     Verifying packages
17:37:25-673717 INFO     Verifying repositories
17:37:25-738230 DEBUG    Submodule: C:\Users\xxxxxxx\Desktop\AI
                         Gen\stable-diffusion-webui-automatic\repositories\stable-diffusion-stability-ai / main
17:37:26-405733 DEBUG    Submodule: C:\Users\xxxxxxx\Desktop\AI
                         Gen\stable-diffusion-webui-automatic\repositories\taming-transformers / master
17:37:27-986710 DEBUG    Submodule: C:\Users\xxxxxxx\Desktop\AI Gen\stable-diffusion-webui-automatic\repositories\BLIP /
                         main
17:37:28-499235 INFO     Verifying submodules
17:37:30-713740 DEBUG    Submodule: extensions-builtin/a1111-sd-webui-lycoris / main
17:37:31-262745 DEBUG    Submodule: extensions-builtin/clip-interrogator-ext / main
17:37:31-847094 DEBUG    Submodule: extensions-builtin/multidiffusion-upscaler-for-automatic1111 / main
17:37:32-409246 DEBUG    Submodule: extensions-builtin/sd-dynamic-thresholding / master
17:37:32-982983 DEBUG    Submodule: extensions-builtin/sd-extension-system-info / main
17:37:33-548246 DEBUG    Submodule: extensions-builtin/sd-webui-agent-scheduler / main
17:37:34-129749 DEBUG    Submodule: extensions-builtin/sd-webui-controlnet / main
17:37:34-724246 DEBUG    Submodule: extensions-builtin/stable-diffusion-webui-images-browser / main
17:37:35-313182 DEBUG    Submodule: extensions-builtin/stable-diffusion-webui-rembg / master
17:37:35-892305 DEBUG    Submodule: modules/lora / main
17:37:36-496238 DEBUG    Submodule: modules/lycoris / main
17:37:37-052750 DEBUG    Submodule: wiki / master
17:37:37-714252 DEBUG    Installed packages: 208
17:37:37-715752 DEBUG    Extensions all: ['LDSR', 'Lora', 'ScuNET', 'sd-extension-system-info',
                         'stable-diffusion-webui-images-browser', 'stable-diffusion-webui-rembg', 'SwinIR']
17:37:39-531262 DEBUG    Submodule: C:\Users\xxxxxxx\Desktop\AI
                         Gen\stable-diffusion-webui-automatic\extensions-builtin\sd-extension-system-info / main
17:37:40-067736 DEBUG    Running extension installer: C:\Users\xxxxxxx\Desktop\AI
                         Gen\stable-diffusion-webui-automatic\extensions-builtin\sd-extension-system-info\install.py
17:37:40-541318 DEBUG    Submodule: C:\Users\xxxxxxx\Desktop\AI
                         Gen\stable-diffusion-webui-automatic\extensions-builtin\stable-diffusion-webui-images-browser
                         / main
17:37:41-017736 DEBUG    Running extension installer: C:\Users\xxxxxxx\Desktop\AI
                         Gen\stable-diffusion-webui-automatic\extensions-builtin\stable-diffusion-webui-images-browser\
                         install.py
17:37:41-489318 DEBUG    Submodule: C:\Users\ARCHER\xxxxxxx\AI
                         Gen\stable-diffusion-webui-automatic\extensions-builtin\stable-diffusion-webui-rembg / master
17:37:41-994236 DEBUG    Running extension installer: C:\Users\xxxxxxx\Desktop\AI
                         Gen\stable-diffusion-webui-automatic\extensions-builtin\stable-diffusion-webui-rembg\install.p
                         y
17:37:43-009209 INFO     Extensions enabled: ['LDSR', 'Lora', 'ScuNET', 'sd-extension-system-info',
                         'stable-diffusion-webui-images-browser', 'stable-diffusion-webui-rembg', 'SwinIR']
17:37:43-012208 INFO     Verifying packages
17:37:43-013709 INFO     Updating Wiki
17:37:43-073719 DEBUG    Submodule: C:\Users\xxxxxxx\Desktop\AI Gen\stable-diffusion-webui-automatic\wiki / master
17:37:43-586736 DEBUG    Setup complete without errors: 1692113864
17:37:43-588237 INFO     Running in safe mode without user extensions
17:37:43-601751 INFO     Extension preload: 0.0s C:\Users\xxxxxxx\Desktop\AI
                         Gen\stable-diffusion-webui-automatic\extensions-builtin
17:37:43-616686 DEBUG    Memory used: 0.04 total: 15.92 Collected 0
17:37:43-618186 DEBUG    Starting module: <module 'webui' from 'C:\\Users\\xxxxxxx\\Desktop\\AI
                         Gen\\stable-diffusion-webui-automatic\\webui.py'>
17:37:43-619687 INFO     Server arguments: ['--use-directml', '--debug', '--upgrade', '--safe']
17:37:43-638694 DEBUG    Loading Torch
17:37:49-586722 DEBUG    Loading Gradio
17:37:50-334888 DEBUG    Loading Modules
No module 'xformers'. Proceeding without it.
17:37:51-640129 DEBUG    Reading: C:\Users\xxxxxxx\Desktop\AI Gen\stable-diffusion-webui-automatic\config.json len=305
17:37:51-642130 INFO     Pipeline: Backend.ORIGINAL
17:37:51-644130 DEBUG    Loaded styles: C:\Users\xxxxxxx\Desktop\AI Gen\stable-diffusion-webui-automatic\styles.csv 0
17:37:52-421664 INFO     Libraries loaded
17:37:52-423164 DEBUG    Entering start sequence
17:37:52-521182 DEBUG    Version: {'app': 'sd.next', 'updated': '2023-08-15', 'hash': '79c01311', 'url':
                         'https://github.com/vladmandic/automatic/tree/master'}
17:37:52-524682 INFO     Using data path: C:\Users\xxxxxxx\Desktop\AI Gen\stable-diffusion-webui-automatic
17:37:52-526182 DEBUG    Event loop: <_WindowsSelectorEventLoop running=False closed=False debug=False>
17:37:52-528182 DEBUG    Entering initialize
17:37:52-529183 ERROR    Rollback VAE functionality requires compatible GPU
17:37:52-529683 DEBUG    Available samplers: ['UniPC', 'DDIM', 'PLMS', 'Euler a', 'Euler', 'DPM++ 2S a', 'DPM++ 2S a
                         Karras', 'DPM++ 2M', 'DPM++ 2M Karras', 'DPM++ SDE', 'DPM++ SDE Karras', 'DPM++ 2M SDE',
                         'DPM++ 2M SDE Karras', 'DPM++ 3M SDE', 'DPM++ 3M SDE Karras', 'DPM fast', 'DPM adaptive',
                         'DPM2', 'DPM2 Karras', 'DPM2 a', 'DPM2 a Karras', 'LMS', 'LMS Karras', 'Heun']
17:37:52-536185 INFO     Available VAEs: C:\Users\xxxxxxx\Desktop\AI Gen\Models\VAE 4
17:37:52-542185 DEBUG    Reading: C:\Users\xxxxxxx\Desktop\AI Gen\stable-diffusion-webui-automatic\cache.json len=2
17:37:52-549186 DEBUG    Reading: C:\Users\xxxxxxx\Desktop\AI Gen\stable-diffusion-webui-automatic\metadata.json len=23
17:37:52-557990 INFO     Available models: C:\Users\xxxxxxx\Desktop\AI Gen\Models\Stable-diffusion 15
17:37:52-604588 DEBUG    Loading scripts
17:37:54-998232 DEBUG    Scripts load: ['LDSR:1.184s', 'Lora:0.189s', 'sd-extension-system-info:0.054s',
                         'stable-diffusion-webui-images-browser:0.072s', 'stable-diffusion-webui-rembg:0.825s']
17:37:55-114729 INFO     Loading UI theme: name=amethyst-nightfall style=Auto
17:37:55-119231 DEBUG    Creating UI
17:37:55-125790 DEBUG    Reading: C:\Users\xxxxxxx\Desktop\AI Gen\stable-diffusion-webui-automatic\ui-config.json len=37
17:37:55-140228 DEBUG    Extra networks: checkpoints items=15 subdirs=0
17:37:55-149372 DEBUG    Extra networks: lora items=8 subdirs=0
17:37:55-153374 DEBUG    Extra networks: textual inversion items=4 subdirs=0
17:37:55-160875 DEBUG    UI interface: tab=txt2img batch=False seed=True advanced=True second_pass=False
17:37:55-198882 DEBUG    UI interface: tab=img2img seed=True resize=False batch=False denoise=True advanced=True
17:37:55-258892 DEBUG    Reading: C:\Users\xxxxxxx\Desktop\AI Gen\stable-diffusion-webui-automatic\ui-config.json len=37
17:37:56-020024 DEBUG    Script: 0.7s ui_tabs C:\Users\xxxxxxx\Desktop\AI
                         Gen\stable-diffusion-webui-automatic\extensions-builtin\stable-diffusion-webui-images-browser\
                         scripts\image_browser.py
17:37:56-032228 DEBUG    Extensions list loaded: C:\Users\xxxxxxx\Desktop\AI
                         Gen\stable-diffusion-webui-automatic\html\extensions.json
17:37:56-926884 DEBUG    Extension list refresh: processed=210 installed=13 enabled=7 disabled=6 visible=210 hidden=0
Running on local URL:  http://127.0.0.1:7860
17:37:57-119236 INFO     Local URL: http://127.0.0.1:7860/
17:37:57-120737 DEBUG    Gradio registered functions: 1387
17:37:57-121737 INFO     Initializing middleware
17:37:57-125127 DEBUG    Creating API
17:37:57-184637 DEBUG    Scripts setup: ['Alternative:0.008s']
17:37:57-186138 DEBUG    Scripts components: []
17:37:57-187138 DEBUG    Model metadata: C:\Users\xxxxxxx\Desktop\AI Gen\stable-diffusion-webui-automatic\metadata.json
                         no changes
17:37:57-188638 DEBUG    Model auto load disabled
17:37:57-189638 INFO     Startup time: 13.6s (torch=5.9s gradio=0.7s libraries=2.1s samplers=0.1s scripts=2.4s
                         onchange=0.1s ui-txt2img=0.1s ui-settings=0.1s ui-extensions=1.6s launch=0.1s
                         app-started=0.1s)

Relevant log output

17:44:08-155772 DEBUG    img2img: id_task=task(iadgqxhzrw7yufe)|mode=0|prompt=<lora:more_details:1>,((solo,1 woman,
                         portrait,centered,muscles,big muscles,muscle veins,short hair,anime styled hair,fit woman,gold
                         clothing,gold reflective color,gold hair,shiny hair,hair in eye,strong,determined,strong
                         chin,angular chin,very big boobs,big boobs,big titties,smiling deviously,looking at
                         viewer)),(erotic,red velvet background,left hand on hip),
                         ((high quality:1.2, masterpiece:1.2)), absurdres, high resolution, (8k resolution), 8k, 8kres,
                         8k res,
                         high details, detailed and intricate, intricate details, high intricate details, absurd amount
                         of details, super resolution, ultra hd, megapixel|negative_prompt=easynegative,
                         ng_deepnegative_v1_75t, Asian-Less-Neg, By bad artist -neg,
                         bad_prompt_version2-neg|prompt_styles=[]|init_img=<PIL.Image.Image image mode=RGBA
                         size=1024x1344 at
                         0x1F497050730>|sketch=None|init_img_with_mask=None|inpaint_color_sketch=None|inpaint_color_ske
                         tch_orig=None|init_img_inpaint=None|init_mask_inpaint=None|steps=65|sampler_index=1|latent_ind
                         ex=None|mask_blur=4|mask_alpha=0|inpainting_fill=1|full_quality=True|restore_faces=False|tilin
                         g=False|n_iter=1|batch_size=1|cfg_scale=12|image_cfg_scale=1.5|clip_skip=1|denoising_strength=
                         0.35|seed=-1.0|subseed-1.0|subseed_strength=0|seed_resize_from_h=0|seed_resize_from_w=0|select
                         ed_scale_tab=0|height=512|width=512|scale_by=1.5|resize_mode=3|inpaint_full_res=1|inpaint_full
                         _res_padding=32|inpainting_mask_invert=0|img2img_batch_files=None|img2img_batch_input_dir=|img
                         2img_batch_output_dir=|img2img_batch_inpaint_mask_dir=|override_settings_texts=[]|args=(0,
                         '<ul>\n<li><code>CFG Scale</code> should be 2 or lower.</li>\n</ul>\n', True, True, '', '',
                         True, 50, True, 1, 0, False, 4, 0.5, 'Linear', 'None', '<p
                         style="margin-bottom:0.75em">Recommended settings: Sampling Steps: 80-100, Sampler: Euler a,
                         Denoising strength: 0.8</p>', 128, 8, ['left', 'right', 'up', 'down'], 1, 0.05, 128, 4, 0,
                         ['left', 'right', 'up', 'down'], False, False, 'positive', 'comma', 0, False, False, '', '<p
                         style="margin-bottom:0.75em">Will upscale the image by the selected scale factor; use width
                         and height sliders to set tile size</p>', 64, 0, 2, 0, '', [], 0, '', [], 0, '', [], True,
                         False, False, False, 0, False)
17:44:08-202780 DEBUG    Script process: []
17:44:08-204781 DEBUG    Sampler: DDIM {'default_eta_is_0': True, 'uses_ensd': True}
17:44:08-276793 DEBUG    Script before-process-batch: []
17:44:08-281793 DEBUG    Script process-batch: []
Running DDIM Sampling with 22 timesteps
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 22/22 [00:20<00:00,  1.09it/s]
17:44:29-076924 DEBUG    Script postprocess-batch: []
17:44:29-080424 DEBUG    Script postprocess-batch-list: []
C:\Users\xxxxxx\Desktop\AI Gen\stable-diffusion-webui-automatic\modules\processing.py:732: RuntimeWarning: invalid value encountered in cast
  x_sample = x_sample.astype(np.uint8)
17:44:29-086925 DEBUG    Script postprocess-image: []
17:44:29-090426 DEBUG    Saving image: PNG C:\Users\xxxxxx\Desktop\AI
                         Gen\stable-diffusion-webui-automatic\outputs/image\00071-lora more details 1 solo 1 woman
                         portrait.png (512, 512)
17:44:29-102928 DEBUG    Script postprocess: []
17:44:29-105929 DEBUG    Processed: 1 Memory: {'ram': {'used': 6.53, 'total': 15.92}, 'gpu': {'used': 18.48, 'total':
                         20.28}, 'retries': 'DirectMLDevice', 'oom': 0} img

Acknowledgements

[X] I have read the above and searched for existing issues
[X] I confirm that this is classified correctly and its not an extension or diffusers-specific issue

vladmandic commented 10 months ago

i've never seen this before and cannot reproduce (not to mention that's the function that executes on each and every image ever generated, so its near-impossible that it never occurred so far).

and you haven't provided platform information?

StratholmeBurns commented 10 months ago

Sorry, forgot. Here: Win10, Firefox, AMD Radeon RX 7900 XT, AMD Ryzen 7 5800X 8-Core Processor

If it helps, I was successfully using the automatic1111 repo on an Nvidia 1070 and started using your repo after I switched to AMD. I had similar problems with inpainting producing only grey pictures, which I brute-force fixed by reinstalling 2 or 3 times. Sadly that hasn't worked for this issue yet x(

If you've never seen this issue, that is quite worrying, I'm afraid; I do wonder where I fucked up then. Sometimes I can generate 2-4 images and then it breaks at 20, 40 or 90%, pretty much at random. And the moment it does, no generation works anymore until I restart.

Accy587 commented 10 months ago

I have basically the same issue at processing.py:733 with the exact same error: when I train a new LoRA with Kohya_ss on my Nvidia 3060 12GB (Windows 10, Opera GX, Intel i7-11700), all of my images are just black with no change.

vladmandic commented 10 months ago

@StratholmeBurns the fact that it works but breaks after a couple of generations suggests that you did nothing wrong. amd on windows is a painful combo: rocm was historically linux-only, so windows users are forced to rely on directml, which is 3x slower (at best) and not well maintained (last update was in april).

amd finally released rocm for windows recently, so i'm hoping we'll soon have torch-rocm and actual native support for amd gpus.

@Accy587 maybe the same place, but a very different reason. something in the lora is causing a pixel value during generation to become a non-number, likely from division by zero or overflow; basically, it's bad values in the lora itself. try changing alpha or other related parameters to adjust the overall normalization inside the lora.
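
One way to check the "bad values in the lora itself" theory is to scan the weights for non-finite entries. The sketch below assumes the weights have already been loaded into a dict of NumPy arrays; the tensor names and `find_non_finite` are hypothetical, and loading a real LoRA file is left out.

```python
import numpy as np


def find_non_finite(tensors: dict) -> dict:
    """Map each tensor name to its count of NaN/inf entries (clean tensors are omitted)."""
    report = {}
    for name, values in tensors.items():
        arr = np.asarray(values, dtype=np.float64)
        bad = int(arr.size - np.count_nonzero(np.isfinite(arr)))
        if bad:
            report[name] = bad
    return report


# Hypothetical stand-in for LoRA weights; real files hold many more tensors.
fake_lora = {
    "lora_down.weight": np.array([[0.1, -0.2], [0.3, 0.4]], dtype=np.float32),
    "lora_up.weight": np.array([[np.nan, 1.0], [np.inf, 2.0]], dtype=np.float32),
}

report = find_non_finite(fake_lora)
print(report)
```

An empty report does not rule the LoRA out, since very large but finite weights can still overflow during generation.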

Accy587 commented 10 months ago

I dont understand how it could be lora since i am using stable-diffusion-v1-5 as my base.

vladmandic commented 10 months ago

"I dont understand how it could be lora since i am using stable-diffusion-v1-5 as my base."

doesn't matter what model you're using as base, lora can still have invalid values if its trained with bad params.

flowerdealer commented 10 months ago

I'm also having this issue with SDXL 1.0, no Lora, on a 4090.

vladmandic commented 10 months ago

@flowerdealer which issue? I've already noted two different possible root causes with totally different paths. Please provide some additional information, just stating "me too" almost never helps.

flowerdealer commented 10 months ago

Getting: Error is always : processing.py:732: RuntimeWarning: invalid value encountered in cast x_sample = x_sample.astype(np.uint8) and black images too.

Pilat66 commented 10 months ago

My problem disappeared after setting the options: "If you intend to use the Diffusers backend and/or SDXL, go to Setup/User Interface and select sd_model_refiner, diffusers_pipeline, and sdbackend to the Quicksettings, this will make controlling SDXL much easier."

StratholmeBurns commented 10 months ago

So, after playing around a bit, i can get it to work with all the bells and whistles with the following settings:

precision type - full
fp32
--no-half & --no-half-vae
use fixed unet precision
disable nan check
Euler a sampler
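
The precision settings in this list may matter because half precision has a much smaller range than fp32. The sketch below shows one generic way a NaN can appear in fp16 and not in fp32; it is only an illustration of the arithmetic, not a claim about where the NaN originates in this pipeline. `scaled_difference` is a hypothetical helper.

```python
import numpy as np


def scaled_difference(a: float, b: float, scale: float, dtype) -> np.floating:
    """Compute a*scale - b*scale entirely in the given floating-point dtype."""
    a = np.asarray(a, dtype=dtype)
    b = np.asarray(b, dtype=dtype)
    # Silence NumPy's overflow/invalid warnings so the demo output stays readable.
    with np.errstate(over="ignore", invalid="ignore"):
        return (a * dtype(scale)) - (b * dtype(scale))


# float16 tops out at 65504, so 40000 * 2 overflows to inf, and inf - inf is NaN.
half = scaled_difference(40000.0, 40000.0, 2.0, np.float16)
# The same computation in float32 stays finite.
full = scaled_difference(40000.0, 40000.0, 2.0, np.float32)

print(half, full)
```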

I did not change any of the backend settings that Pilat66 mentioned. I can use extensions without problems with these settings; tested with ControlNet and After Detailer.

It seems the main cause of images breaking at some point is the selected sampler. With DDIM, UniPC, PLMS and all 4 Karras samplers the image breaks at some point, some sooner than others. DDIM has the best performance by a mile, so it's sad having to switch back to Euler a. But I can now consistently generate images again without having to restart the server/PC every 10 minutes.

vladmandic commented 10 months ago

that indicates that the issue is exactly as expected: it's a math rounding issue resulting in invalid pixel values.

btw, i've added an additional exception handler that will try to replace invalid values with dead pixels (note that this is not a solution, simply a stop-gap to prevent runtime errors).
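
A stop-gap of this kind could look roughly like the sketch below. This is not the actual SD.Next handler; `sanitize_and_cast` is a hypothetical helper, and mapping NaN to 0 (a black "dead" pixel) is an assumption made for illustration.

```python
import numpy as np


def sanitize_and_cast(x_sample: np.ndarray) -> np.ndarray:
    """Replace NaN/inf with valid pixel values, clamp to 0..255, then cast to uint8."""
    # Assumption: NaN becomes 0 (black), +inf saturates to 255, -inf to 0.
    cleaned = np.nan_to_num(x_sample, nan=0.0, posinf=255.0, neginf=0.0)
    return np.clip(cleaned, 0.0, 255.0).astype(np.uint8)


pixels = np.array([12.7, np.nan, np.inf, -np.inf, 300.0], dtype=np.float32)
result = sanitize_and_cast(pixels)
print(result)
```

As noted above, this only hides the symptom: if the whole latent is NaN, the result is still a fully black image, just without the warning.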

vladmandic commented 10 months ago

anyone had a chance to update & check?

StratholmeBurns commented 10 months ago

Tried it, error happened on the very first generation :D

C:\Users\xxxxxxxx\Desktop\AI Gen\stable-diffusion-webui-automatic\modules\processing.py:584: RuntimeWarning: invalid value encountered in cast
  sample = sample.astype(np.uint8)

same settings still as i have posted earlier, nothing changed setup wise. always up to date. Your effort is appreciated btw, not sure you are getting the gratitude you deserve as a dev :)

vladmandic commented 10 months ago

thanks for the comments!

i've just pushed another update; i had failed to realize that numpy raises a warning here, not an error.
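
The warning-versus-error distinction can be sketched as follows. A plain `try/except` around the cast never fires, because NumPy only emits a RuntimeWarning; promoting that warning to an exception with the standard `warnings` module makes it catchable. `cast_or_fallback` is a hypothetical helper, not the code that was pushed, and on NumPy versions that do not emit the cast warning the fallback branch simply never runs.

```python
import warnings

import numpy as np


def cast_or_fallback(sample: np.ndarray):
    """Return (uint8 array, clean flag); clean is False if the cast warned and was repaired."""
    with warnings.catch_warnings():
        # Turn RuntimeWarning into an exception only inside this block.
        warnings.simplefilter("error", RuntimeWarning)
        try:
            return sample.astype(np.uint8), True
        except RuntimeWarning:
            pass
    # Fallback: replace invalid values, clamp to the valid range, then cast again.
    repaired = np.nan_to_num(sample, nan=0.0, posinf=255.0, neginf=0.0)
    return np.clip(repaired, 0.0, 255.0).astype(np.uint8), False


image, clean = cast_or_fallback(np.array([0.0, 128.0, 255.0], dtype=np.float32))
print(image, clean)
```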

StratholmeBurns commented 10 months ago

Nope, same still

C:\Users\ARCHER\Desktop\AI Gen\stable-diffusion-webui-automatic\modules\processing.py:584: RuntimeWarning: invalid value encountered in cast
  sample = sample.astype(np.uint8)

app: SD.next
updated: 2023-08-26
hash: 0f10a9ce
url: https://github.com/vladmandic/automatic/tree/master

vladmandic commented 10 months ago

OK, I'll have to rethink this a bit next week.

peleh commented 10 months ago

Enabling the option at Settings/Compute Settings: Enable model compile (experimental) is what made the difference here.

StratholmeBurns commented 9 months ago

Small update from my side:

Samplers that cause the error: UniPC, DDIM, PLMS, any DPM Karras sampler up to 2M

Samplers that work without error: Euler a, DPM++ 3M SDE Karras, DPM fast, LMS Karras

So if anyone else is having these issues, use any of these samplers. If it helps, here are the settings with which I can generate without errors: [image: asdasd]

It doesn't really solve it, but hey, if it works it works :D

I'm using DPM++ 3M SDE Karras at the moment, always with 65+ steps, and getting fairly good results in terms of speed and quality.

SheMelody commented 4 months ago

I'm using DPM SDE Karras and currently having this very exact problem, it was perfectly fine before updating to the latest version. Unrelated, but I'm also unable to use xFormers in the latest version.

Edit: Seems like it's much more frequent when using Euler A / default. DPM SDE only breaks every 12-20 images for me, here's what broken images may look like (using RTX 40 series cards if that helps):

[image: 00306]

[image: 00304]

skylerblack2 commented 4 months ago

in case it's helpful, I can reproduce this issue reliably by:

1/ creating a controlnet and generating an image with a 1.5 model (Model A)
2/ changing to any other 1.5 model (Model B)
3/ generating an image with Model B
4/ observing the black squares + log error:

sd_webui_forge\webui_forge_cu121_torch21\webui\modules\processing.py:968: RuntimeWarning: invalid value encountered in cast
  x_sample = x_sample.astype(np.uint8)

the issue persists until I deactivate the controlnet, switch to any other 1.5 model (Model C), generate an image without controlnet, then return to Model B and set up the controlnet again. At that point I can run the exact same settings as step #3 above without issues.

Sometimes I get the black images issue randomly as well, but when I do I can always resolve it the same way.

This is also on Windows 11, latest sd web ui forge.

vladmandic commented 4 months ago

"latest sd web ui forge."

This is a different app.

skylerblack2 commented 4 months ago

ah, my fault for getting confused. hopefully it's maybe still helpful knowledge.

redpintings commented 2 months ago

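I have basically the same issue at processing.py:733 with the same error: if I train a new LoRA with Kohya_ss on my Nvidia 3060 12GB, Windows 10, Opera GX, Intel i7-11700, all of my images are just black with no change.

Have you solved this problem?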

vladmandic commented 2 months ago

english only pls

SheMelody commented 2 months ago

"I'm using DPM SDE Karras and currently having this very exact problem, it was perfectly fine before updating to the latest version. Unrelated, but I'm also unable to use xFormers in the latest version.

Edit: Seems like it's much more frequent when using Euler A / default. DPM SDE only breaks every 12-20 images for me, here's what broken images may look like (using RTX 40 series cards if that helps):"

I don't know if this helps, but I can reliably reproduce this (like in the quoted post) every 12 to 20 images using an RTX 4070 SUPER and an RTX 4090; the first is from Gainward and the second from ASUS ROG Strix. The GPUs are in different systems, one with an i7-14700K and the other with an i9-14900K. Hope this helps in diagnosing the problem. It's not a problem with the systems, because I've been using ComfyUI on both of them and there has never been a glitch.

redpintings commented 2 months ago

"english only pls"

Okay, I want to know whether this issue has been resolved. I don't think it's a problem with your code: I've tried many SD UIs and this problem occurs in them too. At first, it showed a black image after rendering the image. I added many parameters but it didn't work, so I plan to try a different system. Now it always shows me: /modules/processing.py:968: RuntimeWarning: invalid value encountered in cast x_sample = x_sample.astype(np.uint8) and displays a black image.
