lshqqytiger / stable-diffusion-webui-amdgpu

Stable Diffusion web UI
GNU Affero General Public License v3.0

Could not allocate tensor with 377487360 bytes. There is not enough GPU video memory available! #38

Closed imamqaum1 closed 5 months ago

imamqaum1 commented 1 year ago

Is there an existing issue for this?

What happened?

Stable Diffusion crashes partway through generation with the error: Could not allocate tensor with 377487360 bytes. There is not enough GPU video memory available! (screenshot attached)

Steps to reproduce the problem

  1. Go to Text2Img
  2. Enter a prompt and a negative prompt
  3. Generate (screenshot attached)

What should have happened?

Stable Diffusion should run normally and generate the images.

Commit where the problem happens

RuntimeError: Could not allocate tensor with 377487360 bytes. There is not enough GPU video memory available!

What platforms do you use to access the UI ?

Windows

What browsers do you use to access the UI ?

Microsoft Edge

Command Line Arguments

--lowvram --disable-nan-check --autolaunch --no-half

List of extensions

None

Console logs

venv "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Commit hash: ff558348682fea569785dcfae1f1282cfbefda6b
Installing requirements for Web UI
Launching Web UI with arguments: --lowvram --disable-nan-check --autolaunch --no-half
Warning: experimental graphic memory optimization is disabled due to gpu vendor. Currently this optimization is only available for AMDGPUs.
Disabled experimental graphic memory optimizations.
Interrogations are fallen back to cpu. This doesn't affect on image generation. But if you want to use interrogate (CLIP or DeepBooru), check out this issue: https://github.com/lshqqytiger/stable-diffusion-webui-directml/issues/10
Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled
No module 'xformers'. Proceeding without it.
Loading weights [bfcaf07557] from D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\models\Stable-diffusion\768-v-ema.ckpt
Creating model from config: D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\configs\stable-diffusion\v2-inference-v.yaml
LatentDiffusion: Running in v-prediction mode
DiffusionWrapper has 865.91 M params.
Applying cross attention optimization (InvokeAI).
Textual inversion embeddings loaded(0):
Model loaded in 235.4s (load weights from disk: 133.7s, find config: 48.2s, load config: 0.3s, create model: 3.4s, apply weights to model: 40.4s, apply dtype to VAE: 0.8s, load VAE: 2.6s, move model to device: 5.0s, hijack: 0.1s, load textual inversion embeddings: 0.8s).
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Calculating sha256 for D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\models\Stable-diffusion\aresMix_v01.safetensors: 6ecece11bf069e9950746d33ab346826c5352acf047c64a3ab74c8884924adf0
Loading weights [6ecece11bf] from D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\models\Stable-diffusion\aresMix_v01.safetensors
Creating model from config: D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying cross attention optimization (InvokeAI).
Model loaded in 42.4s (create model: 1.7s, apply weights to model: 40.2s, load textual inversion embeddings: 0.2s).
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [02:48<00:00,  8.41s/it]
Error completing request███████████████████████████████████████████████████████████████| 20/20 [02:05<00:00,  6.39s/it]
Arguments: ('task(c6cyhnv8oj55v19)', 'photo of a 22 years old Japanese girl, detailed facial features, beautiful detailed face, perfect face, dreamy face expression, high detailed skin, white skin texture, detailed eyes, seductive eyes, alluring eyes, beautiful eyes, full red lips, hourglass body, perfect body, skinny, petite, red pussy, showing pussy, nude, small breast, sitting, hijab, hijab, elegant, sexually suggestive, sex appeal, seductive look, bedroom, submissive, fantasy environment, magical atmosphere, dramatic style, golden hour, embers swirling, soft lighting, volumetric lighting, realistic lighting, cinematic lighting, natural lighting, long exposure trails, hyper detailed, sharp focus, bokeh, masterpiece, award winning photograph, epic character composition,Key light, backlight, soft natural lighting, photography 800 ISO film grain 50mm lens RAW aperture f1.6, highly detailed, Girl, full body, full body view, full body shoot, full body photograph', '(asian:1.2), black and white, sepia, bad art, b&w, canvas frame, cartoon, 3d, Photoshop, video game, 3d render, semi-realistic, cgi, render, sketch, drawing, anime, worst quality, low quality, jpeg artifacts, duplicate, messy drawing, black-white, doll, illustration, lowres, deformed, disfigured, mutation, amputation, distorted, mutated, mutilated, poorly drawn, bad anatomy, wrong anatomy, bad proportions, gross proportions, double body, long body, unnatural body, extra limb, missing limb, floating limb, disconnected limbs, malformed limbs, missing arms, extra arms, disappearing arms, missing legs, extra legs, broken legs, disappearing legs, deformed thighs, malformed hands, mutated hands and fingers, double hands, extra fingers, poorly drawn hands, mutated hands, fused fingers, too many fingers, poorly drawn feet, poorly drawn hands, big hands, hand with more than 5 fingers, hand with less than 5 fingers, bad feet, poorly drawn feet, fused feet, missing feet, bad knee, extra knee, more than 2 legs, poorly drawn face, cloned face, double face, bad hairs, poorly drawn hairs, fused hairs, cross-eye, ugly eyes, bad eyes, poorly drawn eyes, asymmetric eyes, cross-eyed, ugly mouth, missing teeth, crooked teeth, bad mouth, poorly drawn mouth, dirty teeth, bad tongue, fused ears, bad ears, poorly drawn ears, extra ears, heavy ears, missing ears, poorly drawn breasts, more than 2 nipples, missing nipples, different nipples, fused nipples, bad nipples, poorly drawn nipples, bad asshole, poorly drawn asshole, fused asshole, bad anus, bad pussy, bad crotch, fused anus, fused pussy, poorly drawn crotch, poorly drawn anus, poorly drawn pussy, bad clit, fused clit, fused pantie, poorly drawn pantie, fused cloth, poorly drawn cloth, bad pantie, obese, ugly, disgusting, morbid, big muscles, blurry, censored, oversaturated, watermark, watermarked, extra digit, fewer digits, signature, text', [], 20, 15, False, False, 1, 1, 6, -1.0, -1.0, 0, 0, 0, False, 720, 512, False, 0.7, 2, 'Latent', 0, 0, 0, [], 0, False, False, 'positive', 'comma', 0, False, False, '', 1, '', 0, '', 0, '', True, False, False, False, 0) {}
Traceback (most recent call last):
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\modules\txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\modules\processing.py", line 486, in process_images
    res = process_images_inner(p)
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\modules\processing.py", line 634, in process_images_inner
    x_samples_ddim = [decode_first_stage(p.sd_model, samples_ddim[i:i+1].to(dtype=devices.dtype_vae))[0].cpu() for i in range(samples_ddim.size(0))]
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\modules\processing.py", line 634, in <listcomp>
    x_samples_ddim = [decode_first_stage(p.sd_model, samples_ddim[i:i+1].to(dtype=devices.dtype_vae))[0].cpu() for i in range(samples_ddim.size(0))]
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\modules\processing.py", line 423, in decode_first_stage
    x = model.decode_first_stage(x)
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\modules\sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\modules\sd_hijack_utils.py", line 28, in __call__
    return self.__orig_func(*args, **kwargs)
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 826, in decode_first_stage
    return self.first_stage_model.decode(z)
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\modules\lowvram.py", line 52, in first_stage_model_decode_wrap
    return first_stage_model_decode(z)
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\models\autoencoder.py", line 90, in decode
    dec = self.decoder(z)
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 637, in forward
    h = self.up[i_level].block[i_block](h, temb)
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 132, in forward
    h = nonlinearity(h)
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\functional.py", line 2059, in silu
    return torch._C._nn.silu(input)
RuntimeError: Could not allocate tensor with 377487360 bytes. There is not enough GPU video memory available!

Additional information

RX 570 4GB, Ryzen 5 3500, 8GB RAM (single channel), AMD Software PRO Edition driver, DirectX 12

ethan0228 commented 1 year ago

Me too, I have the same error...

chenshaoju commented 1 year ago

Try adding --precision full to COMMANDLINE_ARGS=.

Here is my example (5500 XT):

set COMMANDLINE_ARGS=--listen --medvram --precision full --opt-split-attention-v1 --no-half --no-half-vae --opt-sub-quad-attention --disable-nan-check --use-cpu interrogate gfpgan bsrgan esrgan scunet codeformer
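For context, those lines go in webui-user.bat, next to webui.bat. A minimal sketch of the whole file, assuming the stock AUTOMATIC1111 layout (the flag set is just one example from this thread; tune it for your card):

@echo off
rem webui-user.bat sketch for a low-VRAM AMD card; flags are an example, not a prescription
set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--medvram --precision full --no-half --no-half-vae --opt-split-attention-v1 --opt-sub-quad-attention --disable-nan-check

call webui.bat

Save it and launch through webui-user.bat so the variables are picked up.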

Miraihi commented 1 year ago

First, check the arguments. Second, I'm not sure what maximum resolution your GPU is capable of. I can generate a maximum of 600x800 on my RX 580 (8GB) with the arguments --medvram --precision full --no-half --no-half-vae --opt-split-attention-v1 --opt-sub-quad-attention --disable-nan-check.

lylerolleman commented 1 year ago

The above arguments got it working for me for single pictures. Batches still fail, but it at least works (even on a 5800X, running this on CPU was painful...).

Tried with --lowvram with the same results. Running an RX 580 8GB.

Miraihi commented 1 year ago

Tried with --lowvram with the same results. Running an RX 580 8GB.

--lowvram makes your GPU heavily limit its utilization (50-60%), so --medvram is the way to go. (You still have to check the lowvram box for ControlNet, though.)

SunGreen777 commented 1 year ago

Thank you, RX 570 (8GB) is OK now.

TheWingAg commented 1 year ago

First, check the arguments. Second, I'm not sure what maximum resolution your GPU is capable of. I can generate a maximum of 600x800 on my RX 580 (8GB) with the arguments --medvram --precision full --no-half --no-half-vae --opt-split-attention-v1 --opt-sub-quad-attention --disable-nan-check.

In my case, my computer: R2700 + RX 6800 + 16GB RAM, Windows 10. I can generate 512x512 images normally, but anything bigger fails with the error "...There is not enough GPU video memory available". Hmm, the RX 6800 has 16GB of VRAM.

Miraihi commented 1 year ago

Some cards have their own quirks; search for mentions of your card in the discussions. The latest collection of arguments the community has come up with is:

set COMMANDLINE_ARGS=--medvram --precision full --no-half --no-half-vae --opt-split-attention-invokeai --always-batch-cond-uncond --opt-sub-quad-attention --sub-quad-q-chunk-size 512 --sub-quad-kv-chunk-size 512 --sub-quad-chunk-threshold 80 --disable-nan-check --upcast-sampling
set SAFETENSORS_FAST_GPU=1

justanothernguyen commented 1 year ago

It is unfortunately because of the memory inefficiency of DirectML (which is what made this repo possible in the first place). Not being able to use xformers hurts performance and VRAM usage too.

What's weird is that when I run with a 6900 XT I noticed the "shared GPU memory" was being used (only about 2GB, but still). This is not the case when I run the regular A1111 webui with a 3060.

Maybe you can generate at 512x512 and upscale in img2img using SD upscale (in the Script section at the bottom of the img2img tab).

TheWingAg commented 1 year ago

It is unfortunately because of the memory inefficiency of DirectML (which is what made this repo possible in the first place). Not being able to use xformers hurts performance and VRAM usage too.

What's weird is that when I run with a 6900 XT I noticed the "shared GPU memory" was being used (only about 2GB, but still). This is not the case when I run the regular A1111 webui with a 3060.

Maybe you can generate at 512x512 and upscale in img2img using SD upscale (in the Script section at the bottom of the img2img tab).

Thanks, same for me. I use an RX 6800 (16GB). I think shared RAM is available because I see it in Task Manager, but the shared RAM isn't used. The maximum image size is about 420,000 pixels (width x height). Do you think so?

tornado73 commented 1 year ago

My 6800, Win 11 Pro 22H2, Adrenalin Edition 23.4.1.

1. It is important for me that the SD folder is in the root of drive C.

2. Open CMD in the root of the stable-diffusion-webui-directml directory and run:

git pull (to ensure the latest update)
pip install -r requirements.txt

<- It was at this point I knew I effed up during initial setup, because I saw several missing items getting installed.

3. For the webui-user.bat file, I added the following line:

set COMMANDLINE_ARGS=--medvram --precision full --no-half --no-half-vae --opt-sub-quad-attention --opt-split-attention --opt-split-attention-v1 --disable-nan-check --autolaunch


Result at 1024x1024:

Euler a: MAX 26/26 [01:16<00:00, 2.96s/it]
DPM++ 2M Karras: MAX 26/26 [02:19<00:18, 6.05s/it]

With my trained .ckpt model.

With the deliberate_v2 .safetensors model at 1024x1280, DPM++ 2M Karras: max 26/26 [01:50<00:00, 4.24s/it]

I usually generate at 440x640, 4 pictures at a time, and then do the necessary upscaling in Topaz Photo AI.

Good luck

P.S. At 1280x1280: RuntimeError: Could not allocate tensor with 377487360 bytes. There is not enough GPU video memory available! :-)

thegr1mmer commented 1 year ago

First, check the arguments. Second, I'm not sure what maximum resolution your GPU is capable of. I can generate a maximum of 600x800 on my RX 580 (8GB) with the arguments --medvram --precision full --no-half --no-half-vae --opt-split-attention-v1 --opt-sub-quad-attention --disable-nan-check.

In my case, my computer: R2700 + RX 6800 + 16GB RAM, Windows 10. I can generate 512x512 images normally, but anything bigger fails with the error "...There is not enough GPU video memory available". Hmm, the RX 6800 has 16GB of VRAM.

Exactly the same for me.

justanothernguyen commented 1 year ago

Guys, have you tried the extension for tiled VAE? It should dramatically reduce VRAM usage

https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Extensions#multidiffusion-with-tiled-vae
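If installing from the Extensions tab fails, the extension can also be cloned manually. A sketch, assuming the repository URL listed on that wiki page:

cd stable-diffusion-webui-directml\extensions
git clone https://github.com/pkuliyi2015/multidiffusion-upscaler-for-automatic1111

Restart the web UI afterwards so it picks the extension up.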

Neoony commented 1 year ago

Also having this problem. It definitely got reduced by --opt-split-attention-v1 --opt-sub-quad-attention; however, it sometimes still crashes with this error.

Running --medvram --precision full --no-half --no-half-vae --opt-split-attention-v1 --opt-sub-quad-attention --disable-nan-check or --precision full --no-half --no-half-vae --opt-split-attention-v1 --opt-sub-quad-attention --disable-nan-check

Radeon 7900 XTX Nitro+ (24GB VRAM)

If I set a very high resolution (e.g. above 1280) it will likely crash. If I go lower (e.g. 768x1024), I can generate images fine; however, with a bigger batch count, or when generating an animation (e.g. Deforum), it will eventually crash. I can generate hundreds of images fine, but at some point it crashes with not enough memory. I have been messing with various settings, but no luck getting rid of it. Will be checking whether tiled VAE helps in the meantime :(

Miraihi commented 1 year ago

Also having this problem. It definitely got reduced by --opt-split-attention-v1 --opt-sub-quad-attention; however, it sometimes still crashes with this error.

Running --medvram --precision full --no-half --no-half-vae --opt-split-attention-v1 --opt-sub-quad-attention --disable-nan-check or --precision full --no-half --no-half-vae --opt-split-attention-v1 --opt-sub-quad-attention --disable-nan-check

Recently I've found another combination of arguments that (seemingly) allowed me to run the basic fp16 canny ControlNet model without the lowvram flag, when I couldn't do it before.

So, here it is:

set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:128
set COMMANDLINE_ARGS=--medvram --always-batch-cond-uncond --precision full --no-half --opt-split-attention --opt-sub-quad-attention --sub-quad-q-chunk-size 512 --sub-quad-kv-chunk-size 512 --sub-quad-chunk-threshold 80 --disable-nan-check --use-cpu interrogate gfpgan codeformer --upcast-sampling --autolaunch --api
set SAFETENSORS_FAST_GPU=1

I'm not entirely sure PYTORCH_CUDA_ALLOC_CONF actually works; maybe it's a placebo. It requires more testing, but the log doesn't complain.
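Put together, a sketch of where those lines sit in webui-user.bat; the set statements must come before the launch call, or the process never sees them:

@echo off
rem environment tweaks first, then the launch call (values copied from the comment above)
set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:128
set SAFETENSORS_FAST_GPU=1
set COMMANDLINE_ARGS=--medvram --always-batch-cond-uncond --precision full --no-half --opt-split-attention --opt-sub-quad-attention --sub-quad-q-chunk-size 512 --sub-quad-kv-chunk-size 512 --sub-quad-chunk-threshold 80 --disable-nan-check --use-cpu interrogate gfpgan codeformer --upcast-sampling --autolaunch --api

call webui.bat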

Also, if you want to generate really big images, use this extension (works for me) or that one (doesn't work for me, but seems objectively the better option). In general, most modern models are trained at 768x768 and don't handle anything above 1024 pixels well.

Chocobollitoo commented 1 year ago

Hi. I just tried to generate some images, and when the AI is close to finishing, this error pops up and I can't see any of the images I may have generated, just the same error as in this issue's title. Any idea?

lshqqytiger commented 1 year ago

That error is the same as OOM (Out of Memory). The resolution or batch size of the image you tried to generate may be too large. (DirectML does not support freeing unused memory yet.)

Chocobollitoo commented 1 year ago

512x512 is too large to generate? I didn't know that.

lshqqytiger commented 1 year ago

It depends on how much VRAM your GPU has available. Add --opt-sub-quad-attention or --medvram, or both.

Chocobollitoo commented 1 year ago

Added both and nothing 🤷. My GPU is an RX 6600.

lshqqytiger commented 1 year ago

My RX 5700 XT can generate 512x768 with hires fix x1.5 when I turn off everything except the web UI and necessary processes. I use --no-half --precision full --opt-sub-quad-attention.

Neoony commented 1 year ago

Try these: --precision full --no-half --no-half-vae --opt-split-attention-v1 --opt-sub-quad-attention --disable-nan-check

That at least helped me generate at bigger resolutions, or keep running without the error for much longer (though it might still crash). Mainly these two:

--opt-split-attention-v1: enables an older version of the split attention optimization that does not consume all the VRAM it can find

--opt-sub-quad-attention: enables the memory-efficient sub-quadratic cross-attention layer optimization

strykenyne commented 1 year ago

Some cards have their own quirks; search for mentions of your card in the discussions. The latest collection of arguments the community has come up with is:

set COMMANDLINE_ARGS=--medvram --precision full --no-half --no-half-vae --opt-split-attention-invokeai --always-batch-cond-uncond --opt-sub-quad-attention --sub-quad-q-chunk-size 512 --sub-quad-kv-chunk-size 512 --sub-quad-chunk-threshold 80 --disable-nan-check --upcast-sampling
set SAFETENSORS_FAST_GPU=1

Thank you so much for sharing this. I used to only be able to do 512x512 images at 20 steps max before I would run out of VRAM. Now I'm doing 1024x768 at 50 steps... 1024x1024 still puts me out of VRAM, but hey, it's a major improvement! :D

Nathan-dm commented 1 year ago

I tried every argument in this issue; none of them works at any image resolution or with any sampler. I tried replacing --medvram with --lowvram; some arguments work, but the maximum I am able to generate is 592x600 with the --lowvram argument added. My system specs: Acer Swift 3X, i5-1135G7, 16GB RAM, Intel Xe graphics 80 EU (shared memory), Intel Xe Max (4GB VRAM).

Miraihi commented 1 year ago

but the maximum I am able to generate is 592x600 with the --lowvram argument added.

The only workaround to reach higher resolutions for now is the img2img Ultimate upscaler script.

tornado73 commented 1 year ago

Install a second system: Ubuntu.

Ubuntu 20.04 + 6800:

512x512: Total progress: 100% 26/26 [00:03<00:00, 7.46it/s]
640x640: Total progress: 100% 20/20 [00:05<00:00, 3.79it/s]
768x768: Total progress: 100% 20/20 [00:08<00:00, 2.37it/s]
1024x1024: Total progress: 100% 20/20 [00:26<00:00, 1.33s/it]
1280x1280: Total progress: 100% 20/20 [01:58<00:00, 5.93s/it]

When generating at 768x768, 6.7 GB of the 16 GB of memory is used, together with the system and two browsers that consume 3 GB when idle.


One thing though: you have to set it up manually. The auto-installation from the topic does not work, and other instructions are outdated.

This is my compilation from different sources, tested on my 6800 :-) It may seem complicated and cumbersome, but it's only an hour of your time; you spend longer than that waiting for generations :-)

Install Ubuntu 20.04, start a terminal, and let's go:

sudo apt update
sudo apt install wget gnupg2 gawk curl
sudo apt install libnuma-dev libncurses5
sudo reboot
sudo usermod -a -G video <username>
sudo usermod -a -G render <username>
sudo apt update
wget https://repo.radeon.com/amdgpu-install/5.4.2/ubuntu/focal/amdgpu-install_5.4.50402-1_all.deb
sudo apt-get install ./amdgpu-install_5.4.50402-1_all.deb
sudo amdgpu-install --usecase=rocm,hip,mllib --no-dkms

You can use 5.4.3, but this setup works for me.

rocminfo

name: gfx1030 --ok

sudo reboot
sudo apt-get install python3
alias python=python3
nano ~/.bashrc

add

alias python=python3
export HSA_OVERRIDE_GFX_VERSION=10.3.0


Save: Ctrl+X, then Y, then Enter.

sudo apt install python3-venv
sudo reboot
sudo apt-get install git
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd stable-diffusion-webui
python -m venv venv
sudo apt install python3-pip
python -m pip install --upgrade pip wheel
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
pip list

If pip list shows torch 2.0.0+rocm5.4.2, you're OK.
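A quicker check than reading the whole pip list output is a one-liner; on AMD ROCm builds torch still reports through the torch.cuda interface, so True here means the GPU is visible:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

Expected output is something like: 2.0.0+rocm5.4.2 True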

python launch.py

If your card is not from the 6000 series:

try python launch.py --precision full --no-half

or

python launch.py --precision full --no-half --medvram

Add to webui-user.sh:

export HSA_OVERRIDE_GFX_VERSION=10.3.0
python -m venv venv
source venv/bin/activate
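For reference, a sketch of a complete webui-user.sh with those lines in place. The venv creation and activation are normally handled by webui.sh itself, so strictly only the exports should be needed; this is an assumption based on the stock script:

#!/bin/bash
# webui-user.sh sketch for RDNA2 (gfx1030) cards: spoof a supported ROCm GPU target
export HSA_OVERRIDE_GFX_VERSION=10.3.0
# optional launch flags, per this thread
export COMMANDLINE_ARGS="--precision full --no-half --medvram"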


save

To launch: double-click webui.sh and choose "run in terminal".

The increase in generation rate: 0.33 it/s on Windows vs 6.8-7 it/s on Ubuntu at 512x512. It's worth it :-) And no memory leaks or crashes :-)

Good luck!

hellozhaoming commented 1 year ago

The SD web UI on Windows 10 works after I shut down WSL, even though SD didn't free the memory. WSL and DirectML on Windows may not work together.

justanothernguyen commented 1 year ago

All of you guys think this is because the GPU memory is too small or the image size is too large.

Maybe read the discussion again...?

We know the issue is DirectML not releasing memory, so by cutting down memory usage in the first place, DirectML also hogs less memory. Think of step 1 taking 1GB of VRAM instead of 1.5GB, etc.; you will be able to go 12 steps instead of 8.

Also, optimizing memory usage is the only actionable thing most of us can do. Or are you suggesting everyone go and fix DirectML instead?

Miraihi commented 1 year ago

@tornado73 Holy hell, I did not realize the performance was that much better! Maybe I should start dual booting again.

FrakerKill commented 1 year ago

I have problems generating 512x512 images on Windows 10 with an RX 6600 (screenshot attached).

chenshaoju commented 1 year ago

Can you use Task Manager to monitor your GPU memory status? (screenshot attached)

FrakerKill commented 1 year ago

Like that? I just launched one 512x512 DPM++ SDE Karras run at 50 steps with hires fix to 1.5 at 20 steps. Now it seems more stable, but when I go up to 1.6, good luck. I had to recreate the venv with these arguments:

set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:128
set COMMANDLINE_ARGS=--medvram --always-batch-cond-uncond --precision full --no-half --opt-split-attention --opt-sub-quad-attention --sub-quad-q-chunk-size 512 --sub-quad-kv-chunk-size 512 --sub-quad-chunk-threshold 80 --disable-nan-check --use-cpu interrogate gfpgan codeformer --upcast-sampling --api --listen --autolaunch
set SAFETENSORS_FAST_GPU=1

(screenshot attached)

Neoony commented 1 year ago

What happened? I didn't use Stable Diffusion for a few months. Now I've cleaned everything up (venv, repository) and pulled, hoping the memory situation had maybe improved, and ran my old bat file, and now I get out-of-memory errors in many more situations.

Before, I could easily do hires fix from 512 to 1280 without much issue (only after many, many batches would it error). But now I get crashes even at 1024. Even just regular img2img from 512 to 1024 crashes all the time. I also crashed at least once just generating at 512.

Something has gotten worse (7900 XTX Nitro+)

Guess I will have to mess with these again: set COMMANDLINE_ARGS=--precision full --no-half --no-half-vae --opt-split-attention-v1 --opt-sub-quad-attention --disable-nan-check. This worked fine for me sometime in April.

Miraihi commented 1 year ago

@Neoony The AUTOMATIC1111 1.3 web UI merge happened. Now there's a new option called "Optimizations" where you can choose the --opt variations without adding them to webui-user.bat (if you're wondering: yes, only one --opt argument can be used; you can't choose both --opt-split-attention-v1 and --opt-sub-quad-attention). Also, token merging happened. You can use my settings; they aren't too bad, and the performance boost is significant (you can't use sub-quad attention with token merging though, or you'll be getting black images left and right). But 1.3 broke the memory management even more, to the point where I can't use any ControlNet model, so I run the vladmandic-directml version in parallel when I have to deal with ControlNet. Also, live preview now works properly, and there's a brand new, performance-efficient and good-looking method.

Neoony commented 1 year ago

I see. Even just applying the original v1 optimization is already an improvement, and I can now generate something that used to crash. (I have removed all arguments from the bat file.)

Thanks a lot for the info, tips and your settings, I will have to mess with these.

Miraihi commented 1 year ago

@Neoony Just in case, the arguments I've left in my webui-user.bat are: --medvram --precision full --no-half --no-half-vae --backend directml --disable-nan-check. medvram is still highly usable.

waldolin commented 1 year ago

--backend directml --disable-nan-check Error running install.py for extension C:\Users\lin\stable-diffusion-webui-directml\extensions\sd_dreambooth_extension. Command: "C:\Users\lin\stable-diffusion-webui-directml\venv\Scripts\python.exe" "C:\Users\lin\stable-diffusion-webui-directml\extensions\sd_dreambooth_extension\install.py" Error code: 1 stdout: No module 'xformers'. Proceeding without it. Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled If submitting an issue on github, please provide the full startup log for debugging purposes.

Initializing Dreambooth Dreambooth revision: dc413a14379b165355502d9f65856c40a4bb5b6f

stderr: C:\Users\lin\stable-diffusion-webui-directml\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: pytorch_lightning.utilities.distributed.rank_zero_only has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from pytorch_lightning.utilities instead. rank_zero_deprecation( C:\Users\lin\stable-diffusion-webui-directml\venv\lib\site-packages\torchvision\transforms\functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be removed in 0.17. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional. warnings.warn( Traceback (most recent call last): File "C:\Users\lin\stable-diffusion-webui-directml\extensions\sd_dreambooth_extension\postinstall.py", line 75, in install_requirements pip_install("-r", req_file) File "C:\Users\lin\stable-diffusion-webui-directml\extensions\sd_dreambooth_extension\postinstall.py", line 53, in pip_install output = subprocess.check_output( File "C:\Users\lin\stable-diffusion-webui\python\lib\subprocess.py", line 420, in check_output return run(*popenargs, stdout=PIPE, timeout=timeout, check=True, File "C:\Users\lin\stable-diffusion-webui\python\lib\subprocess.py", line 524, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command '['C:\Users\lin\stable-diffusion-webui-directml\venv\Scripts\python.exe', '-m', 'pip', 'install', '-r', 'C:\Users\lin\stable-diffusion-webui-directml\extensions\sd_dreambooth_extension\requirements.txt']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "C:\Users\lin\stable-diffusion-webui-directml\extensions\sd_dreambooth_extension\install.py", line 35, in actual_install() File "C:\Users\lin\stable-diffusion-webui-directml\extensions\sd_dreambooth_extension\postinstall.py", line 41, in actual_install install_requirements() File "C:\Users\lin\stable-diffusion-webui-directml\extensions\sd_dreambooth_extension\postinstall.py", line 85, in install_requirements error_msg = grepexc.stdout.decode() UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa6 in position 15728: invalid start byte

Launching Web UI with arguments: --lowvram --precision full --no-half --no-half-vae --opt-sub-quad-attention --enable-insecure-extension-access --deepdanbooru --backend directml --disable-nan-check C:\Users\lin\stable-diffusion-webui-directml\venv\lib\site-packages\pkg_resources__init__.py:123: PkgResourcesDeprecationWarning: llow is an invalid version and will not be supported in a future release warnings.warn( No module 'xformers'. Proceeding without it. usage: launch.py [-h] [--update-all-extensions] [--skip-python-version-check] [--skip-torch-cuda-test] [--reinstall-xformers] [--reinstall-torch] [--update-check] [--tests TESTS] [--no-tests] [--skip-install] [--data-dir DATA_DIR] [--config CONFIG] [--ckpt CKPT] [--ckpt-dir CKPT_DIR] [--vae-dir VAE_DIR] [--gfpgan-dir GFPGAN_DIR] [--gfpgan-model GFPGAN_MODEL] [--no-half] [--no-half-vae] [--no-progressbar-hiding] [--max-batch-count MAX_BATCH_COUNT] [--embeddings-dir EMBEDDINGS_DIR] [--textual-inversion-templates-dir TEXTUAL_INVERSION_TEMPLATES_DIR] [--hypernetwork-dir HYPERNETWORK_DIR] [--localizations-dir LOCALIZATIONS_DIR] [--allow-code] [--medvram] [--lowvram] [--lowram] [--always-batch-cond-uncond] [--unload-gfpgan] [--precision {full,autocast}] [--upcast-sampling] [--share] [--ngrok NGROK] [--ngrok-region NGROK_REGION] [--enable-insecure-extension-access] [--codeformer-models-path CODEFORMER_MODELS_PATH] [--gfpgan-models-path GFPGAN_MODELS_PATH] [--esrgan-models-path ESRGAN_MODELS_PATH] [--bsrgan-models-path BSRGAN_MODELS_PATH] [--realesrgan-models-path REALESRGAN_MODELS_PATH] [--clip-models-path CLIP_MODELS_PATH] [--xformers] [--force-enable-xformers] [--xformers-flash-attention] [--deepdanbooru] [--opt-split-attention] [--opt-sub-quad-attention] [--sub-quad-q-chunk-size SUB_QUAD_Q_CHUNK_SIZE] [--sub-quad-kv-chunk-size SUB_QUAD_KV_CHUNK_SIZE] [--sub-quad-chunk-threshold SUB_QUAD_CHUNK_THRESHOLD] [--opt-split-attention-invokeai] [--opt-split-attention-v1] [--opt-sdp-attention] [--opt-sdp-no-mem-attention] [--disable-opt-split-attention] [--disable-nan-check] [--use-cpu USE_CPU [USE_CPU ...]] [--listen] [--port PORT] [--show-negative-prompt] [--ui-config-file UI_CONFIG_FILE] [--hide-ui-dir-config] [--freeze-settings] [--ui-settings-file UI_SETTINGS_FILE] [--gradio-debug] [--gradio-auth GRADIO_AUTH] [--gradio-auth-path GRADIO_AUTH_PATH] [--gradio-img2img-tool GRADIO_IMG2IMG_TOOL] [--gradio-inpaint-tool GRADIO_INPAINT_TOOL] [--opt-channelslast] [--styles-file STYLES_FILE] [--autolaunch] [--theme THEME] [--use-textbox-seed] [--disable-console-progressbars] [--enable-console-prompts] [--vae-path VAE_PATH] [--disable-safe-unpickle] [--api] [--api-auth API_AUTH] [--api-log] [--nowebui] [--ui-debug-mode] [--device-id DEVICE_ID] [--administrator] [--cors-allow-origins CORS_ALLOW_ORIGINS] [--cors-allow-origins-regex CORS_ALLOW_ORIGINS_REGEX] [--tls-keyfile TLS_KEYFILE] [--tls-certfile TLS_CERTFILE] [--disable-tls-verify] [--server-name SERVER_NAME] [--gradio-queue] [--no-gradio-queue] [--skip-version-check] [--no-hashing] [--no-download-sd-model] [--subpath SUBPATH] [--addnet-max-model-count ADDNET_MAX_MODEL_COUNT] [--controlnet-dir CONTROLNET_DIR] [--controlnet-annotator-models-path CONTROLNET_ANNOTATOR_MODELS_PATH] [--no-half-controlnet] [--controlnet-preprocessor-cache-size CONTROLNET_PREPROCESSOR_CACHE_SIZE] [--controlnet-loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [--dreambooth-models-path DREAMBOOTH_MODELS_PATH] [--lora-models-path LORA_MODELS_PATH] [--ckptfix] [--force-cpu] [--profile-db] [--debug-db] [--deepdanbooru-projects-path 
DEEPDANBOORU_PROJECTS_PATH] [--ldsr-models-path LDSR_MODELS_PATH] [--lora-dir LORA_DIR] [--scunet-models-path SCUNET_MODELS_PATH] [--swinir-models-path SWINIR_MODELS_PATH] launch.py: error: unrecognized arguments: --backend directml

How do I solve it? I have DirectML installed: torch-directml 0.2.0.dev230426.

lshqqytiger commented 1 year ago

You are not on the latest commit. git pull and try again.
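In full, from a command prompt in the install directory (path taken from the logs above):

cd C:\Users\lin\stable-diffusion-webui-directml
git pull
webui-user.bat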

waldolin commented 1 year ago

@Neoony Just in case, the arguments I've left in my webui-user.bat are: --medvram --precision full --no-half --no-half-vae --backend directml --disable-nan-check. medvram is still highly usable.

When I git pull and run webui-user.bat, there's a problem:

venv "C:\Users\lin\stable-diffusion-webui-directml\venv\Scripts\Python.exe" fatal: No names found, cannot describe anything. Python 3.10.7 (tags/v3.10.7:6cc6b13, Sep 5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)] Version: Commit hash: 06296ff0011523f49022dee0e4b0476a8497c473 Fetching updates for K-diffusion... Checking out commit for K-diffusion with hash: c9fe758757e022f05ca5a53fa8fac28889e4f1cf... Traceback (most recent call last): File "C:\Users\lin\stable-diffusion-webui-directml\launch.py", line 38, in main() File "C:\Users\lin\stable-diffusion-webui-directml\launch.py", line 29, in main prepare_environment() File "C:\Users\lin\stable-diffusion-webui-directml\modules\launch_utils.py", line 304, in prepare_environment git_clone(k_diffusion_repo, repo_dir('k-diffusion'), "K-diffusion", k_diffusion_commit_hash) File "C:\Users\lin\stable-diffusion-webui-directml\modules\launch_utils.py", line 145, in git_clone run(f'"{git}" -C "{dir}" checkout {commithash}', f"Checking out commit for {name} with hash: {commithash}...", f"Couldn't checkout commit {commithash} for {name}") File "C:\Users\lin\stable-diffusion-webui-directml\modules\launch_utils.py", line 102, in run raise RuntimeError("\n".join(error_bits)) RuntimeError: Couldn't checkout commit c9fe758757e022f05ca5a53fa8fac28889e4f1cf for K-diffusion. Command: "C:\Users\lin\stable-diffusion-webui\git\cmd/git.exe" -C "C:\Users\lin\stable-diffusion-webui-directml\repositories\k-diffusion" checkout c9fe758757e022f05ca5a53fa8fac28889e4f1cf Error code: 128 stderr: fatal: reference is not a tree: c9fe758757e022f05ca5a53fa8fac28889e4f1cf

How do I solve it???

Here is the update information:

remote: Enumerating objects: 3217, done. remote: Counting objects: 100% (1254/1254), done. remote: Total 3217 (delta 1254), reused 1254 (delta 1254), pack-reused 1963Receiving objects: 100% (3217/3217), 1.29 MiBReceiving objects: 100% (3217/3217), 1.54 MiB | 1.35 MiB/s, done.

Resolving deltas: 100% (2237/2237), completed with 261 local objects. From https://github.com/lshqqytiger/stable-diffusion-webui-directml 108ada83..06296ff0 master -> origin/master

Neoony commented 1 year ago

Try deleting the venv folder and then running the bat again. I guess it might be good to do that after any update.

Maybe also the repositories folder, but I'm not sure if that's needed.
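A sketch of that reset on Windows, assuming the install path from the logs; rmdir /s /q deletes without prompting, and both folders are recreated on the next launch:

cd C:\Users\lin\stable-diffusion-webui-directml
rmdir /s /q venv
rmdir /s /q repositories
webui-user.bat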

waldolin commented 1 year ago

Error code: 128 stderr: fatal: reference is not a tree: c9fe758757e022f05ca5a53fa8fac28889e4f1cf

I deleted the venv and ran the bat again, but it is the same:

venv "C:\Users\lin\stable-diffusion-webui-directml\venv\Scripts\Python.exe" fatal: No names found, cannot describe anything. Python 3.10.7 (tags/v3.10.7:6cc6b13, Sep 5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)] Version: Commit hash: 06296ff0011523f49022dee0e4b0476a8497c473 Fetching updates for K-diffusion... Checking out commit for K-diffusion with hash: c9fe758757e022f05ca5a53fa8fac28889e4f1cf... Traceback (most recent call last): File "C:\Users\lin\stable-diffusion-webui-directml\launch.py", line 38, in main() File "C:\Users\lin\stable-diffusion-webui-directml\launch.py", line 29, in main prepare_environment() File "C:\Users\lin\stable-diffusion-webui-directml\modules\launch_utils.py", line 304, in prepare_environment git_clone(k_diffusion_repo, repo_dir('k-diffusion'), "K-diffusion", k_diffusion_commit_hash) File "C:\Users\lin\stable-diffusion-webui-directml\modules\launch_utils.py", line 145, in git_clone run(f'"{git}" -C "{dir}" checkout {commithash}', f"Checking out commit for {name} with hash: {commithash}...", f"Couldn't checkout commit {commithash} for {name}") File "C:\Users\lin\stable-diffusion-webui-directml\modules\launch_utils.py", line 102, in run raise RuntimeError("\n".join(error_bits)) RuntimeError: Couldn't checkout commit c9fe758757e022f05ca5a53fa8fac28889e4f1cf for K-diffusion. Command: "C:\Users\lin\stable-diffusion-webui\git\cmd/git.exe" -C "C:\Users\lin\stable-diffusion-webui-directml\repositories\k-diffusion" checkout c9fe758757e022f05ca5a53fa8fac28889e4f1cf Error code: 128 stderr: fatal: reference is not a tree: c9fe758757e022f05ca5a53fa8fac28889e4f1cf

The message shows:

Creating venv in directory C:\Users\lin\stable-diffusion-webui-directml\venv using python "C:\Users\lin\stable-diffusion-webui\python\python.exe" venv "C:\Users\lin\stable-diffusion-webui-directml\venv\Scripts\Python.exe" fatal: No names found, cannot describe anything. Python 3.10.7 (tags/v3.10.7:6cc6b13, Sep 5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)] Version: Commit hash: 06296ff0011523f49022dee0e4b0476a8497c473 Installing torch and torchvision Collecting torch==2.0.0 Using cached torch-2.0.0-cp310-cp310-win_amd64.whl (172.3 MB) Collecting torchvision==0.15.1 Using cached torchvision-0.15.1-cp310-cp310-win_amd64.whl (1.2 MB) Collecting torch-directml Using cached torch_directml-0.2.0.dev230426-cp310-cp310-win_amd64.whl (8.2 MB) Collecting sympy Using cached sympy-1.12-py3-none-any.whl (5.7 MB) Collecting networkx Using cached networkx-3.1-py3-none-any.whl (2.1 MB) Collecting jinja2 Using cached Jinja2-3.1.2-py3-none-any.whl (133 kB) Collecting filelock Downloading filelock-3.12.2-py3-none-any.whl (10 kB) Collecting typing-extensions Using cached typing_extensions-4.6.3-py3-none-any.whl (31 kB) Collecting numpy Using cached numpy-1.25.0-cp310-cp310-win_amd64.whl (15.0 MB) Collecting requests Using cached requests-2.31.0-py3-none-any.whl (62 kB) Collecting pillow!=8.3.*,>=5.3.0 Using cached Pillow-9.5.0-cp310-cp310-win_amd64.whl (2.5 MB) Collecting MarkupSafe>=2.0 Using cached MarkupSafe-2.1.3-cp310-cp310-win_amd64.whl (17 kB) Collecting certifi>=2017.4.17 Using cached certifi-2023.5.7-py3-none-any.whl (156 kB) Collecting charset-normalizer<4,>=2 Using cached charset_normalizer-3.1.0-cp310-cp310-win_amd64.whl (97 kB) Collecting urllib3<3,>=1.21.1 Using cached urllib3-2.0.3-py3-none-any.whl (123 kB) Collecting idna<4,>=2.5 Using cached idna-3.4-py3-none-any.whl (61 kB) Collecting mpmath>=0.19 Using cached mpmath-1.3.0-py3-none-any.whl (536 kB) Installing collected packages: mpmath, urllib3, typing-extensions, sympy, pillow, numpy, networkx, MarkupSafe, idna, filelock, charset-normalizer, certifi, requests, jinja2, torch, torchvision, torch-directml Successfully installed MarkupSafe-2.1.3 certifi-2023.5.7 charset-normalizer-3.1.0 filelock-3.12.2 idna-3.4 jinja2-3.1.2 mpmath-1.3.0 networkx-3.1 numpy-1.25.0 pillow-9.5.0 requests-2.31.0 sympy-1.12 torch-2.0.0 torch-directml-0.2.0.dev230426 torchvision-0.15.1 typing-extensions-4.6.3 urllib3-2.0.3

[notice] A new release of pip available: 22.2.2 -> 23.1.2 [notice] To update, run: C:\Users\lin\stable-diffusion-webui-directml\venv\Scripts\python.exe -m pip install --upgrade pip Installing gfpgan Installing clip Installing open_clip Fetching updates for K-diffusion... Checking out commit for K-diffusion with hash: c9fe758757e022f05ca5a53fa8fac28889e4f1cf... Traceback (most recent call last): File "C:\Users\lin\stable-diffusion-webui-directml\launch.py", line 38, in main() File "C:\Users\lin\stable-diffusion-webui-directml\launch.py", line 29, in main prepare_environment() File "C:\Users\lin\stable-diffusion-webui-directml\modules\launch_utils.py", line 304, in prepare_environment git_clone(k_diffusion_repo, repo_dir('k-diffusion'), "K-diffusion", k_diffusion_commit_hash) File "C:\Users\lin\stable-diffusion-webui-directml\modules\launch_utils.py", line 145, in git_clone run(f'"{git}" -C "{dir}" checkout {commithash}', f"Checking out commit for {name} with hash: {commithash}...", f"Couldn't checkout commit {commithash} for {name}") File "C:\Users\lin\stable-diffusion-webui-directml\modules\launch_utils.py", line 102, in run raise RuntimeError("\n".join(error_bits)) RuntimeError: Couldn't checkout commit c9fe758757e022f05ca5a53fa8fac28889e4f1cf for K-diffusion. Command: "C:\Users\lin\stable-diffusion-webui\git\cmd/git.exe" -C "C:\Users\lin\stable-diffusion-webui-directml\repositories\k-diffusion" checkout c9fe758757e022f05ca5a53fa8fac28889e4f1cf Error code: 128 stderr: fatal: reference is not a tree: c9fe758757e022f05ca5a53fa8fac28889e4f1cf

Neoony commented 1 year ago

And did you try deleting the repositories folder?

waldolin commented 1 year ago

And did you try deleting the repositories folder?

I deleted the repositories folder and it works, thank you. But I have a new problem: my PC restarts when I generate. Nothing like this happened in the past when generating. What can I do?

It happened only one time; it works normally now.


venv "C:\Users\lin\stable-diffusion-webui-directml\venv\Scripts\Python.exe" fatal: No names found, cannot describe anything. Python 3.10.7 (tags/v3.10.7:6cc6b13, Sep 5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)] Version: Commit hash: 06296ff0011523f49022dee0e4b0476a8497c473 Installing requirements

No module 'xformers'. Proceeding without it. Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled If submitting an issue on github, please provide the full startup log for debugging purposes.

Initializing Dreambooth Dreambooth revision: dc413a14379b165355502d9f65856c40a4bb5b6f Successfully installed accelerate-0.19.0 fastapi-0.94.1 gitpython-3.1.31 transformers-4.29.2

Does your project take forever to startup? Repetitive dependency installation may be the reason. Automatic1111's base project sets strict requirements on outdated dependencies. If an extension is using a newer version, the dependency is uninstalled and reinstalled twice every startup.

[!] xformers NOT installed. [+] torch version 2.0.0 installed. [+] torchvision version 0.15.1 installed. [+] accelerate version 0.19.0 installed. [+] diffusers version 0.16.1 installed. [+] transformers version 4.29.2 installed. [+] bitsandbytes version 0.35.4 installed.

Launching Web UI with arguments: --lowvram --precision full --no-half --no-half-vae --opt-sub-quad-attention --enable-insecure-extension-access --deepdanbooru --disable-nan-check --backend directml C:\Users\lin\stable-diffusion-webui-directml\venv\lib\site-packages\pkg_resources__init__.py:123: PkgResourcesDeprecationWarning: llow is an invalid version and will not be supported in a future release warnings.warn( No module 'xformers'. Proceeding without it. Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled [AddNet] Updating model hashes... 0it [00:00, ?it/s] [AddNet] Updating model hashes... 0it [00:00, ?it/s] 2023-06-28 22:26:15,122 - ControlNet - INFO - ControlNet v1.1.227 ControlNet preprocessor location: C:\Users\lin\stable-diffusion-webui-directml\extensions\sd-webui-controlnet\annotator\downloads 2023-06-28 22:26:15,274 - ControlNet - INFO - ControlNet v1.1.227 Loading weights [fc2511737a] from C:\Users\lin\stable-diffusion-webui-directml\models\Stable-diffusion\chilloutmix_NiPrunedFp32Fix.safetensors Creating model from config: C:\Users\lin\stable-diffusion-webui-directml\configs\v1-inference.yaml LatentDiffusion: Running in eps-prediction mode DiffusionWrapper has 859.52 M params. Textual inversion embeddings loaded(0): Model loaded in 3.6s (load weights from disk: 0.3s, create model: 0.6s, apply weights to model: 2.6s, load VAE: 0.1s). Applying optimization: sub-quadratic... done. CUDA SETUP: Loading binary C:\Users\lin\stable-diffusion-webui-directml\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cudaall.dll... Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().

My PC: MSI B660M, RX 6650 XT, 64GB RAM.

liquiddandruff commented 1 year ago

@Neoony The AUTOMATIC1111 1.3 web UI merge happened. Now there's a new option called "Optimizations" where you can choose the --opt variations without adding them to webui-user.bat (if you're wondering: yes, only one --opt argument can be used; you can't choose both --opt-split-attention-v1 and --opt-sub-quad-attention). Also, token merging happened. You can use my settings; they aren't too bad, and the performance boost is significant (you can't use sub-quad attention with token merging though, or you'll be getting black images left and right). But 1.3 broke the memory management even more, to the point where I can't use any ControlNet model, so I run the vladmandic-directml version in parallel when I have to deal with ControlNet. Also, live preview now works properly, and there's a brand new, performance-efficient and good-looking method.

Thanks for this info. I was at 3284ccc0, and now after updating I also got the super slow generation of ~7 seconds per iteration.

After copying your optimization settings I was able to return to generation speeds similar to before, ~2 iterations per second.

However, like you, I find that all ControlNet attempts now fail with GPU OOM errors (I can't upscale, not even with tiled upscaling, etc.).

Miraihi commented 1 year ago

However, like you, I find that all ControlNet attempts now fail with GPU OOM errors (I can't upscale, not even with tiled upscaling, etc.).

True, ControlNet is unusable in the current commit, no matter the model or resolution. The maximum usable image size has also decreased in general (I can't render anything at 1024px anymore). But at least token merging and Negative Guidance minimum sigma bring a massive speedup. I use Stable Diffusion primarily for inpainting, so the inability to use ControlNet is not that critical for me. I keep the earlier version around just in case I need it.

Booty3ater900 commented 1 year ago

Man, Stable Diffusion's a bitch.

Miraihi commented 1 year ago

I urge you all to try vladmandic/automatic. It has a functioning ControlNet and a ton of settings not seen in the classic branch. It's multiplatform and can be run in ROCm and DirectML modes.

FrakerKill commented 1 year ago

I urge you all to try vladmandic/automatic. It has a functioning ControlNet and a ton of settings not seen in the classic branch. It's multiplatform and can be run in ROCm and DirectML modes.

But does it support AMD GPUs?

Miraihi commented 1 year ago

I repeat, it's multiplatform and can be run in ROCm and DirectML modes. That implies AMD cards are supported. I'm using it right now myself.

Grathew commented 1 year ago

I am getting this error when doing image generation. It seems like memory isn't being released after image creation.