Open JeSuisSurGithub opened 5 months ago
Yes, if you use DirectML you need to install torch_directml manually. We will add some instructions later, after we have tested that DirectML is working. Please come back in one or two days.
I got it working: I installed torch_directml manually, but I also had to add "args.skip_torch_cuda_test = True" inside prepare_environment() in modules/launch_utils.py, since startup was not actually recognizing the flag "--skip-torch-cuda-test" (even though it was recommending it). I also needed to set seed generation to CPU in the web interface; torch.Generator in rng.py was crashing on DirectML.
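A minimal sketch of that edit (prepare_environment here is a hypothetical stand-in for the real function in modules/launch_utils.py; the exact insertion point may differ between Forge versions):

```python
from types import SimpleNamespace

def prepare_environment(args):
    # Stand-in for modules/launch_utils.py's prepare_environment().
    # The one relevant line is the forced override below: with DirectML
    # there is no CUDA, so skip the CUDA test even when the
    # --skip-torch-cuda-test flag isn't picked up at startup.
    args.skip_torch_cuda_test = True
    return args

args = prepare_environment(SimpleNamespace(skip_torch_cuda_test=False))
print(args.skip_torch_cuda_test)  # → True
```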
FreeU crashes on DirectML. It's the fast Fourier transform (FFT) that doesn't exist for it. If I recall correctly, I fixed it manually on ComfyUI by casting "hsp = Fourier_filter(hsp, threshold=1, scale=scale[1])" to CPU, i.e. "hsp = Fourier_filter(hsp.to("cpu"), threshold=1, scale=scale[1])", and back to the device when returning: "return h, hsp.to(device)". Looks like it should be the same here.
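The workaround can be sketched like this. Fourier_filter below is a simplified stand-in for FreeU's real function (it scales the low frequencies of a feature map in the FFT domain); torch.fft is the op DirectML lacks, hence the round-trip through the CPU:

```python
import torch

def Fourier_filter(x, threshold, scale):
    # Simplified stand-in for FreeU's Fourier_filter: scale the low
    # frequencies of x in the FFT domain. torch.fft is what is missing
    # on DirectML, so this must run on a device that supports it.
    x_freq = torch.fft.fftshift(torch.fft.fftn(x.float(), dim=(-2, -1)), dim=(-2, -1))
    B, C, H, W = x_freq.shape
    mask = torch.ones((B, C, H, W))
    crow, ccol = H // 2, W // 2
    mask[..., crow - threshold:crow + threshold, ccol - threshold:ccol + threshold] = scale
    x_freq = torch.fft.ifftshift(x_freq * mask, dim=(-2, -1))
    return torch.fft.ifftn(x_freq, dim=(-2, -1)).real.to(x.dtype)

device = "cpu"  # stands in for the DirectML device in this sketch
hsp = torch.randn(1, 4, 8, 8, device=device)
# The fix: hop to the CPU for the FFT, then move the result back:
hsp = Fourier_filter(hsp.to("cpu"), threshold=1, scale=0.9).to(device)
```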
I can confirm it does indeed work after installing torch_directml manually
How can we install it on the prepackaged version?
I'm not sure, but I think you can just do the same thing:
venv\Scripts\activate.bat
(or activate.ps1 if you're on PowerShell), then:
pip install torch-directml
And you'll be done.
But right now DirectML doesn't seem quite ready. At least for me it doesn't detect the correct amount of VRAM: it says 1024MB even though I have 2048MB plus 1024MB of shared iGPU VRAM.
So it becomes excruciatingly slow (with almost default settings, just --skip-torch-cuda-test --directml).
Compared to base sd-directml with --use-directml --skip-torch-cuda-test --medvram --opt-sub-quad-attention --sub-quad-q-chunk-size 512 --sub-quad-kv-chunk-size 512 --sub-quad-chunk-threshold 80 --no-half --no-half-vae --precision autocast --disable-nan-check
Ok, solved most of the problems, but...
Add these to the command-line arguments in webui-user.bat: --directml --skip-torch-cuda-test --always-normal-vram --skip-version-check (--always-normal-vram because when it sees 1024MB it falls back to lowvram automatically; that's why generation is so slow).
To install torch-directml in portable mode: open cmd, go into the webui-forge directory, and run ".\system\python\python.exe -m pip install torch-directml".
Change "Random number generator source" to CPU in Settings, under Stable Diffusion (this works with GPU on Fooocus on AMD).
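Put together, the first two steps above amount to something like this (a sketch assuming the portable-package layout; paths may differ on your install):

```shell
:: webui-user.bat (fragment): command-line arguments for DirectML
set COMMANDLINE_ARGS=--directml --skip-torch-cuda-test --always-normal-vram --skip-version-check

:: one-time, from the webui-forge directory: install torch-directml
:: into the bundled Python of the portable package
.\system\python\python.exe -m pip install torch-directml
```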
Now with these settings I am getting the same speeds I get from Fooocus. The only problem is that with SDXL, when it is time to decode, it is almost always out of memory and the app falls back to tiled VAE, but that gives this error: "RuntimeError: Cannot set version_counter for inference tensor". I see that this is a common error on DirectML, but on the other hand I can run tiled VAE on Fooocus without problems.
I mentioned Fooocus because it is also the dev's work, and maybe he can do the same tricks there with GPU seeds, tiled VAE, etc.
Overall it is as fast as Fooocus for SDXL (normal SD stuff is, well, normal as always) for me on AMD, and it has extra options available to tinker with. Very good job, keep up the good work!
I still have problems with the performance even with --always-normal-vram; it still loads the model in lowvram mode, so it's 3x slower than base webui-dml.
I guess it's just not ready yet: https://github.com/lllyasviel/stable-diffusion-webui-forge/blob/257ac2653a565672b280f2851f37b1ba6e546548/ldm_patched/modules/model_management.py#L98
To load target model BaseModel
Begin to load 1 model
loading in lowvram mode 64.0
Moving model(s) has taken 11.27 seconds
Forge has high disk usage, higher memory usage, and the GPU almost idling.
Base DML has the GPU very busy.
With the last few changes the app stopped working on DirectML altogether. Every time I try any model, setting, etc., I get: "TypeError: 'NoneType' object is not iterable"
+1 "TypeError: 'NoneType' object is not iterable"
rx580. 1) webui-user.bat: --directml --skip-torch-cuda-test --always-normal-vram --skip-version-check 2) .\system\python\python.exe -m pip install torch-directml 3) args.skip_torch_cuda_test = True in modules/launch_utils.py. I don't know where exactly to insert args.skip_torch_cuda_test = True in step 3. Who can tell me?
AttributeError: module 'torch' has no attribute 'Tensor'
rx6600m
darthalex2014, as I understand.
Folder location: \webui_forge\webui\modules\launch_utils.py
can anyone tell me how to do this step:
.\system\python\python.exe -m pip install torch-directml
(Hey Squid!) Open a command prompt from Forge's folder and run:
venv\scripts\activate
pip install torch-directml
Thanks that worked!
rx6600m darthalex2014, as I understand. Folder location:\webui_forge\webui\modules\launch_utils.py
I know it's there, but what do I do with it? I get AttributeError: module 'torch' has no attribute 'Tensor'
+1 "TypeError: 'NoneType' object is not iterable"
+2 "TypeError: 'NoneType' object is not iterable"
Ditto. I've tried a checkout from about a week ago, and this wasn't an issue then. I'm not clever enough to find the precise difference, but maybe there's a clue.
RuntimeError: Device type privateuseone is not supported for torch.Generator() api. RuntimeError: 'devices' argument must be DML (in cmd)
I did all the steps but I have this error; I have no idea why.
+3 "TypeError: 'NoneType' object is not iterable"
+4 "TypeError: 'NoneType' object is not iterable"
+5 "TypeError: 'NoneType' object is not iterable"
Yes, recent changes broke more things for Directml, I've mentioned them, and a workaround for us, here: https://github.com/lllyasviel/stable-diffusion-webui-forge/issues/73
That works for ipadapter; I was the original OP on the ipadapter GitHub and am still changing that line on every update. How does it help here? Are there similar lines to change in Forge itself?
RuntimeError: Device type privateuseone is not supported for torch.Generator() api. RuntimeError: 'devices' argument must be DML (in cmd)
i did all steps but I have this error, I have no idea why
Sounds like you forgot to change seed generation to CPU in the settings; torch.Generator() crashes if it tries that with DirectML.
This is the setting (it should be the only one that shows if you type "cpu" in the search box of the Settings tab): Random number generator source (changes seeds drastically; use CPU to produce the same picture across different video card vendors; use NV to produce the same picture as on NVidia video cards): [ ] GPU [o] CPU [ ] NV
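For what it's worth, the underlying limitation fits in a couple of lines: torch.Generator() rejects DirectML's "privateuseone" device type, so noise has to be drawn from a CPU generator and moved to the GPU afterwards (a minimal sketch):

```python
import torch

# torch.Generator() only supports a limited set of device types; the
# DirectML device ("privateuseone") isn't one of them, which is why the
# GPU seed setting crashes. A CPU generator sidesteps that:
gen = torch.Generator(device="cpu").manual_seed(1234)
noise = torch.randn((1, 4, 64, 64), generator=gen)  # sampled on the CPU
# ...then moved to the DML device for the actual sampling step, e.g.:
# noise = noise.to(dml_device)  # dml_device would come from torch_directml
print(noise.shape)  # → torch.Size([1, 4, 64, 64])
```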
Yes, recent changes broke more things for Directml, I've mentioned them, and a workaround for us, here: #73
That works for ipadapter I was the original op on ipadapter github and still changing that line on every update, how does it help here ? Are there similar lines to change on forge itself ?
Read the rest of my msgs, the last one mentions all the files/lines you need to change. Well, I'll detail them again here anyway...
Comment out all the @torch.inference_mode() (add # before them) in: \ldm_patched\modules\utils.py - line 407; \modules_forge\forge_loader.py - lines 236 and 242.
Change "with torch.inference_mode():" to "with torch.no_grad():" in: \modules\processing.py - line 817.
This fixes normal execution and tiled VAE decoding for us. I'm running it with performance similar to what I get on ComfyUI, which is the fastest. It has worked on my RX 580 8GB (Windows), despite it saying "lowvram mode" here.
HOLY... Just changed all those files, and regarding tiled VAE: yes, after using it in my workflows, no more out-of-memory, even on SDXL.
Edit: Yes, SD1.5 and SDXL are working; tiled VAE is auto-enabling and working too. And the speed seems identical to ComfyUI and Fooocus.
Edit 2: I spoke too early. SD1.5 models and SDXL Turbo worked, but when I tried a standard SDXL model (Juggernaut v8) the NoneType error came back. If I restart the PC, I can load and use full SDXL models without error; only if I change from SD1.5 or SDXL Turbo to SDXL does the NoneType error return, and it persists until the PC restarts.
Switching from SD 1.5 to SDXL or vice versa gives TypeError: 'NoneType' object is not iterable. However, switching between 1.5 models works fine.
@MythicalChu I've edited those files but I still get the same error ("TypeError: 'NoneType' object is not iterable").
Search for "cpu" in Settings, set the random number generator source from GPU to CPU, and restart the PC.
I forgot to change seed generation to CPU; it works now, thanks.
Textual inversions seem to just straight up brick the process
RuntimeError: Device type privateuseone is not supported for torch.Generator() api. RuntimeError: 'devices' argument must be DML (in cmd) i did all steps but I have this error, I have no idea why
Sounds like you forgot to change seed generation to cpu on the settings, torch.Generator() crashes if it tries that with Directml.
This is the config: (it should be the only that shows if you type "cpu" on the search box of the Settings Tab) Random number generator source. (changes seeds drastically; use CPU to produce the same picture across different videocard vendors; use NV to produce same picture as on NVidia videocards) []GPU [o]CPU []NV
Still not working; now I have TypeError: 'NoneType' object is not iterable.
If you switch from SD to SD Turbo, or SD to SDXL, or SDXL to another type, you will get this error. To fix it, restart the PC, and don't change to other model types. You can change from one 1.5 model to another 1.5 model; however, changing between types, like 1.5 to 2.0, SDXL, or Turbo, will brick it, so keep that in mind. You are safe to use LoRAs, but when changing models make sure you know what type you are switching to. My suggestion: keep models of the same type in a subdirectory named after the base model, e.g. "SD1.5/Dreamshaper.safetensors", where the folder is SD1.5 and the model is Dreamshaper. You can organize folders and models better this way. I hope it helps.
The best part about it: I used a 1.5 model from the beginning, and restarting the PC didn't help.
Did you edit these files? And did you set the random number generator source from GPU to CPU? Comment out all the @torch.inference_mode() (add # before them) in: \ldm_patched\modules\utils.py - line 407; \modules_forge\forge_loader.py - lines 236 and 242.
Change "with torch.inference_mode():" to "with torch.no_grad():" in: \modules\processing.py - line 817.
It's working now, ty!
@VeteranXT do you know why I'm getting this result when I use an SDXL model?
I have also tried to follow the procedure, but it does not work. Have I done something fundamentally wrong...?
When I try to generate, I get "TypeError: 'NoneType' object is not iterable".
@Texieru
See the instructions from MythicalChu earlier in the thread. I was having the same issue earlier today; I followed their method and it got things working for me. Some of the line numbers were a little different, but in roughly the same area.
Do you know why I'm getting this result when I use an SDXL model?
Set CFG to 1-2 when using turbo or LCM models.
@usernamele31
Hmmm...it still doesn't work.
@Texieru
Did you change the random number generator source from GPU to CPU in the webui settings? You should be able to find it by typing "cpu" in the settings search. Sorry for not including that step!
I would like to add that I always get low VRAM with the "--always-normal-vram" flag on, despite having a 6800M that has 12GB of VRAM.
Once you set those, restart the PC. And if you change model types, like Turbo to SDXL, it will throw an error. Changing to the same type, i.e. SD 1.5 to SD 1.5 or XL to XL, will not throw an error.
It still does not work well.
Maybe it doesn't work with a GTX770M on a notebook PC...?
Did you restart the PC after doing the following? Altered the 3 files? Set GPU to CPU in the Stable Diffusion settings? Restarted the laptop?
@Texieru your problem seems different, since it's failing to load that checkpoint. Maybe try a different checkpoint? If it still doesn't work, maybe it's related to your localization (jp) failing to find the folders or something... Did you add "--directml ^" to webui-user.bat?
@VeteranXT do you know why I'm getting this result when I use an SDXL model?
I get bad results like this if I use a multisample scheduler, like DPM++ 2M, with SD1.5. Not sure if it's LCM-related. The ones that seem to work better for me are Euler A and DPM++ SDE.
Do you know why I'm getting this result when I use an SDXL model?
Set CFG to 1-2 when using turbo or LCM models.
I've adapted A1111's CFGRescale for Forge: https://github.com/MythicalChu/CFGRescale_For_Forge
Edit: I hadn't noticed that a CFG fix was implemented in a recent update ._.'
Some samplers are horrible with LCM (Turbo).
@MythicalChu I changed checkpoints and VAEs, disabled the Japanese language extension, added "--directml ^", and reinstalled webui-forge. I have tried all of these and get the same error.
Hmmm, what is the cause?
Checklist
What happened?
Tried to launch the webui through webui-user.bat and it can't start. It seems torch_directml doesn't get installed and the program assumes I'm using a CUDA-compatible GPU.
Steps to reproduce the problem
Added --directml --skip-torch-cuda-test to COMMANDLINE_ARGS in webui-user.bat
What should have happened?
It should have launched the UI.
What browsers do you use to access the UI?
Mozilla Firefox
Sysinfo
I couldn't get the sysinfo dump either
Additional information
The DirectML fork of a1111 webui works fine on my AMD Ryzen 3500U Vega 8 laptop APU (2x4GB 2400MHz).