lllyasviel / Fooocus

Focus on prompting and generating
GNU General Public License v3.0

Device type privateuseone is not supported for torch.Generator() api #970

Closed JarekDerp closed 6 months ago

JarekDerp commented 8 months ago

Describe the problem I have an AMD card, a 6700 XT, and I'm running Windows 11. When trying to generate an image I get "Device type privateuseone is not supported for torch.Generator() api." In the console log below, in line 11, it says that my device was recognized as "Device: privateuseone", and I think that might be the issue.

script "brownian_interval.py" checks in line 52 if the "device is none" and then assigns "device = torch.device("cpu")" but it's not working since it's recognizing some device that it shouldn't recognize.

I think the problem is in the Fooocus/backend/headless/fcbh/model_management.py script, either lines 69-83 or 241-259. Also, there's an issue with calculating available VRAM. My card has 12GB of VRAM, but the console log below reports that I have 1024 MB, probably caused by line 95, which says mem_total = 1024 * 1024 * 1024 #TODO
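The "Total VRAM 1024 MB" line follows directly from that placeholder; a quick arithmetic check (assuming the reporting code divides bytes down to megabytes):

    # 1 GiB placeholder -> the "Total VRAM 1024 MB" seen in the log
    mem_total = 1024 * 1024 * 1024           # hard-coded bytes (#TODO in source)
    print(mem_total // (1024 * 1024), "MB")  # -> 1024 MB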

Any chance of getting it fixed? I have no idea about Python, so I can't do much :/

Full Console Log

    [System ARGV] ['E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\launch.py', '--preset', 'realistic', '--normalvram', '--directml', '--disable-xformers', '--auto-launch']
    Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
    Fooocus version: 2.1.820
    Running on local URL: http://127.0.0.1:7865

    To create a public link, set share=True in launch().
    Using directml with device:
    Total VRAM 1024 MB, total RAM 32637 MB
    Set vram state to: NORMAL_VRAM
    Disabling smart memory management
    Device: privateuseone
    VAE dtype: torch.float32
    Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --use-split-cross-attention
    Refiner unloaded.
    model_type EPS
    adm 2816
    Using split attention in VAE
    Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
    Using split attention in VAE
    extra keys {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale'}
    Base model loaded: E:\StabilityMatrix-win-x64\Data\Models\StableDiffusion\realisticStockPhoto_v10.safetensors
    Request to load LoRAs [['SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors', 0.25], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [E:\StabilityMatrix-win-x64\Data\Models\StableDiffusion\realisticStockPhoto_v10.safetensors].
    Loaded LoRA [E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\models\loras\SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors] for model [E:\StabilityMatrix-win-x64\Data\Models\StableDiffusion\realisticStockPhoto_v10.safetensors] with 1052 keys at weight 0.25.
    Fooocus V2 Expansion: Vocab with 642 words.
    Fooocus Expansion engine loaded for cpu, use_fp16 = False.
    Requested to load SDXLClipModel
    Requested to load GPT2LMHeadModel
    Loading 2 new models
    [Fooocus Model Management] Moving model(s) has taken 2.07 seconds
    App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
    [Parameters] Adaptive CFG = 7
    [Parameters] Sharpness = 2
    [Parameters] ADM Scale = 1.5 : 0.8 : 0.3
    [Parameters] CFG = 3.0
    [Parameters] Seed = 7948768698594532830
    [Parameters] Sampler = dpmpp_2m_sde_gpu - karras
    [Parameters] Steps = 30 - 15
    [Fooocus] Initializing ...
    [Fooocus] Loading models ...
    Refiner unloaded.
    Request to load LoRAs [('SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors', 0.25), ('None', 1), ('None', 1), ('None', 1), ('None', 1)] for model [E:\StabilityMatrix-win-x64\Data\Models\StableDiffusion\realisticStockPhoto_v10.safetensors].
    Loaded LoRA [E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\models\loras\SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors] for model [E:\StabilityMatrix-win-x64\Data\Models\StableDiffusion\realisticStockPhoto_v10.safetensors] with 1052 keys at weight 0.25.
    Requested to load SDXLClipModel
    Loading 1 new model
    unload clone 1
    [Fooocus Model Management] Moving model(s) has taken 1.95 seconds
    [Fooocus] Processing prompts ...
    [Fooocus] Preparing Fooocus text #1 ...
    [Prompt Expansion] xxxxxxxxx
    [Fooocus] Preparing Fooocus text #2 ...
    [Prompt Expansion] xxxxxxxxx
    [Fooocus] Encoding positive #1 ...
    [Fooocus] Encoding positive #2 ...
    [Fooocus] Encoding negative #1 ...
    [Fooocus] Encoding negative #2 ...
    Preparation time: 15.39 seconds
    [Sampler] refiner_swap_method = joint
    [Sampler] sigma_min = 0.02916753850877285, sigma_max = 14.614643096923828
    Traceback (most recent call last):
      File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\modules\async_worker.py", line 733, in worker
        handler(task)
      File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\modules\async_worker.py", line 665, in handler
        imgs = pipeline.process_diffusion(
      File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\modules\default_pipeline.py", line 312, in process_diffusion
        modules.patch.BrownianTreeNoiseSamplerPatched.global_init(
      File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\modules\patch.py", line 169, in global_init
        BrownianTreeNoiseSamplerPatched.tree = BatchedBrownianTree(x, t0, t1, seed, cpu=cpu)
      File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\backend\headless\fcbh\k_diffusion\sampling.py", line 85, in __init__
        self.trees = [torchsde.BrownianTree(t0, w0, t1, entropy=s, **kwargs) for s in seed]
      File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\backend\headless\fcbh\k_diffusion\sampling.py", line 85, in <listcomp>
        self.trees = [torchsde.BrownianTree(t0, w0, t1, entropy=s, **kwargs) for s in seed]
      File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\venv\lib\site-packages\torchsde\_brownian\derived.py", line 155, in __init__
        self._interval = brownian_interval.BrownianInterval(t0=t0,
      File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\venv\lib\site-packages\torchsde\_brownian\brownian_interval.py", line 554, in __init__
        W = self._randn(initial_W_seed) * math.sqrt(t1 - t0)
      File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\venv\lib\site-packages\torchsde\_brownian\brownian_interval.py", line 248, in _randn
        return _randn(size, self._top._dtype, self._top._device, seed)
      File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\venv\lib\site-packages\torchsde\_brownian\brownian_interval.py", line 31, in _randn
        generator = torch.Generator(device).manual_seed(int(seed))
    RuntimeError: Device type privateuseone is not supported for torch.Generator() api.
    Total time: 77.84 seconds

MorningKek commented 8 months ago

I have an AMD Radeon RX 6600, using Fooocus on Windows 10. Same issue.

RuntimeError: Device type privateuseone is not supported for torch.Generator() api.

llnqdx commented 8 months ago

I have an AMD Radeon RX 7800 XT, using Fooocus on Windows 10. Same issue.

RuntimeError: Device type privateuseone is not supported for torch.Generator() api.

sappelhoff commented 8 months ago

Same problem for me:

RuntimeError: Device type privateuseone is not supported for torch.Generator() api.

JarekDerp commented 8 months ago

Thanks for confirming that I'm not the only one with the problem. I was thinking that maybe something was wrong with my PC configuration, but it looks like it's not an isolated case. People who have had it installed for months maybe don't have this issue. I made a fresh install, so maybe one of the dependencies updated and stopped working or something. I have ComfyUI with DirectML installed in a separate environment and it's working fine, even though it uses the same packages as Fooocus.

sappelhoff commented 8 months ago

What were the exact steps you used to solve the problem @JarekDerp? Just a "fresh install" worked?

... and that's it? Or were there other steps?

JarekDerp commented 8 months ago

@sappelhoff Sorry, I was a bit unclear about what I wanted to say; I've modified my previous comment. Basically, what I meant was that I made a fresh installation a couple of days ago and it's not working. Maybe there are people who installed it a couple of months ago and it's working for them because they haven't upgraded any pip packages.

A fresh installation of ComfyUI works fine while this one doesn't. I'm not good with Python, but I'll try to compare the packages and see which ones have different versions.
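A quick way to do that comparison (assuming you run pip freeze in each environment first; the file names here are just examples):

    # In each venv: python -m pip freeze > fooocus.txt / comfy.txt
    fooocus = set(open("fooocus.txt").read().split())
    comfy = set(open("comfy.txt").read().split())
    print("only in Fooocus:", sorted(fooocus - comfy))
    print("only in ComfyUI:", sorted(comfy - fooocus))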

MikeLP commented 8 months ago

As a temporary solution, manually patch the file "./python_embeded/Lib/site-packages/torchsde/_brownian/brownian_interval.py".

Find (line 31):

def _randn(size, dtype, device, seed):
    generator = torch.Generator(device).manual_seed(int(seed))
    return torch.randn(size, dtype=dtype, device=device, generator=generator)

and change it to:

def _randn(size, dtype, device, seed):
    generator = torch.Generator("cpu").manual_seed(int(seed))
    return torch.randn(size, dtype=dtype, device=device, generator=generator)
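A slightly more defensive variant of the same idea (a sketch, not tested here) would keep GPU generators where they are supported and only fall back to the CPU for device types torch.Generator rejects, such as DirectML's privateuseone:

    import torch

    def _randn(size, dtype, device, seed):
        try:
            generator = torch.Generator(device).manual_seed(int(seed))
        except RuntimeError:
            # e.g. "Device type privateuseone is not supported ..."
            generator = torch.Generator("cpu").manual_seed(int(seed))
        return torch.randn(size, dtype=dtype, device=device, generator=generator)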

MikeLP commented 8 months ago

One more possible issue in the installation (which I had) could be this one:

    \python_embeded\python.exe -m pip uninstall torch torchvision torchaudio torchtext functorch xformers -y
    .\python_embeded\python.exe -m pip install torch-directml
    .\python_embeded\python.exe -s Fooocus\entry_with_update.py --directml
    pause

Fix: add the missing dot.

    .\python_embeded\python.exe -m pip uninstall torch torchvision torchaudio torchtext functorch xformers -y
    .\python_embeded\python.exe -m pip install torch-directml
    .\python_embeded\python.exe -s Fooocus\entry_with_update.py --directml
    pause

MorningKek commented 8 months ago

As a temporary solution, manually patch the file "./python_embeded/Lib/site-packages/torchsde/_brownian/brownian_interval.py".

Find (line 31):

def _randn(size, dtype, device, seed):
    generator = torch.Generator(device).manual_seed(int(seed))
    return torch.randn(size, dtype=dtype, device=device, generator=generator)

and change it to:

def _randn(size, dtype, device, seed):
    generator = torch.Generator("cpu").manual_seed(int(seed))
    return torch.randn(size, dtype=dtype, device=device, generator=generator)

Did that; a new error appears now.

    Enter LCM mode.
    [Fooocus] Downloading LCM components ...
    [Parameters] Adaptive CFG = 1.0
    [Parameters] Sharpness = 0.0
    [Parameters] ADM Scale = 1.0 : 1.0 : 0.0
    [Parameters] CFG = 1.0
    [Parameters] Seed = 1228787800952501620
    [Parameters] Sampler = lcm - lcm
    [Parameters] Steps = 8 - 8
    [Fooocus] Initializing ...
    [Fooocus] Loading models ...
    Refiner unloaded.
    [Fooocus] Processing prompts ...
    [Fooocus] Preparing Fooocus text #1 ...
    [Prompt Expansion] banana, highly detailed, vibrant colors, light, strong crisp, sharp focus, intricate, cinematic, full background, excellent composition, dynamic dramatic futuristic atmosphere, precise, aesthetic, very inspirational, stunning, rich vivid color, ambient epic, professional fine detail, clear, beautiful, creative, positive, attractive, unique, cute, artistic, wonderful, perfect, focused, confident
    [Fooocus] Encoding positive #1 ...
    [Parameters] Denoising Strength = 1.0
    [Parameters] Initial Latent shape: Image Space (1024, 1024)
    Preparation time: 5.17 seconds
    Using lcm scheduler.
    [Sampler] refiner_swap_method = joint
    [Sampler] sigma_min = 0.39970144629478455, sigma_max = 14.614640235900879
    Requested to load SDXL
    Loading 1 new model
    ERROR diffusion_model.output_blocks.0.0.in_layers.2.weight Could not allocate tensor with 117964800 bytes. There is not enough GPU video memory available!
    ERROR diffusion_model.output_blocks.0.0.in_layers.2.weight Could not allocate tensor with 117964800 bytes. There is not enough GPU video memory available!
    Traceback (most recent call last):
      File "D:\Downloads\Fooocus_win64_2-1-791\Fooocus\modules\async_worker.py", line 803, in worker
        handler(task)
      File "D:\Downloads\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "D:\Downloads\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "D:\Downloads\Fooocus_win64_2-1-791\Fooocus\modules\async_worker.py", line 735, in handler
        imgs = pipeline.process_diffusion(
      File "D:\Downloads\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "D:\Downloads\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "D:\Downloads\Fooocus_win64_2-1-791\Fooocus\modules\default_pipeline.py", line 361, in process_diffusion
        sampled_latent = core.ksampler(
      File "D:\Downloads\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "D:\Downloads\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "D:\Downloads\Fooocus_win64_2-1-791\Fooocus\modules\core.py", line 315, in ksampler
        samples = fcbh.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
      File "D:\Downloads\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\sample.py", line 93, in sample
        real_model, positive_copy, negative_copy, noise_mask, models = prepare_sampling(model, noise.shape, positive, negative, noise_mask)
      File "D:\Downloads\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\sample.py", line 86, in prepare_sampling
        fcbh.model_management.load_models_gpu([model] + models, model.memory_required(noise_shape) + inference_memory)
      File "D:\Downloads\Fooocus_win64_2-1-791\Fooocus\modules\patch.py", line 496, in patched_load_models_gpu
        y = fcbh.model_management.load_models_gpu_origin(*args, **kwargs)
      File "D:\Downloads\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\model_management.py", line 410, in load_models_gpu
        cur_loaded_model = loaded_model.model_load(lowvram_model_memory)
      File "D:\Downloads\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\model_management.py", line 293, in model_load
        raise e
      File "D:\Downloads\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\model_management.py", line 289, in model_load
        self.real_model = self.model.patch_model(device_to=patch_model_to) #TODO: do something with loras and offloading to CPU
      File "D:\Downloads\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\model_patcher.py", line 191, in patch_model
        temp_weight = fcbh.model_management.cast_to_device(weight, device_to, torch.float32, copy=True)
      File "D:\Downloads\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\model_management.py", line 532, in cast_to_device
        return tensor.to(device, copy=copy).to(dtype)
    RuntimeError: Could not allocate tensor with 117964800 bytes. There is not enough GPU video memory available!
    Total time: 9.84 seconds

JarekDerp commented 8 months ago

Could not allocate tensor with 117964800 bytes. There is not enough GPU video memory available!

Yeah, I already tried that days ago and had the same result. My card has 12GB, so 112MB shouldn't be a problem. Forcing it to CPU in this one place doesn't fix the whole script. I think the problem is in finding the GPU card you have (its address or hardware ID or whatever) and then making it available for image generation.
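The 112MB figure checks out: the failing tensor is a single float32 conv weight (the 3x3, 2560-to-1280-channel layer named in the error; the shape is inferred from SDXL's UNet, stated here as an assumption):

    # output_blocks.0.0.in_layers.2.weight as float32:
    print(1280 * 2560 * 3 * 3 * 4)  # 117964800 bytes
    print(117964800 / 2**20)        # ~112.5 MiB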

I tried a couple of things (I'm a complete Python noob, by the way), and in some instances I got an error saying that part of the work was assigned to the GPU and part to the CPU, and there were problems with tensors, so it's way above my head.

JarekDerp commented 8 months ago

I ran ComfyUI, and its model_management.py looks nearly identical to the file I suspected was wrong. The output at the beginning is the same:

    Using directml with device:
    Total VRAM 1024 MB, total RAM 32637 MB
    Set vram state to: NORMAL_VRAM
    Device: privateuseone

But ComfyUI works and this one doesn't. Even the installed packages are almost the same; only torchsde is a different version:

Fooocus: torch, torch_directml, torch_directml_native.cp310-win_amd64.pyd, torch_directml-0.2.0.dev230426.dist-info, torch-2.0.0.dist-info, torchgen, torchmetrics, torchmetrics-1.2.0.dist-info, torchsde, torchsde-0.2.5.dist-info, torchvision, torchvision-0.15.1.dist-info

Comfy: torch, torch_directml, torch_directml_native.cp310-win_amd64.pyd, torch_directml-0.2.0.dev230426.dist-info, torch-2.0.0.dist-info, torchgen, torchsde, torchsde-0.2.6.dist-info, torchvision, torchvision-0.15.1.dist-info

But even after running pip install --upgrade torchsde==0.2.6 it doesn't work. I searched through both entire solutions, and the implementation of DirectML is nearly identical.

I also tried running it with the parameters 'E:\\StabilityMatrix-win-x64\\Data\\Packages\\Fooocus\\launch.py', '--preset', 'realistic', '--disable-xformers', '--cpu', and it ran fine, although it took 55 s/it on my Ryzen 5 5500 CPU and used 29-31 of the 32GB of RAM on my system.

I have one more suspicion. ComfyUI doesn't give any messages when rendering 512x512 pictures, but when I selected 896x1152, like Fooocus likes to use, it started complaining a lot and then decided to do it anyway, although it took about 5x longer than a regular 512x512 image (5.5 s/it instead of 1 s/it). I don't know how to inject a 512x512 resolution into Fooocus to test whether it would work with a 1:1 aspect ratio.
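One way to probe the square-vs-non-square theory in isolation (a sketch; it assumes torch-directml is installed and only tests raw allocation on the DirectML device, not the full VAE):

    import torch
    import torch_directml

    dml = torch_directml.device()
    for h, w in [(512, 512), (1152, 896)]:
        try:
            # rough stand-in for a decoded float32 RGB batch at this size
            x = torch.randn(1, 3, h, w, dtype=torch.float32, device=dml)
            print(f"{h}x{w}: allocated {x.numel() * 4 / 1e6:.1f} MB OK")
        except RuntimeError as e:
            print(f"{h}x{w}: {e}")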

JarekDerp commented 8 months ago

Well, as I was typing my previous comment, ComfyUI gave me the same error!

Error occurred when executing VAEDecode: Could not allocate tensor with 264241152 bytes. There is not enough GPU video memory available!

The weird thing is that KSampler generated the image but the VAE Decode node failed to display it, which only confirms my theory that irregular or too-large image sizes fail on AMD with torch-directml.

Lira2423 commented 8 months ago

If you (like I do) just want to run the model on the CPU, change the function at line 90 of the file ...\Fooocus\launch.py to

def ini_fcbh_args():
    from args_manager import args
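    # force CPU mode regardless of the flags passed on the command line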
    args.cpu = True
    return args

Unfortunately I don't know how to make an AMD GPU work :(

JarekDerp commented 8 months ago

I managed to run it on a 6700 XT GPU; it was quite slow, 3-4 s/it when generating a 512x512 image.

But it only generates 1-2 images and then stops working due to lack of VRAM, because it does a poor job of clearing the VRAM after each run. Even setting --normalvram, --lowvram, or even --novram doesn't help. It either fills up your entire VRAM and then fails to run, or ignores your config and tries to allocate work to CUDA.

This is rubbish. I'm uninstalling it and I will be using ComfyUI instead. Not worth my time.

JarekDerp commented 7 months ago

In some specific situations the same thing appears in ComfyUI. I'm running into this problem because Torch DirectML reserves almost all of the GPU's VRAM when it starts. When you then try to run the encoder/decoder, it reports that it cannot allocate enough VRAM, because all of it is reserved for the checkpoints and LoRAs ('reserved', not necessarily 'used': even if you load a checkpoint that is only 2GB on a 12GB card, about 97% of the card's VRAM is still reserved). I'm looking into the problem and trying to find a solution, but I'm not a pro and don't even know Python, so I probably won't be able to figure it out.

mashb1t commented 6 months ago

Duplicate of https://github.com/lllyasviel/Fooocus/issues/763. Please be aware that in https://github.com/lllyasviel/Fooocus/commit/8e62a72a63b30a3067d1a1bc3f8d226824bd9283 (latest Fooocus version 2.1.857) AMD with >= 8GB VRAM is now supported. Please try with min. 8GB VRAM allocated.

JarekDerp commented 6 months ago

Duplicate of #763. Please be aware that in 8e62a72 (latest Fooocus version 2.1.857) AMD with >= 8GB VRAM is now supported. Please try with min. 8GB VRAM allocated.

Wow, nice. It works quite well on the "Extra Speed" setting. Thanks for the hard work. I would just mention somewhere that you still need about 32GB of RAM to run it in DirectML mode.

mashb1t commented 6 months ago

@JarekDerp as of https://github.com/lllyasviel/Fooocus?tab=readme-ov-file#minimal-requirement you should only need 8GB RAM. Is the resource consumption of Fooocus significantly off on your machine?

JarekDerp commented 6 months ago

@mashb1t Well, yes. It uses as much RAM as if I were running it on the CPU, even though I'm running it on a 6700 XT with 12GB of VRAM. I have 32GB of RAM and it gets filled almost completely during image generation. One time I even noticed memory thrashing, where Windows pages things out to virtual memory on my SSD because it ran out of RAM.

Basically it's using up my 32GB of RAM and 12GB of VRAM. But at least image generation is quite fast and it doesn't give me "out of memory" errors anymore. Since posting the initial question I've learned a lot about Python, DirectML, PyTorch, and Stable Diffusion in general. I managed to avoid these problems in ComfyUI by using a tiled decoder, but it still fails sometimes with bigger images, so I'm curious how you managed to make it work here. I'll probably have a look at the code once I have some spare time.
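For reference, the tiled-decode idea mentioned above in rough form (illustrative only: vae.decode, the 8x upscale factor, and non-overlapping tiles are assumptions; real implementations overlap and blend tiles to hide seams):

    import torch

    def decode_tiled(vae, latent, tile=64):
        # decode a (B, 4, H, W) latent in square tiles so the VAE never
        # materialises the whole image on the GPU at once
        _, _, h, w = latent.shape
        scale = 8  # SDXL's VAE upsamples latents 8x per side
        out = None
        for y in range(0, h, tile):
            for x in range(0, w, tile):
                dec = vae.decode(latent[:, :, y:y + tile, x:x + tile])
                if out is None:
                    b, c = dec.shape[:2]
                    out = torch.zeros(b, c, h * scale, w * scale, dtype=dec.dtype)
                out[:, :, y * scale:y * scale + dec.shape[2],
                          x * scale:x * scale + dec.shape[3]] = dec.cpu()
        return out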

By the way, I can paste you the content of the log; maybe something is wrong with my settings. I have a feeling it's loading the models multiple times or something. I tested many things: image generation, interrupting it when I noticed wrong settings, restarting it, then inpainting and outpainting, image variations, and so on.

mashb1t commented 6 months ago

@JarekDerp thank you for the analysis and insights. It would be great if you could provide the terminal output with reference to your issue comment in https://github.com/lllyasviel/Fooocus/issues/1690, so this issue doesn't drift even more off-topic. Much appreciated!