lllyasviel / Fooocus

Focus on prompting and generating

AMD 6700XT 12GB DML allocator out of memory. #835

Open uzior opened 11 months ago

uzior commented 11 months ago

Hi! I realize that AMD devices are still in the beta phase, but the problem I encountered seems relatively easy to solve, namely:

Everything starts correctly and the generation process begins, but once the allocated VRAM exceeds 7 GB, the system crashes.

Any suggestions? I will be grateful for any help. Log below.

Already up-to-date
Update succeeded.
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
Fooocus version: 2.1.771
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Using directml with device:
Total VRAM 1024 MB, total RAM 32694 MB
Set vram state to: NORMAL_VRAM
Device: privateuseone
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --use-split-cross-attention
[Fooocus] Disabling smart memory
Refiner unloaded.
model_type EPS
adm 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
missing {'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection'}
Base model loaded: E:\VM\AI\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors
LoRAs loaded: [('sd_xl_offset_example-lora_1.0.safetensors', 0.1), ('None', 0.1), ('None', 0.1), ('None', 0.1), ('None', 0.1)]
Fooocus Expansion engine loaded for cpu, use_fp16 = False.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
App started successful. Use the app with http://127.0.0.1:7860/ or 127.0.0.1:7860
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 1
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 139978393905596030
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 24
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
Preparation time: 3.72 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.02916753850877285, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[W D:\a_work\1\s\pytorch-directml-plugin\torch_directml\csrc\engine\dml_heap_allocator.cc:120] DML allocator out of memory!
[W D:\a_work\1\s\pytorch-directml-plugin\torch_directml\csrc\engine\dml_heap_allocator.cc:120] DML allocator out of memory!
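
The log above shows DirectML reporting only 1024 MB of "Total VRAM" for a 12 GB card, and it also suggests --use-split-cross-attention for memory issues. As a quick standalone check (a sketch, not from the thread; the torch_directml helpers used here are assumed to be present in the installed torch-directml build), you can confirm that DirectML sees the GPU and can allocate on it using Fooocus's embedded Python:

# Sketch: run with .\python_embeded\python.exe from the Fooocus folder to check
# that torch-directml sees the GPU and can allocate a small tensor.
import torch
import torch_directml  # provided by the torch-directml package

print("DirectML available:", torch_directml.is_available())
print("Device name:       ", torch_directml.device_name(0))

dml = torch_directml.device()            # default DirectML device
x = torch.randn(1024, 1024, device=dml)  # ~4 MB fp32 allocation on the GPU
print("Allocated test tensor:", tuple(x.shape), x.dtype)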

f-klement commented 10 months ago

I have a similar problem on my RX 6800S, the laptop runs out of memory and crashes at every run

lukechar commented 10 months ago

Same issue on my RX 6650 XT - "DML allocator out of memory!"

PowerZones commented 10 months ago

Same on RX580 8gb

jhoyocartes commented 9 months ago

Same on RX580 8gb too

quoije commented 9 months ago

Same issue with AMD 6700XT.

EDIT: Not anymore. I fixed my issue by setting the page file to automatic and making sure I have space available on my disk. Having the same GPU as the OP was coincidental and probably not related to the OP's issue.

Windows 11
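
Since the fix above comes down to letting Windows grow the page file and keeping enough free disk space for it, here is a small standalone sketch (not from the thread) that prints free disk space and the current commit limit (RAM + page file) via the Win32 API before launching Fooocus:

# Sketch: report free disk space and Windows commit limits, so you can see
# whether the automatic page file has room to grow before a run.
import ctypes
import shutil

total, used, free = shutil.disk_usage("C:\\")
print(f"Free space on C: {free / 2**30:.1f} GiB")

class MEMORYSTATUSEX(ctypes.Structure):
    _fields_ = [
        ("dwLength", ctypes.c_ulong),
        ("dwMemoryLoad", ctypes.c_ulong),
        ("ullTotalPhys", ctypes.c_ulonglong),
        ("ullAvailPhys", ctypes.c_ulonglong),
        ("ullTotalPageFile", ctypes.c_ulonglong),
        ("ullAvailPageFile", ctypes.c_ulonglong),
        ("ullTotalVirtual", ctypes.c_ulonglong),
        ("ullAvailVirtual", ctypes.c_ulonglong),
        ("ullAvailExtendedVirtual", ctypes.c_ulonglong),
    ]

stat = MEMORYSTATUSEX()
stat.dwLength = ctypes.sizeof(stat)
ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(stat))
print(f"Physical RAM:                  {stat.ullTotalPhys / 2**30:.1f} GiB")
print(f"Total commit (RAM + pagefile): {stat.ullTotalPageFile / 2**30:.1f} GiB")
print(f"Available commit:              {stat.ullAvailPageFile / 2**30:.1f} GiB")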

Crunch91 commented 9 months ago

Same here. My specs:

6700 XT (12 GB), 16 GB RAM, 24576 MB swap file, SSD, Windows

DML allocator out of memory!
ptrkrnstnr commented 9 months ago

Same here... 7900 XTX (24 GB), 7800X3D, 32 GB RAM, 26624 MB swapfile, SSD, Windows 10

TobiWan-Kenobi commented 8 months ago

Same issue here with Intel(R) UHD Graphics GPU 8GB ... Will I have any chance of running this with this kind of GPU at all? (Windows 11)

OdinM13 commented 8 months ago

I have the same issue with 32 GB RAM and a Radeon RX 6800 XT. The strange thing is that it worked properly before. For a few days I could generate as many pictures as I desired with all kinds of different settings, but now this is no longer possible and I don't know why.

ptrkrnstnr commented 8 months ago

I have a solution: go back to version 2.1.851, modify the "run.bat" to disable all updates like so:

:: .\python_embeded\python.exe -m pip uninstall torch torchvision torchaudio torchtext functorch xformers -y
:: .\python_embeded\python.exe -m pip install torch-directml
.\python_embeded\python.exe -s Fooocus\launch.py --directml
pause

You can download the files of v2.1.851 by selecting the right branch and just copy them over a fresh installation.
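
A small follow-up sketch (not part of the comment above): with the pip lines commented out, you can confirm that the embedded environment still has the packages they would have managed by running this with .\python_embeded\python.exe from the install folder:

# Sketch: check the packages the commented-out pip lines would have managed.
# Package names match the run.bat lines above; adjust if your install differs.
import importlib.metadata as md

for pkg in ("torch", "torch-directml"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "is NOT installed - run the pip install line once, then comment it out again")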

lgwjames commented 8 months ago

I have a solution: go back to version 2.1.851, modify the "run.bat" to disable all updates like so:

:: .\python_embeded\python.exe -m pip uninstall torch torchvision torchaudio torchtext functorch xformers -y
:: .\python_embeded\python.exe -m pip install torch-directml
.\python_embeded\python.exe -s Fooocus\launch.py --directml
pause

you can download the files of v 2.1.851 by selecting the right branch and just copy the files over a fresh installation.

Tried this today (installed via Boot Camp with an RX 580 8 GB) and get this:

Traceback (most recent call last):
  File "C:\Program Files\Fooocus\Fooocus\modules\async_worker.py", line 803, in worker
    handler(task)
  File "C:\Program Files\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Program Files\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Program Files\Fooocus\Fooocus\modules\async_worker.py", line 735, in handler
    imgs = pipeline.process_diffusion(
  File "C:\Program Files\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Program Files\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Program Files\Fooocus\Fooocus\modules\default_pipeline.py", line 361, in process_diffusion
    sampled_latent = core.ksampler(
  File "C:\Program Files\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Program Files\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Program Files\Fooocus\Fooocus\modules\core.py", line 313, in ksampler
    samples = ldm_patched.modules.sample.sample(model,
  File "C:\Program Files\Fooocus\Fooocus\ldm_patched\modules\sample.py", line 93, in sample
    real_model, positive_copy, negative_copy, noise_mask, models = prepare_sampling(model, noise.shape, positive, negative, noise_mask)
  File "C:\Program Files\Fooocus\Fooocus\ldm_patched\modules\sample.py", line 86, in prepare_sampling
    ldm_patched.modules.model_management.load_models_gpu([model] + models, model.memory_required([noise_shape[0] * 2] + list(noise_shape[1:])) + inference_memory)
  File "C:\Program Files\Fooocus\Fooocus\modules\patch.py", line 441, in patched_load_models_gpu
    y = ldm_patched.modules.model_management.load_models_gpu_origin(*args, **kwargs)
  File "C:\Program Files\Fooocus\Fooocus\ldm_patched\modules\model_management.py", line 414, in load_models_gpu
    cur_loaded_model = loaded_model.model_load(lowvram_model_memory)
  File "C:\Program Files\Fooocus\Fooocus\ldm_patched\modules\model_management.py", line 297, in model_load
    raise e
  File "C:\Program Files\Fooocus\Fooocus\ldm_patched\modules\model_management.py", line 293, in model_load
    self.real_model = self.model.patch_model(device_to=patch_model_to) #TODO: do something with loras and offloading to CPU
  File "C:\Program Files\Fooocus\Fooocus\ldm_patched\modules\model_patcher.py", line 198, in patch_model
    temp_weight = ldm_patched.modules.model_management.cast_to_device(weight, device_to, torch.float32, copy=True)
  File "C:\Program Files\Fooocus\Fooocus\ldm_patched\modules\model_management.py", line 587, in cast_to_device
    return tensor.to(device, copy=copy, non_blocking=non_blocking).to(dtype, non_blocking=non_blocking)
RuntimeError: Could not allocate tensor with 52428800 bytes. There is not enough GPU video memory available!
Total time: 13.93 seconds
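
For scale, the failed allocation in the traceback is tiny compared to the card's 8 GB of VRAM, which points to memory already being exhausted by previously loaded weights rather than by this single tensor. A quick check of the arithmetic (a standalone sketch, not from the thread):

# The RuntimeError above reports a failed allocation of 52,428,800 bytes.
failed_alloc = 52428800
print(failed_alloc / 2**20, "MiB")  # -> 50.0 MiB

# For scale (illustrative assumption, not an actual SDXL shape): a 3200 x 4096
# fp32 weight matrix is exactly this size.
print(3200 * 4096 * 4, "bytes")     # -> 52428800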

patientx commented 8 months ago

Strangely, while I am not getting any memory errors (8 GB 6600), my friend does (16 GB 6800 XT). Both of us also have fast NVMe drives as swap drives, and 16 GB of system memory.

Lysuwel commented 3 months ago

It may be running out of system memory, not the GPU's memory. I added some "virtual memory" (a larger swap file), and then it works!

gasperpb commented 2 weeks ago

python main.py --directml --use-split-cross-attention --lowvram