lllyasviel / Fooocus

Focus on prompting and generating
GNU General Public License v3.0

AMD 6700XT 12GB DML allocator out of memory. #835

Open uzior opened 1 year ago

uzior commented 1 year ago

Hi! I realize that AMD devices are still in the beta phase, but the problem I encountered seems relatively easy to solve, namely:

Everything starts correctly and the generation process begins, but once VRAM allocation exceeds 7 GB, the system crashes.

Any suggestions? I will be grateful for any help. Log below.

Already up-to-date
Update succeeded.
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
Fooocus version: 2.1.771
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Using directml with device:
Total VRAM 1024 MB, total RAM 32694 MB
Set vram state to: NORMAL_VRAM
Device: privateuseone
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --use-split-cross-attention
[Fooocus] Disabling smart memory
Refiner unloaded.
model_type EPS
adm 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
missing {'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection'}
Base model loaded: E:\VM\AI\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors
LoRAs loaded: [('sd_xl_offset_example-lora_1.0.safetensors', 0.1), ('None', 0.1), ('None', 0.1), ('None', 0.1), ('None', 0.1)]
Fooocus Expansion engine loaded for cpu, use_fp16 = False.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
App started successful. Use the app with http://127.0.0.1:7860/ or 127.0.0.1:7860
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 1
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 139978393905596030
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 24
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
Preparation time: 3.72 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.02916753850877285, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[W D:\a_work\1\s\pytorch-directml-plugin\torch_directml\csrc\engine\dml_heap_allocator.cc:120] DML allocator out of memory!
[W D:\a_work\1\s\pytorch-directml-plugin\torch_directml\csrc\engine\dml_heap_allocator.cc:120] DML allocator out of memory!
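
For reference, the "Total VRAM 1024 MB" line appears to be a hard-coded placeholder in the DirectML code path rather than the card's real 12 GB. Below is a minimal sketch to confirm torch-directml actually sees the 6700 XT, assuming it is run with the bundled python_embeded interpreter where the torch_directml package is installed:

# Sketch: list the adapters torch-directml exposes and try a small allocation.
import torch
import torch_directml

print(torch_directml.is_available())    # True if any DirectML adapter is present
print(torch_directml.device_count())    # number of adapters
print(torch_directml.device_name(0))    # should name the Radeon RX 6700 XT
dml = torch_directml.device()           # torch.device('privateuseone:0')

x = torch.ones(1024, 1024, device=dml)  # small ~4 MB allocation on the GPU
print(x.sum().item())                   # 1048576.0 if the device works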

f-klement commented 1 year ago

I have a similar problem on my RX 6800S; the laptop runs out of memory and crashes on every run.

lukechar commented 1 year ago

Same issue on my RX 6650 XT - "DML allocator out of memory!"

PowerZones commented 12 months ago

Same on RX580 8gb

jhoyocartes commented 11 months ago

Same on RX580 8gb too

quoije commented 11 months ago

Same issue with AMD 6700XT.

EDIT: Not anymore; I fixed my issue by setting the page file to automatic and ensuring that I have space available on my disk. Having the same GPU was coincidental and probably not related to the OP's issue.

Windows 11
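
A quick way to check the two things that mattered here: free disk space for the page file, and the swap size Windows currently exposes. Rough sketch below; shutil is in the standard library, while psutil is a separate package that may need "pip install psutil":

# Sketch: print free space on the system drive and the current page-file usage.
import shutil

total, used, free = shutil.disk_usage("C:\\")
print(f"Free space on C: {free / 1024**3:.1f} GiB")

try:
    import psutil  # third-party; install with: pip install psutil
    swap = psutil.swap_memory()
    print(f"Page file total: {swap.total / 1024**3:.1f} GiB, used: {swap.used / 1024**3:.1f} GiB")
except ImportError:
    print("psutil not installed; check Virtual memory under System > Advanced system settings instead")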

Crunch91 commented 11 months ago

Same here. My specs:

6700 XT (12 GB), 16 GB RAM, 24576 MB swap file, SSD, Windows

DML allocator out of memory!

ptrkrnstnr commented 10 months ago

Same here... 7900 XTX (24 GB), 7800X3D, 32 GB RAM, 26624 MB swapfile, SSD, Windows 10

TobiWan-Kenobi commented 10 months ago

Same issue here with Intel(R) UHD Graphics GPU 8GB ... Will I have any chance of running this with this kind of GPU at all? (Windows 11)

OdinM13 commented 10 months ago

I have the same issue with 32 GB RAM and a Radeon RX 6800 XT. The strange thing is that it worked properly before: for a few days I could generate as many pictures as I wanted with all kinds of different settings, but now this is no longer possible and I don't know why.

ptrkrnstnr commented 10 months ago

I have a solution: go back to version 2.1.851 and modify "run.bat" to disable all updates, like so:

:: .\python_embeded\python.exe -m pip uninstall torch torchvision torchaudio torchtext functorch xformers -y
:: .\python_embeded\python.exe -m pip install torch-directml
.\python_embeded\python.exe -s Fooocus\launch.py --directml
pause

You can download the files of v2.1.851 by selecting the right branch and just copy them over a fresh installation.
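
Before launching, a small sketch to confirm which version the copied files actually are, assuming the repo still ships fooocus_version.py at its root (the path below is only an example and needs adjusting to your install):

# Sketch: read the version string of the Fooocus sources on disk.
from pathlib import Path

version_file = Path(r"E:\VM\AI\Fooocus\fooocus_version.py")  # example path; adjust to your install
print(version_file.read_text().strip())  # expected something like: version = '2.1.851'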

lgwjames commented 10 months ago

I have a solution: go back to version 2.1.851 and modify "run.bat" to disable all updates, like so:

:: .\python_embeded\python.exe -m pip uninstall torch torchvision torchaudio torchtext functorch xformers -y
:: .\python_embeded\python.exe -m pip install torch-directml
.\python_embeded\python.exe -s Fooocus\launch.py --directml
pause

You can download the files of v2.1.851 by selecting the right branch and just copy them over a fresh installation.

Tried this today on a Boot Camp install with an RX 580 8 GB.

I get this:

Traceback (most recent call last):
  File "C:\Program Files\Fooocus\Fooocus\modules\async_worker.py", line 803, in worker
    handler(task)
  File "C:\Program Files\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Program Files\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Program Files\Fooocus\Fooocus\modules\async_worker.py", line 735, in handler
    imgs = pipeline.process_diffusion(
  File "C:\Program Files\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Program Files\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Program Files\Fooocus\Fooocus\modules\default_pipeline.py", line 361, in process_diffusion
    sampled_latent = core.ksampler(
  File "C:\Program Files\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Program Files\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Program Files\Fooocus\Fooocus\modules\core.py", line 313, in ksampler
    samples = ldm_patched.modules.sample.sample(model,
  File "C:\Program Files\Fooocus\Fooocus\ldm_patched\modules\sample.py", line 93, in sample
    real_model, positive_copy, negative_copy, noise_mask, models = prepare_sampling(model, noise.shape, positive, negative, noise_mask)
  File "C:\Program Files\Fooocus\Fooocus\ldm_patched\modules\sample.py", line 86, in prepare_sampling
    ldm_patched.modules.model_management.load_models_gpu([model] + models, model.memory_required([noise_shape[0] * 2] + list(noise_shape[1:])) + inference_memory)
  File "C:\Program Files\Fooocus\Fooocus\modules\patch.py", line 441, in patched_load_models_gpu
    y = ldm_patched.modules.model_management.load_models_gpu_origin(*args, **kwargs)
  File "C:\Program Files\Fooocus\Fooocus\ldm_patched\modules\model_management.py", line 414, in load_models_gpu
    cur_loaded_model = loaded_model.model_load(lowvram_model_memory)
  File "C:\Program Files\Fooocus\Fooocus\ldm_patched\modules\model_management.py", line 297, in model_load
    raise e
  File "C:\Program Files\Fooocus\Fooocus\ldm_patched\modules\model_management.py", line 293, in model_load
    self.real_model = self.model.patch_model(device_to=patch_model_to) #TODO: do something with loras and offloading to CPU
  File "C:\Program Files\Fooocus\Fooocus\ldm_patched\modules\model_patcher.py", line 198, in patch_model
    temp_weight = ldm_patched.modules.model_management.cast_to_device(weight, device_to, torch.float32, copy=True)
  File "C:\Program Files\Fooocus\Fooocus\ldm_patched\modules\model_management.py", line 587, in cast_to_device
    return tensor.to(device, copy=copy, non_blocking=non_blocking).to(dtype, non_blocking=non_blocking)
RuntimeError: Could not allocate tensor with 52428800 bytes. There is not enough GPU video memory available!
Total time: 13.93 seconds
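
For scale, the allocation that fails at the end of that traceback is tiny, which suggests the DirectML heap was already nearly exhausted before this last weight was moved:

# Sketch: the failed request is 50 MiB, i.e. one float32 tensor of ~13.1 M elements.
failed_bytes = 52428800
print(failed_bytes / 2**20)  # 50.0 MiB
print(failed_bytes // 4)     # 13107200 float32 elements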

patientx commented 10 months ago

Strangely, while I am not getting any memory errors (8 GB 6600), my friend does (16 GB 6800 XT). Both of us have fast NVMe drives as swap drives and 16 GB of system memory.

Lysuwel commented 5 months ago

It may be running out of system memory, not the GPU's memory. I added some virtual memory (a larger page file), and then it worked!

gasperpb commented 2 months ago

python main.py --directml --use-split-cross-attention --lowvram