vladmandic / automatic

SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
https://github.com/vladmandic/automatic
GNU Affero General Public License v3.0

[Bug]: Not enough Vram on AMD GPU/CPUs #3331

Closed: Moont34r closed this issue 3 months ago

Moont34r commented 3 months ago

Issue Description

SD.Next reports in the console that there is not enough VRAM to create images above 600x600 px on an AMD GPU/CPU.

Version Platform Description

Current branch: master
Current version: 2024-07-10 hash 2ec6e9ee
Latest version: 2024-07-10 hash 2ec6e9ee

CPU: AMD Ryzen 7 7800X3D
GPU: AMD Radeon RX 7900 GRE OC
RAM: 32 GB 6400 MHz

Relevant log output

14:12:24-900378 INFO     Load model: time=1.43 load=1.43 native=512 {'ram': {'used': 12.0, 'total': 31.19}, 'gpu':
                         {'used': 0.0, 'total': 0.5}, 'retries': 0, 'oom': 0}
14:12:40-641607 INFO     Settings: changed=1 ['cross_attention_optimization']
14:12:41-164924 WARNING  Server shutdown requested
14:12:42-302059 INFO     Server restarting...
14:12:42-486390 INFO     Server will restart
14:12:45-499045 INFO     Command line args: ['--use-directml', '--medvram', '--autolaunch'] medvram=True autolaunch=True
                         use_directml=True
14:12:45-503050 INFO     Available VAEs: path="models\VAE" items=1
14:12:45-504419 INFO     Disabled extensions: ['sdnext-modernui']
14:12:45-507416 INFO     Available models: path="models\Stable-diffusion" items=34 time=0.00
14:12:45-565671 INFO     UI theme: type=Standard name="black-teal"
14:12:46-511054 INFO     Local URL: http://127.0.0.1:7860/
14:12:46-786463 INFO     [AgentScheduler] Runner is paused
14:12:46-787463 INFO     [AgentScheduler] Registering APIs
14:12:46-856108 INFO     Torch override dtype: no-half set
14:12:46-857107 INFO     Setting Torch parameters: device=privateuseone:0 dtype=torch.float32 vae=torch.float32
                         unet=torch.float32 context=no_grad fp16=True bf16=None optimization=Dynamic Attention BMM
14:12:46-860107 INFO     Startup time: 36.84 ldm=35.48 ui-en=0.14 ui-control=0.07 ui-settings=0.14 ui-extensions=0.32
                         launch=0.08 app-started=0.34
14:12:49-906350 INFO     MOTD: N/A
14:12:53-768824 INFO     Browser session: user=None client=127.0.0.1 agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64)
                         AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36 OPR/111.0.0.0
14:15:37-007318 INFO     Version: {'url': 'https://github.com/vladmandic/automatic/tree/master', 'branch': 'master',
                         'current': '2024-07-10', 'chash': '2ec6e9ee', 'latest': '2024-07-10', 'lhash': '2ec6e9ee'}
14:17:42-830467 INFO     Setting attention optimization: Dynamic Attention BMM
14:17:42-836977 INFO     Base: class=StableDiffusionPipeline
Progress ?it/s                                              0% 0/20 00:00 ? Base
14:17:44-007351 ERROR    Processing: args={'prompt_embeds': tensor([[[-0.3958,  0.0175, -0.0418,  ..., -0.4832, -0.3126,
                         0.0646],
                                  [-0.1104,  0.6700,  0.6209,  ..., -2.6800,  0.2773, -0.7444],
                                  [-0.3502, -0.3011, -0.6404,  ...,  0.2915,  0.7310, -1.5189],
                                  ...,
                                  [ 0.7985, -0.7907,  0.2690,  ...,  0.3522, -1.3319,  0.1053],
                                  [ 0.8099, -0.7749,  0.2631,  ...,  0.3633, -1.3427,  0.0957],
                                  [ 0.7886, -0.7731,  0.3445,  ...,  0.3266, -1.3078,  0.0802]]],
                                device='privateuseone:0'), 'negative_prompt_embeds': tensor([[[-3.9577e-01,  1.7513e-02,
                         -4.1837e-02,  ..., -4.8320e-01,
                                   -3.1263e-01,  6.4627e-02],
                                  [-5.7163e-01, -1.8411e+00, -2.8816e-01,  ...,  9.0727e-01,
                                    2.0156e-01, -8.0214e-01],
                                  [-5.3836e-01, -1.8736e+00, -1.6672e-01,  ...,  9.4559e-01,
                                   -1.3216e-03, -6.9389e-01],
                                  ...,
                                  [ 1.1004e+00, -7.5024e-01, -5.1821e-01,  ...,  9.3922e-02,
                                   -1.8171e+00,  5.2286e-01],
                                  [ 1.1060e+00, -7.4373e-01, -5.0642e-01,  ...,  1.1394e-01,
                                   -1.8149e+00,  5.0341e-01],
                                  [ 1.1290e+00, -7.3774e-01, -4.2811e-01,  ...,  3.4663e-02,
                                   -1.8052e+00,  4.7349e-01]]], device='privateuseone:0'), 'guidance_scale': 6,
                         'generator': [<torch._C.Generator object at 0x000001E78B92FB90>], 'callback_on_step_end':
                         <function diffusers_callback at 0x000001E78C760C10>, 'callback_on_step_end_tensor_inputs':
                         ['latents', 'prompt_embeds', 'negative_prompt_embeds'], 'num_inference_steps': 20, 'eta': 1.0,
                         'guidance_rescale': 0.7, 'output_type': 'latent', 'width': 1048, 'height': 1048} setStorage:
                         sizes [4, 17161, 17161], strides [294499921, 17161, 1], storage offset 0, and itemsize 4
                         requiring a storage size of 4711998736 are out of bounds for storage of size 417031440
14:17:44-015659 ERROR    Processing: RuntimeError
╭───────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────╮
│ E:\SD-NEXT\automatic\modules\processing_diffusers.py:122 in process_diffusers                                        │
│                                                                                                                      │
│   121 │   │   else:                                                                                                  │
│ ❱ 122 │   │   │   output = shared.sd_model(**base_args)                                                              │
│   123 │   │   if isinstance(output, dict):                                                                           │
│                                                                                                                      │
│ E:\SD-NEXT\automatic\venv\lib\site-packages\torch\utils\_contextlib.py:115 in decorate_context                       │
│                                                                                                                      │
│   114 │   │   with ctx_factory():                                                                                    │
│ ❱ 115 │   │   │   return func(*args, **kwargs)                                                                       │
│   116                                                                                                                │
│                                                                                                                      │
│ E:\SD-NEXT\automatic\venv\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion.py:1006 i │
│                                                                                                                      │
│   1005 │   │   │   │   # predict the noise residual                                                                  │
│ ❱ 1006 │   │   │   │   noise_pred = self.unet(                                                                       │
│   1007 │   │   │   │   │   latent_model_input,                                                                       │
│                                                                                                                      │
│ E:\SD-NEXT\automatic\venv\lib\site-packages\torch\nn\modules\module.py:1532 in _wrapped_call_impl                    │
│                                                                                                                      │
│   1531 │   │   else:                                                                                                 │
│ ❱ 1532 │   │   │   return self._call_impl(*args, **kwargs)                                                           │
│   1533                                                                                                               │
│                                                                                                                      │
│ E:\SD-NEXT\automatic\venv\lib\site-packages\torch\nn\modules\module.py:1541 in _call_impl                            │
│                                                                                                                      │
│   1540 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                                       │
│ ❱ 1541 │   │   │   return forward_call(*args, **kwargs)                                                              │
│   1542                                                                                                               │
│                                                                                                                      │
│                                               ... 12 frames hidden ...                                               │
│                                                                                                                      │
│ E:\SD-NEXT\automatic\venv\lib\site-packages\diffusers\models\attention_processor.py:559 in forward                   │
│                                                                                                                      │
│    558 │   │                                                                                                         │
│ ❱  559 │   │   return self.processor(                                                                                │
│    560 │   │   │   self,                                                                                             │
│                                                                                                                      │
│ E:\SD-NEXT\automatic\modules\sd_hijack_dynamic_atten.py:277 in __call__                                              │
│                                                                                                                      │
│   276 │   │   │   │   │                                                                                              │
│ ❱ 277 │   │   │   │   │   attn_slice = attn.get_attention_scores(query_slice, key_slice, attn_mask_slice)            │
│   278 │   │   │   │   │   del query_slice                                                                            │
│                                                                                                                      │
│ E:\SD-NEXT\automatic\venv\lib\site-packages\diffusers\models\attention_processor.py:639 in get_attention_scores      │
│                                                                                                                      │
│    638 │   │                                                                                                         │
│ ❱  639 │   │   attention_scores = torch.baddbmm(                                                                     │
│    640 │   │   │   baddbmm_input,                                                                                    │
│                                                                                                                      │
│ E:\SD-NEXT\automatic\modules\dml\amp\autocast_mode.py:43 in <lambda>                                                 │
│                                                                                                                      │
│   42 │   │   op = getattr(resolved_obj, func_path[-1])                                                               │
│ ❱ 43 │   │   setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: forward(op, args, kwargs))                 │
│   44                                                                                                                 │
│                                                                                                                      │
│ E:\SD-NEXT\automatic\modules\dml\amp\autocast_mode.py:15 in forward                                                  │
│                                                                                                                      │
│   14 │   if not torch.dml.is_autocast_enabled:                                                                       │
│ ❱ 15 │   │   return op(*args, **kwargs)                                                                              │
│   16 │   args = list(map(cast, args))                                                                                │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: setStorage: sizes [4, 17161, 17161], strides [294499921, 17161, 1], storage offset 0, and itemsize 4 requiring a storage size of 4711998736 are out of bounds for storage of size 417031440
14:17:44-274412 INFO     Processed: images=0 time=4.87 its=0.00 memory={'ram': {'used': 5.95, 'total': 31.19}, 'gpu':
                         {'used': 0.0, 'total': 0.5}, 'retries': 0, 'oom': 0}
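
For context on the failure itself: the storage size in the RuntimeError is consistent with a single float32 attention-score tensor for the requested 1048x1048 generation (no-half forces torch.float32 on the DirectML device here). A quick back-of-the-envelope check, assuming only the usual 8x VAE downscale factor; every other number comes from the log above:

# Sanity check of the tensor size reported by torch.baddbmm in the traceback.
width = height = 1048                  # requested image size ('width': 1048, 'height': 1048)
tokens = (width // 8) * (height // 8)  # 131 * 131 = 17161 latent tokens (assumes 8x VAE downscale)
batch_heads = 4                        # leading dim of the reported sizes [4, 17161, 17161]
bytes_per_element = 4                  # float32 ("itemsize 4" in the error message)
attn_bytes = batch_heads * tokens * tokens * bytes_per_element
print(f"{attn_bytes:,} bytes = {attn_bytes / 1024**3:.1f} GiB")
# 4,711,998,736 bytes = 4.4 GiB -- exactly the "storage size of 4711998736" that the
# error says is out of bounds for the 417031440-byte storage actually available.

That single full-precision attention buffer is roughly 4.4 GiB, which helps explain why the sliced attention path still runs out of usable memory under DirectML at this resolution.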

Backend: Diffusers
UI: Standard
Branch: Master
Model: Other


vladmandic commented 3 months ago

A few suggestions:

Having said that, reach out on Discord for help on how to tune SD.Next. This is not a bug, despite what this issue suggests.