invoke-ai / InvokeAI

InvokeAI is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry leading WebUI, supports terminal use through a CLI, and serves as the foundation for multiple commercial products.
https://invoke-ai.github.io/InvokeAI/
Apache License 2.0

[bug]: Crash doing `.swap` when near VRAM limit #1362

Closed · JPPhoto closed this issue 1 year ago

JPPhoto commented 1 year ago

Is there an existing issue for this?

OS

Linux

GPU

cuda

VRAM

12GB

What happened?

Using the command line, I get an unexpected crash when generating from a prompt that uses the new `.swap` operator.

Screenshots

invoke> "headshot portrait of a baker wearing (a shirt).swap(an apron), insane quality, intricate, detailed, micro details, three-point warm volumetric lighting, hyperrealism photograph, vibrant color [border, frame, watermark, signature, text, border, framed, drawing, painting, sketch, rendering, bad teeth, fake eye, mutated, deformed, abnormal, asymmetrical, Pixar, collage]" -s 75 -S 2895816 -C 9.0 -I ../images_out/facex14.png -A ddim -f 0.8 -n 10
>> Parsed prompt to FlattenedPrompt:[Fragment:'headshot portrait of a baker wearing'@1.0, CrossAttentionControlSubstitute:([Fragment:'a shirt'@1.0]->[Fragment:'an apron'@1.0] ({'s_start': 0.0, 's_end': 0.2062994740159002, 't_start': 0.0, 't_end': 1.0}), Fragment:', insane quality, intricate, detailed, micro details, three-point warm volumetric lighting, hyperrealism photograph, vibrant color'@1.0]
>> loaded input image of size 512x704 from ../images_out/facex14.png
Generating:   0%|          | 0/10 [00:00<?, ?it/s]
>> Running DDIMSampler sampling starting at step 15 of 75 (60 new sampling steps)
Decoding image:   0%|          | 0/60 [00:00<?, ?it/s]
Generating:   0%|          | 0/10 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/jovyan/work/InvokeAI/ldm/generate.py", line 459, in prompt2image
    results = generator.generate(
  File "/home/jovyan/work/InvokeAI/ldm/invoke/generator/base.py", line 90, in generate
    image = make_image(x_T)
  File "/home/jovyan/work/InvokeAI/ldm/invoke/generator/img2img.py", line 52, in make_image
    samples = sampler.decode(
  File "/home/jovyan/.conda/envs/invokeai/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/jovyan/work/InvokeAI/ldm/models/diffusion/sampler.py", line 365, in decode
    outs = self.p_sample(
  File "/home/jovyan/.conda/envs/invokeai/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/jovyan/work/InvokeAI/ldm/models/diffusion/ddim.py", line 58, in p_sample
    e_t = self.invokeai_diffuser.do_diffusion_step(
  File "/home/jovyan/work/InvokeAI/ldm/models/diffusion/shared_invokeai_diffusion.py", line 86, in do_diffusion_step
    unconditioned_next_x, conditioned_next_x = self.apply_cross_attention_controlled_conditioning(x, sigma, unconditioning, conditioning, cross_attention_control_types_to_do)
  File "/home/jovyan/work/InvokeAI/ldm/models/diffusion/shared_invokeai_diffusion.py", line 151, in apply_cross_attention_controlled_conditioning
    conditioned_next_x = self.model_forward_callback(x, sigma, edited_conditioning)
  File "/home/jovyan/work/InvokeAI/ldm/models/diffusion/ddim.py", line 13, in <lambda>
    model_forward_callback = lambda x, sigma, cond: self.model.apply_model(x, sigma, cond))
  File "/home/jovyan/work/InvokeAI/ldm/models/diffusion/ddpm.py", line 1441, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "/home/jovyan/.conda/envs/invokeai/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jovyan/work/InvokeAI/ldm/models/diffusion/ddpm.py", line 2167, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "/home/jovyan/.conda/envs/invokeai/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jovyan/work/InvokeAI/ldm/modules/diffusionmodules/openaimodel.py", line 806, in forward
    h = module(h, emb, context)
  File "/home/jovyan/.conda/envs/invokeai/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jovyan/work/InvokeAI/ldm/modules/diffusionmodules/openaimodel.py", line 88, in forward
    x = layer(x, context)
  File "/home/jovyan/.conda/envs/invokeai/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jovyan/work/InvokeAI/ldm/modules/attention.py", line 347, in forward
    x = block(x, context=context)
  File "/home/jovyan/.conda/envs/invokeai/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jovyan/work/InvokeAI/ldm/modules/attention.py", line 297, in forward
    return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
  File "/home/jovyan/work/InvokeAI/ldm/modules/diffusionmodules/util.py", line 159, in checkpoint
    return func(*inputs)
  File "/home/jovyan/work/InvokeAI/ldm/modules/attention.py", line 301, in _forward
    x += self.attn1(self.norm1(x.clone()))
  File "/home/jovyan/.conda/envs/invokeai/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jovyan/work/InvokeAI/ldm/modules/attention.py", line 276, in forward
    r = self.get_attention_mem_efficient(q, k, v)
  File "/home/jovyan/work/InvokeAI/ldm/modules/attention.py", line 254, in get_attention_mem_efficient
    return self.einsum_op_cuda(q, k, v)
  File "/home/jovyan/work/InvokeAI/ldm/modules/attention.py", line 250, in einsum_op_cuda
    return self.einsum_op_tensor_mem(q, k, v, mem_free_total / 3.3 / (1 << 20))
  File "/home/jovyan/work/InvokeAI/ldm/modules/attention.py", line 239, in einsum_op_tensor_mem
    return self.einsum_op_slice_dim0(q, k, v, q.shape[0] // div)
  File "/home/jovyan/work/InvokeAI/ldm/modules/attention.py", line 210, in einsum_op_slice_dim0
    r[i:end] = self.einsum_lowest_level(q[i:end], k[i:end], v[i:end], dim=0, offset=i, slice_size=slice_size)
  File "/home/jovyan/work/InvokeAI/ldm/modules/attention.py", line 204, in einsum_lowest_level
    return einsum('b i j, b j d -> b i d', attention_slice, v)
  File "/home/jovyan/.conda/envs/invokeai/lib/python3.9/site-packages/torch/functional.py", line 378, in einsum
    return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
RuntimeError: einsum(): the number of subscripts in the equation (3) does not match the number of dimensions (2) for operand 0 and no ellipsis was given

>> Could not generate image.
>> Usage stats:
>>   0 image(s) generated in 1.27s
>>   Max VRAM used for this generation: 11.33G. Current VRAM utilization: 5.26G
>>   Max VRAM used since script start:  11.33G
Outputs:
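
For reference, the final RuntimeError comes from torch.einsum receiving a 2-D tensor where the equation `'b i j, b j d -> b i d'` expects a 3-D attention slice as its first operand. A minimal, self-contained repro of just that error (shapes chosen for illustration, not taken verbatim from the run above):

```python
import torch

# If the attention slice handed to einsum has lost its batch/head dimension
# (2-D instead of 3-D), torch.einsum raises exactly the error in the traceback.
attention_slice = torch.randn(5632, 77)  # 2-D: one dimension short (illustrative shape)
v = torch.randn(8, 77, 40)               # 3-D value tensor (illustrative shape)

try:
    torch.einsum('b i j, b j d -> b i d', attention_slice, v)
except RuntimeError as e:
    print(e)  # einsum(): the number of subscripts in the equation (3) does not match
              # the number of dimensions (2) for operand 0 and no ellipsis was given
```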

Additional context

I'm running the latest (11/3/2022) InvokeAI under WSL2 using a 3060 with 12GB VRAM and the complete v1.5 checkpoint.

Contact Details

No response

damian0815 commented 1 year ago

Thanks @JPPhoto for helping debug this. Uncommenting line 144 in cross_attention_control.py gives the following output:

in wrangler with suggested_attention_slice shape torch.Size([8, 5632, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 5632, 5632]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 5632, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 1408, 1408]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 1408, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 1408, 1408]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 1408, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 352, 352]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 352, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 352, 352]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 352, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 88, 88]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 88, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 352, 352]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 352, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 352, 352]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 352, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 352, 352]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 352, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 1408, 1408]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 1408, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 1408, 1408]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 1408, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 1408, 1408]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 1408, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 5632, 5632]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 5632, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 5632, 5632]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 5632, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 5632, 5632]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 5632, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 5632, 5632]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 5632, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 5632, 5632]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 5632, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 1408, 1408]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 1408, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 1408, 1408]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 1408, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 352, 352]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 352, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 352, 352]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 352, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 88, 88]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 88, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 352, 352]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 352, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 352, 352]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 352, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 352, 352]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 352, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 1408, 1408]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 1408, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 1408, 1408]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 1408, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 1408, 1408]) dim None
in wrangler with suggested_attention_slice shape torch.Size([8, 1408, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([4, 5632, 5632]) dim 0
in wrangler with suggested_attention_slice shape torch.Size([4, 5632, 5632]) dim 0
in wrangler with suggested_attention_slice shape torch.Size([8, 5632, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([4, 5632, 5632]) dim 0
in wrangler with suggested_attention_slice shape torch.Size([4, 5632, 5632]) dim 0
in wrangler with suggested_attention_slice shape torch.Size([8, 5632, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([2, 5632, 5632]) dim 0
in wrangler with suggested_attention_slice shape torch.Size([2, 5632, 5632]) dim 0
in wrangler with suggested_attention_slice shape torch.Size([2, 5632, 5632]) dim 0
in wrangler with suggested_attention_slice shape torch.Size([2, 5632, 5632]) dim 0
in wrangler with suggested_attention_slice shape torch.Size([8, 5632, 77]) dim None
in wrangler with suggested_attention_slice shape torch.Size([2, 5632, 5632]) dim 0

before crashing. Note None vs dim 0: the bug happens because the cross-attention control code assumes a fixed slicing strategy (a slice saved with dim None will be requested again with dim None, and one saved with dim 0 again with dim 0), but the slicing strategy actually changes dynamically between calls to attention_slice_wrangler as free VRAM fluctuates. The original attention can therefore be saved unsliced (dim None), yet by the time the edited attention is applied the strategy has switched to slicing along dim 0, leaving the stored attention slices mismatched against the shapes the wrangler is expected to return.
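
To make the failure mode concrete, here is a minimal sketch (hypothetical names and small shapes, not the actual InvokeAI implementation) of a save pass that stores the attention map unsliced and an apply pass that later asks for dim-0 slices that were never stored in that form:

```python
import torch

# Sketch only: the save pass and the apply pass assume the same slicing
# strategy, but the memory-dependent slicing can change between them.
saved_attention = {}

def save_attention_slice(attn_slice, dim, offset):
    # Pass 1 (save): the attention op happened to run unsliced, so dim is None
    # and the whole (heads, tokens, tokens) map is stored under the key (None, 0).
    saved_attention[(dim, offset)] = attn_slice
    return attn_slice

def apply_saved_attention_slice(attn_slice, dim, offset):
    # Pass 2 (apply): free VRAM has dropped, so the op now slices along dim 0
    # and asks for (0, offset) -- a key/shape that was never stored.
    stored = saved_attention.get((dim, offset))
    if stored is None or stored.shape != attn_slice.shape:
        raise RuntimeError(
            f"stored slice {None if stored is None else tuple(stored.shape)} "
            f"does not match requested slice {tuple(attn_slice.shape)}"
        )
    return stored

save_attention_slice(torch.randn(8, 352, 352), dim=None, offset=0)   # saved unsliced
try:
    apply_saved_attention_slice(torch.randn(2, 352, 352), dim=0, offset=0)  # sliced request
except RuntimeError as e:
    print(e)
```

Presumably any fix needs the stored attention to be re-sliced on demand (or the slicing strategy pinned for the duration of a generation) so that both passes agree on how the map is partitioned.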