pkuliyi2015 / multidiffusion-upscaler-for-automatic1111

Tiled Diffusion and VAE optimization, licensed under CC BY-NC-SA 4.0

Upcoming issue: in the next 1.3.0 webui update, Tiled VAE will have issues working with "token merging" #204

Closed w-e-w closed 1 year ago

w-e-w commented 1 year ago

In the upcoming 1.3 webui update, a feature called "token merging" will be introduced: https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/9256. This feature is currently available in the dev branch.
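For context, token merging comes from the tomesd package, which patches the UNet's attention blocks in place so that redundant tokens are merged before attention is computed. A minimal sketch of the tomesd API (`sd_model` is a placeholder for the loaded model; webui wires this up through its own settings):

```python
import tomesd

# Patch the model in place; `ratio` is the fraction of tokens to merge away.
tomesd.apply_patch(sd_model, ratio=0.5)

# The patch is reversible, restoring the original attention forward passes:
tomesd.remove_patch(sd_model)
```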

TL;DR: Tiled VAE won't work when token merging is enabled.

While testing the dev branch, I found that when token merging is enabled, Tiled VAE does not seem to function properly and does not reduce VRAM usage.

This is just a preliminary issue report; I haven't dug into the cause or a potential solution yet. I just want to notify you that the issue will exist in the upcoming update.

When trying to generate larger images that need Tiled VAE to avoid running out of memory, enabling "token merging" results in an out-of-memory error, as if the VAE were not tiled.

Tiled VAE still functions normally if "token merging" is disabled.

I'm not sure what it would take to solve the issue; it may require changes on the webui side.

The following out-of-memory error log was encountered when trying to generate a 3840x2400 image on a 3090 24G with token merging (and xformers) enabled. The same issue is also reproduced on a 1650 4G when generating a 2048x2048 image with Tiled VAE and token merging enabled at the same time.

100%|██████████████████████████████████████████████████████████████████████████████████| 27/27 [00:04<00:00,  5.43it/s]
[Tiled VAE]: the input size is tiny and unnecessary to tile.                           | 27/57 [00:29<00:03,  7.52it/s]
        Tile 1/24
        Tile 2/24
        Tile 3/24
        Tile 4/24
        Tile 5/24
        Tile 6/24
        Tile 7/24
        Tile 8/24
        Tile 9/24
        Tile 10/24
        Tile 11/24
        Tile 12/24
        Tile 13/24
        Tile 14/24
        Tile 15/24
        Tile 16/24
        Tile 17/24
        Tile 18/24
        Tile 19/24
        Tile 20/24
        Tile 21/24
        Tile 22/24
        Tile 23/24
        Tile 24/24
[Tiled VAE]: input_size: torch.Size([1, 3, 2400, 3840]), tile_size: 3072, padding: 32
[Tiled VAE]: split to 1x2 = 2 tiles. Optimal tile size 1888x2336, original tile size 3072x3072
[Tiled VAE]: Fast mode enabled, estimating group norm parameters on 3072 x 1920 image
[Tiled VAE]: Executing Encoder Task Queue: 100%|████████████████████████████████████| 182/182 [00:01<00:00, 101.19it/s]
[Tiled VAE]: Done in 4.088s, max VRAM alloc 9625.963 MB
[Tiled VAE]: the input size is tiny and unnecessary to tile.
  0%|                                                                                           | 0/30 [00:01<?, ?it/s]
Error completing request
Arguments: ('task(won8jkm933f00m6)', 'masterpiece, best quality,(realistic),(finely detailed beautiful eyes and detailed face),cinematic lighting,bust shot,extremely detailed CG unity 8k wallpaper,solo,(beautiful detailed eyes),1girl,meteor,large breasts,detailed and beautiful river scenery,shooting star,firefly,(colorful flower field, colorful Dreamy forest),aurora,((sparkling water)),Isekai cityscape,smile,sky, starry sky,moonlight,moon,night,(dark theme:1.3), light, (yowane haku:1.2), white hair,(long hair:1.1),((purple headphones)),((white sailor outfit)), ((short skirt)), ((purple tie)), ((knee-high boots)),mystical and mysterious atmosphere', 'sketch, duplicate, ugly, huge eyes, text, logo, monochrome, worst face, (bad and mutated hands:1.3), (worst quality:2.0), (low quality:2.0), (blurry:2.0), horror, geometry, bad_prompt, (bad hands), (missing fingers), multiple limbs, bad anatomy, (interlocked fingers:1.2), Ugly Fingers, (extra digit and hands and fingers and legs and arms:1.4), ((2girl)), (deformed fingers:1.2), (long fingers:1.2),(bad-artist-anime), bad-artist, bad hand, extra legs', [], 27, 15, False, False, 1, 1, 10, 277690920.0, -1.0, 0, 0, 0, False, 600, 960, True, 0.5, 4, 'R-ESRGAN 4x+ Anime6B', 30, 0, 0, ['Clip skip: 2', 'Model hash: 099e07547a'], 0, False, 'MultiDiffusion', False, True, 1024, 1024, 96, 96, 48, 1, 'None', 2, False, 10, 1, 1, 64, False, False, False, False, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, True, 3072, 192, True, True, True, False, <controlnet.py.UiControlNetUnit object at 0x0000020F08CAF820>, <controlnet.py.UiControlNetUnit object at 0x0000020F08CAE530>, False, False, 'positive', 'comma', 0, False, False, '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0, False, True, False, True, True, 'Create in UI', False, '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', None, False, None, False, 50) {}
Traceback (most recent call last):
  File "B:\GitHub\stable-diffusion-webui\modules\call_queue.py", line 57, in f
    res = list(func(*args, **kwargs))
  File "B:\GitHub\stable-diffusion-webui\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "B:\GitHub\stable-diffusion-webui\modules\txt2img.py", line 53, in txt2img
    processed = processing.process_images(p)
  File "B:\GitHub\stable-diffusion-webui\modules\processing.py", line 551, in process_images
    res = process_images_inner(p)
  File "B:\GitHub\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 42, in processing_process_images_hijack
    return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
  File "B:\GitHub\stable-diffusion-webui\modules\processing.py", line 703, in process_images_inner
    samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
  File "B:\GitHub\stable-diffusion-webui\modules\processing.py", line 1006, in sample
    samples = self.sampler.sample_img2img(self, samples, noise, conditioning, unconditional_conditioning, steps=self.hr_second_pass_steps or self.steps, image_conditioning=image_conditioning)
  File "B:\GitHub\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 354, in sample_img2img
    samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "B:\GitHub\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 255, in launch_sampling
    return func()
  File "B:\GitHub\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 354, in <lambda>
    samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "B:\GitHub\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 594, in sample_dpmpp_2m
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "B:\GitHub\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 135, in forward
    x_out = self.inner_model(x_in, sigma_in, cond=make_condition_dict([cond_in], image_cond_in))
  File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "B:\GitHub\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 112, in forward
    eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
  File "B:\GitHub\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 138, in get_eps
    return self.inner_model.apply_model(*args, **kwargs)
  File "B:\GitHub\stable-diffusion-webui\modules\sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "B:\GitHub\stable-diffusion-webui\modules\sd_hijack_utils.py", line 28, in __call__
    return self.__orig_func(*args, **kwargs)
  File "B:\GitHub\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 858, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "B:\GitHub\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 1335, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "B:\GitHub\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 797, in forward
    h = module(h, emb, context)
  File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "B:\GitHub\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 84, in forward
    x = layer(x, context)
  File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "B:\GitHub\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 334, in forward
    x = block(x, context=context[i])
  File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "B:\GitHub\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 269, in forward
    return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
  File "B:\GitHub\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\util.py", line 121, in checkpoint
    return CheckpointFunction.apply(func, len(inputs), *args)
  File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "B:\GitHub\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\util.py", line 136, in forward
    output_tensors = ctx.run_function(*ctx.input_tensors)
  File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\tomesd\patch.py", line 51, in _forward
    m_a, m_c, m_m, u_a, u_c, u_m = compute_merge(x, self._tome_info)
  File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\tomesd\patch.py", line 24, in compute_merge
    m, u = merge.bipartite_soft_matching_random2d(x, w, h, args["sx"], args["sy"], r, not use_rand)
  File "B:\GitHub\stable-diffusion-webui\venv\lib\site-packages\tomesd\merge.py", line 85, in bipartite_soft_matching_random2d
    scores = a @ b.transpose(-1, -2)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 14.48 GiB (GPU 0; 24.00 GiB total capacity; 3.92 GiB already allocated; 3.34 GiB free; 18.31 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
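For what it's worth, the 14.48 GiB allocation is consistent with tomesd materializing the full token-similarity matrix (`scores = a @ b.transpose(-1, -2)`) at the UNet's full-resolution blocks, which tiling does not cover. A rough sanity check, assuming fp16 activations, a cond/uncond batch of 2, and tomesd's default `sx = sy = 2`:

```python
# ToMe picks one "dst" token per 2x2 window of the latent; the remaining
# tokens are "src", and every src token is scored against every dst token
# in a single matmul.
h, w = 2400 // 8, 3840 // 8            # latent: 300 x 480 -> 144000 tokens
dst = (h // 2) * (w // 2)              # 36000 dst tokens
src = h * w - dst                      # 108000 src tokens
batch, bytes_per_el = 2, 2             # cond + uncond, fp16
print(batch * src * dst * bytes_per_el / 2**30)  # ~14.48 GiB
```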
pkuliyi2015 commented 1 year ago

The problem happens inside the UNet, not the VAE. It is likely an issue with the ToMe (token merging) patch rather than with the Tiled VAE extension.

A similar issue has been reported here; refer to it for help: https://github.com/dbolya/tomesd/issues/37
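Until that is resolved, one hedged user-side direction (just a sketch, not a fix confirmed in either thread) is to disable token merging for generations large enough to need Tiled VAE. Since the tomesd patch is reversible, that could look like (`sd_model` again a placeholder for the loaded model):

```python
import tomesd

tomesd.remove_patch(sd_model)            # restore the original attention blocks
# ... run the oversized generation with Tiled VAE ...
tomesd.apply_patch(sd_model, ratio=0.5)  # re-enable token merging afterwards
```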