[Bug]: RuntimeError: CUDA error: the launch timed out and was terminated

Checklist

[ ] The issue exists after disabling all extensions
[ ] The issue exists on a clean installation of webui
[ ] The issue is caused by an extension, but I believe it is caused by a bug in the webui
[X] The issue exists in the current version of the webui
[X] The issue has not been reported before recently
[ ] The issue has been reported before but has not been fixed yet

What happened?

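When I use Hires. fix, I get RuntimeError: CUDA error: the launch timed out and was terminated. It happens right after the upscaler finishes, when the Hires. fix pass starts (see the console logs).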

Steps to reproduce the problem

I am using ControlNet, but I also got the same error before I started using it. After searching on Google, I added set CUDA_LAUNCH_BLOCKING=1 to the .bat file. This is my .bat configuration:

@echo off

call activate sd-forge

set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--no-gradio-queue --no-half-vae --xformers
set CUDA_LAUNCH_BLOCKING=1

@REM Uncomment following code to reference an existing A1111 checkout.
 set A1111_HOME=D:/stable-diffusion/stable-diffusion-webui

 set VENV_DIR=%A1111_HOME%/venv
 set COMMANDLINE_ARGS=%COMMANDLINE_ARGS% ^
  --ckpt-dir %A1111_HOME%/models/Stable-diffusion ^
  --hypernetwork-dir %A1111_HOME%/models/hypernetworks ^
  --embeddings-dir %A1111_HOME%/embeddings ^
  --lora-dir %A1111_HOME%/models/Lora ^
  --controlnet-dir %A1111_HOME%/extensions/sd-webui-controlnet ^
  --gfpgan-models-path %A1111_HOME%/models/GFPGAN ^
  --codeformer-models-path %A1111_HOME%/models/CodeFormer ^
  --esrgan-models-path %A1111_HOME%/models/ESRGAN ^
  --realesrgan-models-path %A1111_HOME%/models/RealESRGAN ^
  --ldsr-models-path %A1111_HOME%/models/LDSR ^
  --swinir-models-path %A1111_HOME%/models/SwinIR ^
  --bsrgan-models-path %A1111_HOME%/models/ESRGAN ^
  --scunet-models-path %A1111_HOME%/models/ScuNET

call webui.bat
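CUDA_LAUNCH_BLOCKING=1 makes every CUDA kernel launch synchronous, so the Python traceback points at the operation that actually failed rather than at a later, unrelated call. It helps locate a timeout but is not expected to prevent one (the slow kernel runs just as long), and it can make generation slower. It only has an effect if it is already in the environment when PyTorch initializes CUDA. A minimal sketch for confirming that the variable set in the .bat file reaches the Python process; the helper name launch_blocking_enabled is hypothetical and not part of webui or PyTorch:

```python
import os
from typing import Mapping, Optional


def launch_blocking_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if CUDA_LAUNCH_BLOCKING is set to "1" in env (default: os.environ).

    Hypothetical debugging helper. Surrounding whitespace is ignored because
    `set NAME=value ` in a .bat file keeps trailing spaces in the value.
    """
    if env is None:
        env = os.environ
    return env.get("CUDA_LAUNCH_BLOCKING", "").strip() == "1"


if __name__ == "__main__":
    # Run from the same environment webui is launched from (for example after
    # the .bat file's `set` commands) to check the variable is visible.
    print("CUDA_LAUNCH_BLOCKING=1 visible:", launch_blocking_enabled())
```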

What should have happened?

The Hires. fix pass should complete without a CUDA error.

What browsers do you use to access the UI?

Firefox

Sysinfo

sysinfo-2024-05-25-13-19.json

Console logs

Cleanup minimal inference memory.
tiled upscale: 100%|███████████████████████████████████████████████████████████████████| 30/30 [01:51<00:00,  3.72s/it]
Memory cleanup has taken 1.98 seconds
Exception in thread MemMon:
Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\Lib\threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "D:\stable-diffusion\forge\stable-diffusion-webui-forge\modules\memmon.py", line 53, in run
    free, total = self.cuda_mem_get_info()
                  ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\forge\stable-diffusion-webui-forge\modules\memmon.py", line 34, in cuda_mem_get_info
    return torch.cuda.mem_get_info(index)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\stable-diffusion-webui\venv\Lib\site-packages\torch\cuda\memory.py", line 663, in mem_get_info
    return torch.cuda.cudart().cudaMemGetInfo(device)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: the launch timed out and was terminated
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Traceback (most recent call last):
  File "D:\stable-diffusion\forge\stable-diffusion-webui-forge\modules_forge\main_thread.py", line 37, in loop
    task.work()
  File "D:\stable-diffusion\forge\stable-diffusion-webui-forge\modules_forge\main_thread.py", line 26, in work
    self.result = self.func(*self.args, **self.kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\forge\stable-diffusion-webui-forge\modules\txt2img.py", line 111, in txt2img_function
    processed = processing.process_images(p)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\forge\stable-diffusion-webui-forge\modules\processing.py", line 752, in process_images
    res = process_images_inner(p)
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\forge\stable-diffusion-webui-forge\modules\processing.py", line 922, in process_images_inner
    samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\forge\stable-diffusion-webui-forge\modules\processing.py", line 1291, in sample
    return self.sample_hr_pass(samples, decoded_samples, seeds, subseeds, subseed_strength, prompts)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\forge\stable-diffusion-webui-forge\modules\processing.py", line 1350, in sample_hr_pass
    samples = images_tensor_to_samples(decoded_samples, approximation_indexes.get(opts.sd_vae_encode_method))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\forge\stable-diffusion-webui-forge\modules\sd_samplers_common.py", line 107, in images_tensor_to_samples
    x_latent = model.get_first_stage_encoding(model.encode_first_stage(image))
                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\stable-diffusion-webui\venv\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\forge\stable-diffusion-webui-forge\modules_forge\forge_loader.py", line 244, in patched_encode_first_stage
    sample = sd_model.forge_objects.vae.encode(x.movedim(1, -1) * 0.5 + 0.5)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\forge\stable-diffusion-webui-forge\ldm_patched\modules\sd.py", line 320, in encode
    return self.encode_inner(pixel_samples)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\forge\stable-diffusion-webui-forge\ldm_patched\modules\sd.py", line 309, in encode_inner
    samples[x:x+batch_number] = self.first_stage_model.encode(pixels_in).to(self.output_device).float()
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\forge\stable-diffusion-webui-forge\ldm_patched\ldm\models\autoencoder.py", line 188, in encode
    z = self.encoder(x)
        ^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\stable-diffusion-webui\venv\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\stable-diffusion-webui\venv\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\forge\stable-diffusion-webui-forge\ldm_patched\ldm\modules\diffusionmodules\model.py", line 538, in forward
    h = self.mid.attn_1(h)
        ^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\stable-diffusion-webui\venv\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\stable-diffusion-webui\venv\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\forge\stable-diffusion-webui-forge\ldm_patched\ldm\modules\diffusionmodules\model.py", line 294, in forward
    h_ = self.optimized_attention(q, k, v)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\forge\stable-diffusion-webui-forge\ldm_patched\ldm\modules\diffusionmodules\model.py", line 227, in xformers_attention
    out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\stable-diffusion-webui\venv\Lib\site-packages\xformers\ops\fmha\__init__.py", line 223, in memory_efficient_attention
    return _memory_efficient_attention(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\stable-diffusion-webui\venv\Lib\site-packages\xformers\ops\fmha\__init__.py", line 321, in _memory_efficient_attention
    return _memory_efficient_attention_forward(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\stable-diffusion-webui\venv\Lib\site-packages\xformers\ops\fmha\__init__.py", line 341, in _memory_efficient_attention_forward
    out, *_ = op.apply(inp, needs_gradient=False)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\stable-diffusion-webui\venv\Lib\site-packages\xformers\ops\fmha\cutlass.py", line 202, in apply
    return cls.apply_bmhk(inp, needs_gradient=needs_gradient)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\stable-diffusion-webui\venv\Lib\site-packages\xformers\ops\fmha\cutlass.py", line 266, in apply_bmhk
    out, lse, rng_seed, rng_offset = cls.OPERATOR(
                                     ^^^^^^^^^^^^^
  File "D:\stable-diffusion\stable-diffusion-webui\venv\Lib\site-packages\torch\_ops.py", line 692, in __call__
    return self._op(*args, **kwargs or {})
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: the launch timed out and was terminated
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

CUDA error: the launch timed out and was terminated
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

*** Error completing request
*** Arguments: ('task(tgmzs7g676g98nh)', <gradio.routes.Request object at 0x000002CE1BBA4810>, '<lora:_o1:0.4>,1girl,', 'logo,text,blurry,low quality,bad anatomy,sketches,lowres,normal quality,monochrome,grayscale,worstquality,signature,watermark,cropped,bad proportions,out of focus,username,Multiple people,bad body,long body,(fat:1.2),long neck,deformed,mutated,mutation,ugly,disfigured,poorly drawn face,skin blemishes,skin spots,acnes,missing limb,malformed limbs,floating limbs,disconnected limbs,extra limb,extra arms,mutated hands,poorly drawn hands,malformed hands,mutated hands and fingers,bad hands,missing fingers,fused fingers,too many fingers,extra legs,bad feet,cross-eyed,', [], 35, 'DPM++ 3M SDE Karras', 1, 1, 7, 768, 1024, True, 0.5, 2, 'DAT x2', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', '', '', [], 0, False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, {'ad_model': 'face_yolov8n.pt', 'ad_model_classes': '', 'ad_prompt': '', 'ad_negative_prompt': '', 'ad_confidence': 0.3, 'ad_mask_k_largest': 0, 'ad_mask_min_ratio': 0, 'ad_mask_max_ratio': 1, 'ad_x_offset': 0, 'ad_y_offset': 0, 'ad_dilate_erode': 4, 'ad_mask_merge_invert': 'None', 'ad_mask_blur': 4, 'ad_denoising_strength': 0.4, 'ad_inpaint_only_masked': True, 'ad_inpaint_only_masked_padding': 32, 'ad_use_inpaint_width_height': False, 'ad_inpaint_width': 512, 'ad_inpaint_height': 512, 'ad_use_steps': False, 'ad_steps': 28, 'ad_use_cfg_scale': False, 'ad_cfg_scale': 7, 'ad_use_checkpoint': False, 'ad_checkpoint': 'Use same checkpoint', 'ad_use_vae': False, 'ad_vae': 'Use same VAE', 'ad_use_sampler': False, 'ad_sampler': 'DPM++ 2M Karras', 'ad_scheduler': 'Use same scheduler', 'ad_use_noise_multiplier': False, 'ad_noise_multiplier': 1, 'ad_use_clip_skip': False, 'ad_clip_skip': 1, 'ad_restore_face': False, 'ad_controlnet_model': 'None', 'ad_controlnet_module': 'None', 'ad_controlnet_weight': 1, 'ad_controlnet_guidance_start': 0, 'ad_controlnet_guidance_end': 1, 'is_api': ()}, {'ad_model': 'None', 
'ad_model_classes': '', 'ad_prompt': '', 'ad_negative_prompt': '', 'ad_confidence': 0.3, 'ad_mask_k_largest': 0, 'ad_mask_min_ratio': 0, 'ad_mask_max_ratio': 1, 'ad_x_offset': 0, 'ad_y_offset': 0, 'ad_dilate_erode': 4, 'ad_mask_merge_invert': 'None', 'ad_mask_blur': 4, 'ad_denoising_strength': 0.4, 'ad_inpaint_only_masked': True, 'ad_inpaint_only_masked_padding': 32, 'ad_use_inpaint_width_height': False, 'ad_inpaint_width': 512, 'ad_inpaint_height': 512, 'ad_use_steps': False, 'ad_steps': 28, 'ad_use_cfg_scale': False, 'ad_cfg_scale': 7, 'ad_use_checkpoint': False, 'ad_checkpoint': 'Use same checkpoint', 'ad_use_vae': False, 'ad_vae': 'Use same VAE', 'ad_use_sampler': False, 'ad_sampler': 'DPM++ 2M Karras', 'ad_scheduler': 'Use same scheduler', 'ad_use_noise_multiplier': False, 'ad_noise_multiplier': 1, 'ad_use_clip_skip': False, 'ad_clip_skip': 1, 'ad_restore_face': False, 'ad_controlnet_model': 'None', 'ad_controlnet_module': 'None', 'ad_controlnet_weight': 1, 'ad_controlnet_guidance_start': 0, 'ad_controlnet_guidance_end': 1, 'is_api': ()}, False, 1.6, 0.97, 0.4, 0, 20, 0, 12, '', True, False, False, False, 512, False, True, ['Face'], False, '{\n    "face_detector": "RetinaFace",\n    "rules": {\n        "then": {\n            "face_processor": "img2img",\n            "mask_generator": {\n                "name": "BiSeNet",\n                "params": {\n                    "fallback_ratio": 0.1\n                }\n            }\n        }\n    }\n}', 'None', 40, <scripts.animatediff_ui.AnimateDiffProcess object at 0x000002CE1A63C690>, False, 'None', 20, False, False, 0, None, [], 0, False, [], [], False, 0, 1, False, False, 0, None, [], -2, False, [], False, 0, None, None, False, 'After applying other prompt processings', -1.0, 'long', '', '<|special|>, \n<|characters|>, <|copyrights|>, \n<|artist|>, \n\n<|general|>, \n\n<|quality|>, <|meta|>, <|rating|>', 1.35, 0.95, 100, 'KBlueLeaf/DanTagGen-delta-rev2', False, False, 
ControlNetUnit(input_mode=<InputMode.SIMPLE: 'simple'>, use_preview_as_input=False, batch_image_dir='', batch_mask_dir='', batch_input_gallery=[], batch_mask_gallery=[], generated_image=array([[[0, 0, 0],
***         [0, 0, 0],
***         [0, 0, 0],
***         ...,
***         [0, 0, 0],
***         [0, 0, 0],
***         [0, 0, 0]],
***
***        [[0, 0, 0],
***         [0, 0, 0],
***         [0, 0, 0],
***         ...,
***         [0, 0, 0],
***         [0, 0, 0],
***         [0, 0, 0]],
***
***        [[0, 0, 0],
***         [0, 0, 0],
***         [0, 0, 0],
***         ...,
***         [0, 0, 0],
***         [0, 0, 0],
***         [0, 0, 0]],
***
***        ...,
***
***        [[0, 0, 0],
***         [0, 0, 0],
***         [0, 0, 0],
***         ...,
***         [0, 0, 0],
***         [0, 0, 0],
***         [0, 0, 0]],
***
***        [[0, 0, 0],
***         [0, 0, 0],
***         [0, 0, 0],
***         ...,
***         [0, 0, 0],
***         [0, 0, 0],
***         [0, 0, 0]],
***
***        [[0, 0, 0],
***         [0, 0, 0],
***         [0, 0, 0],
***         ...,
***         [0, 0, 0],
***         [0, 0, 0],
***         [0, 0, 0]]], dtype=uint8), mask_image=None, hr_option='Both', enabled=True, module='openpose_full', model='kohya_controllllite_xl_openpose_anime [7e5349e5]', weight=1, image={'image': array([[[228, 138, 154],
***         [228, 138, 154],
***         [228, 138, 154],
***         ...,
***         [210, 207, 211],
***         [210, 208, 211],
***         [210, 208, 211]],
***
***        [[228, 138, 154],
***         [228, 138, 154],
***         [228, 138, 154],
***         ...,
***         [210, 208, 211],
***         [210, 208, 211],
***         [210, 208, 211]],
***
***        [[228, 138, 154],
***         [228, 138, 154],
***         [228, 138, 154],
***         ...,
***         [211, 208, 211],
***         [211, 208, 211],
***         [210, 208, 211]],
***
***        ...,
***
***        [[151, 136, 143],
***         [151, 136, 143],
***         [152, 136, 143],
***         ...,
***         [ 86,  72,  82],
***         [ 86,  72,  82],
***         [ 86,  72,  82]],
***
***        [[152, 136, 143],
***         [152, 136, 143],
***         [152, 136, 143],
***         ...,
***         [ 86,  72,  82],
***         [ 86,  72,  82],
***         [ 86,  72,  82]],
***
***        [[152, 136, 143],
***         [152, 136, 143],
***         [152, 136, 143],
***         ...,
***         [ 86,  72,  82],
***         [ 86,  72,  82],
***         [ 86,  72,  82]]], dtype=uint8), 'mask': array([[[0, 0, 0],
***         [0, 0, 0],
***         [0, 0, 0],
***         ...,
***         [0, 0, 0],
***         [0, 0, 0],
***         [0, 0, 0]],
***
***        [[0, 0, 0],
***         [0, 0, 0],
***         [0, 0, 0],
***         ...,
***         [0, 0, 0],
***         [0, 0, 0],
***         [0, 0, 0]],
***
***        [[0, 0, 0],
***         [0, 0, 0],
***         [0, 0, 0],
***         ...,
***         [0, 0, 0],
***         [0, 0, 0],
***         [0, 0, 0]],
***
***        ...,
***
***        [[0, 0, 0],
***         [0, 0, 0],
***         [0, 0, 0],
***         ...,
***         [0, 0, 0],
***         [0, 0, 0],
***         [0, 0, 0]],
***
***        [[0, 0, 0],
***         [0, 0, 0],
***         [0, 0, 0],
***         ...,
***         [0, 0, 0],
***         [0, 0, 0],
***         [0, 0, 0]],
***
***        [[0, 0, 0],
***         [0, 0, 0],
***         [0, 0, 0],
***         ...,
***         [0, 0, 0],
***         [0, 0, 0],
***         [0, 0, 0]]], dtype=uint8)}, resize_mode='Crop and Resize', processor_res=512, threshold_a=0.5, threshold_b=0.5, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode='Balanced', save_detected_map=True), ControlNetUnit(input_mode=<InputMode.SIMPLE: 'simple'>, use_preview_as_input=False, batch_image_dir='', batch_mask_dir='', batch_input_gallery=None, batch_mask_gallery=None, generated_image=None, mask_image=None, hr_option=<HiResFixOption.BOTH: 'Both'>, enabled=False, module='None', model='None', weight=1.0, image=None, resize_mode=<ResizeMode.INNER_FIT: 'Crop and Resize'>, processor_res=-1, threshold_a=-1, threshold_b=-1, guidance_start=0.0, guidance_end=1.0, pixel_perfect=False, control_mode=<ControlMode.BALANCED: 'Balanced'>, save_detected_map=True), ControlNetUnit(input_mode=<InputMode.SIMPLE: 'simple'>, use_preview_as_input=False, batch_image_dir='', batch_mask_dir='', batch_input_gallery=None, batch_mask_gallery=None, generated_image=None, mask_image=None, hr_option=<HiResFixOption.BOTH: 'Both'>, enabled=False, module='None', model='None', weight=1.0, image=None, resize_mode=<ResizeMode.INNER_FIT: 'Crop and Resize'>, processor_res=-1, threshold_a=-1, threshold_b=-1, guidance_start=0.0, guidance_end=1.0, pixel_perfect=False, control_mode=<ControlMode.BALANCED: 'Balanced'>, save_detected_map=True), False, 7, 1, 'Constant', 0, 'Constant', 0, 1, 'enable', 'MEAN', 'AD', 1, False, 1.01, 1.02, 0.99, 0.95, False, 0.5, 2, False, 256, 2, 0, False, False, 3, 2, 0, 0.35, True, 'bicubic', 'bicubic', False, 0, 'anisotropic', 0, 'reinhard', 100, 0, 'subtract', 0, 0, 'gaussian', 'add', 0, 100, 127, 0, 'hard_clamp', 5, 0, 'None', 'None', False, 'MultiDiffusion', 768, 768, 64, 4, False, False, False, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False, 1.6, 0.97, 0.4, 0, 20, 0, 12, '', True, False, False, False, 512, False, True, ['Face'], False, '{\n 
   "face_detector": "RetinaFace",\n    "rules": {\n        "then": {\n            "face_processor": "img2img",\n            "mask_generator": {\n                "name": "BiSeNet",\n                "params": {\n                    "fallback_ratio": 0.1\n                }\n            }\n        }\n    }\n}', 'None', 40) {}
    Traceback (most recent call last):
      File "D:\stable-diffusion\forge\stable-diffusion-webui-forge\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
                   ^^^^^^^^^^^^^^^^^^^^^
      File "D:\stable-diffusion\forge\stable-diffusion-webui-forge\modules\call_queue.py", line 41, in f
        shared.state.end()
      File "D:\stable-diffusion\forge\stable-diffusion-webui-forge\modules\shared_state.py", line 137, in end
        devices.torch_gc()
      File "D:\stable-diffusion\forge\stable-diffusion-webui-forge\modules\devices.py", line 39, in torch_gc
        model_management.soft_empty_cache()
      File "D:\stable-diffusion\forge\stable-diffusion-webui-forge\ldm_patched\modules\model_management.py", line 832, in soft_empty_cache
        torch.cuda.empty_cache()
      File "D:\stable-diffusion\stable-diffusion-webui\venv\Lib\site-packages\torch\cuda\memory.py", line 159, in empty_cache
        torch._C._cuda_emptyCache()
    RuntimeError: CUDA error: the launch timed out and was terminated
    Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

---
Traceback (most recent call last):
  File "D:\stable-diffusion\stable-diffusion-webui\venv\Lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\stable-diffusion-webui\venv\Lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\stable-diffusion-webui\venv\Lib\site-packages\gradio\blocks.py", line 1103, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\stable-diffusion-webui\venv\Lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\stable-diffusion-webui\venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "D:\stable-diffusion\stable-diffusion-webui\venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\stable-diffusion-webui\venv\Lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "D:\stable-diffusion\forge\stable-diffusion-webui-forge\modules\call_queue.py", line 77, in f
    devices.torch_gc()
  File "D:\stable-diffusion\forge\stable-diffusion-webui-forge\modules\devices.py", line 39, in torch_gc
    model_management.soft_empty_cache()
  File "D:\stable-diffusion\forge\stable-diffusion-webui-forge\ldm_patched\modules\model_management.py", line 832, in soft_empty_cache
    torch.cuda.empty_cache()
  File "D:\stable-diffusion\stable-diffusion-webui\venv\Lib\site-packages\torch\cuda\memory.py", line 159, in empty_cache
    torch._C._cuda_emptyCache()
RuntimeError: CUDA error: the launch timed out and was terminated
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
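Reading the tracebacks in order, the first failure happens in the Hires. fix pass while the upscaled image is encoded by the VAE, inside the encoder's mid-block self-attention (xformers.ops.memory_efficient_attention). The later errors from MemMon (torch.cuda.mem_get_info) and from torch.cuda.empty_cache() are most likely follow-on failures: after a launch timeout, the CUDA runtime returns the same error for any further CUDA work until the process is restarted. Below is a back-of-the-envelope estimate of that one attention call. It assumes that the 768 and 1024 in the logged arguments are the base height and width, that Hires. fix upscales by 2 (so 1536×2048), that the VAE downsamples by 8, that the mid-block attention is single-head over 512 channels as in the standard SD VAE, and that the Tesla M40's FP32 peak is roughly 6.8 TFLOPS (from its published specs). The helper names are hypothetical:

```python
def vae_mid_attention_cost(width_px: int, height_px: int,
                           downsample: int = 8, channels: int = 512) -> tuple:
    """Return (tokens, flops) for one single-head self-attention over the VAE latent grid.

    Every latent position is one token. QK^T and softmax(...)V are counted as
    two matmuls of 2 * tokens^2 * channels FLOPs each; softmax and the q/k/v/out
    projections are ignored, so this is a lower bound on the work.
    """
    tokens = (width_px // downsample) * (height_px // downsample)
    flops = 2 * (2 * tokens * tokens * channels)
    return tokens, flops


def seconds_at_peak(flops: int, peak_tflops: float) -> float:
    """Lower bound on runtime if the GPU sustained its peak throughput."""
    return flops / (peak_tflops * 1e12)


if __name__ == "__main__":
    PEAK_TFLOPS_M40 = 6.8  # assumption: approximate published FP32 peak of a Tesla M40
    passes = {"base pass": (1024, 768), "hires pass": (2048, 1536)}
    for label, (w, h) in passes.items():
        tokens, flops = vae_mid_attention_cost(w, h)
        t = seconds_at_peak(flops, PEAK_TFLOPS_M40)
        print(f"{label}: {w}x{h} px -> {tokens} tokens, "
              f"{flops / 1e12:.2f} TFLOP, >= {t:.2f} s at peak")
```

Under these assumptions the hires-pass attention has 4 times as many tokens as the base pass and 16 times the work (roughly 5 TFLOP), which would take about 0.7 s even at the card's theoretical FP32 peak (--no-half-vae keeps the VAE in FP32). At realistic kernel efficiency it is plausible for that single call to exceed Windows' default 2-second GPU timeout. This is an estimate, not a measurement.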



### Additional information

GPU info:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 537.13                 Driver Version: 537.13       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla M40 24GB               WDDM  | 00000000:02:00.0 Off |                    0 |
| N/A   31C    P8              27W / 250W |   5265MiB / 23040MiB |      5%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
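The table above shows the M40 running under the WDDM driver model. On Windows, a GPU under WDDM is supervised by the Timeout Detection and Recovery (TDR) mechanism, which by default resets the GPU if a single piece of GPU work runs longer than about 2 seconds; CUDA reports this as "the launch timed out and was terminated". Tesla-class cards can normally also run in TCC mode, which is not subject to that watchdog. A small sketch for reporting each GPU's current driver model, assuming nvidia-smi is on PATH and supports the driver_model.current query field; the helper names are hypothetical:

```python
import subprocess


def parse_driver_models(csv_text: str) -> list:
    """Parse "index, name, driver_model.current" rows from nvidia-smi CSV output."""
    rows = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        index, rest = line.split(",", 1)
        # rsplit so a comma inside the GPU name would not break parsing
        name, model = rest.rsplit(",", 1)
        rows.append((int(index), name.strip(), model.strip()))
    return rows


def query_driver_models() -> list:
    """Ask nvidia-smi for each GPU's current driver model (WDDM or TCC on Windows)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,name,driver_model.current",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_driver_models(result.stdout)


if __name__ == "__main__":
    try:
        gpus = query_driver_models()
    except (FileNotFoundError, subprocess.CalledProcessError, ValueError) as exc:
        print(f"could not query nvidia-smi: {exc}")
        gpus = []
    for index, name, model in gpus:
        suffix = " (subject to the Windows TDR watchdog)" if model == "WDDM" else ""
        print(f"GPU {index}: {name}: {model}{suffix}")
```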

lllyasviel/stable-diffusion-webui-forge, issue #763