CUDA error: the launch timed out and was terminated

Is this a driver issue or an uncaught exception?

// Benchmarks up to this point run fine.
run benchmark: txt2img_hr 1x768.43s/it]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.25s/it]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00,  3.55s/it]
run benchmark: txt2img_hr 1x832.24s/it]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.43s/it]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:04<00:00,  4.64s/it]
run benchmark: txt2img_hr 1x896.55s/it]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.52s/it]
benchmark error: CUDA error: the launch timed out and was terminated
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception in thread MemMon:
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "D:\Stable Diffusion\stable-diffusion-webui\modules\memmon.py", line 53, in run
    free, total = self.cuda_mem_get_info()
Traceback (most recent call last):
  File "D:\Stable Diffusion\stable-diffusion-webui\modules\memmon.py", line 34, in cuda_mem_get_info
  File "D:\Stable Diffusion\stable-diffusion-webui\extensions\a1111-stable-diffusion-webui-vram-estimator\scripts\vram_estimator.py", line 170, in run_benchmark
    process_images(p)
    return torch.cuda.mem_get_info(index)
  File "D:\Stable Diffusion\stable-diffusion-webui\modules\processing.py", line 515, in process_images
    res = process_images_inner(p)
  File "D:\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\cuda\memory.py", line 618, in mem_get_info
  File "D:\Stable Diffusion\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 42, in processing_process_images_hijack
    return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
    return torch.cuda.cudart().cudaMemGetInfo(device)
  File "D:\Stable Diffusion\stable-diffusion-webui\modules\processing.py", line 669, in process_images_inner
    samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
RuntimeError: CUDA error: the launch timed out and was terminated
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

  File "D:\Stable Diffusion\stable-diffusion-webui\modules\processing.py", line 942, in sample
    samples = self.sd_model.get_first_stage_encoding(self.sd_model.encode_first_stage(decoded_samples))
  File "D:\Stable Diffusion\stable-diffusion-webui\modules\sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "D:\Stable Diffusion\stable-diffusion-webui\modules\sd_hijack_utils.py", line 28, in __call__
    return self.__orig_func(*args, **kwargs)
  File "D:\Stable Diffusion\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 655, in get_first_stage_encoding
    z = encoder_posterior.sample()
  File "D:\Stable Diffusion\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\distributions\distributions.py", line 36, in sample
    x = self.mean + self.std * torch.randn(self.mean.shape).to(device=self.parameters.device)
RuntimeError: CUDA error: the launch timed out and was terminated
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 399, in run_predict
    output = await app.get_blocks().process_api(
  File "D:\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1299, in process_api
    result = await self.call_function(
  File "D:\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1022, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "D:\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "D:\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "D:\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "D:\Stable Diffusion\stable-diffusion-webui\extensions\a1111-stable-diffusion-webui-vram-estimator\scripts\vram_estimator.py", line 173, in run_benchmark
    shared.state.end()
  File "D:\Stable Diffusion\stable-diffusion-webui\modules\shared.py", line 167, in end
    devices.torch_gc()
  File "D:\Stable Diffusion\stable-diffusion-webui\modules\devices.py", line 59, in torch_gc
    torch.cuda.empty_cache()
  File "D:\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\cuda\memory.py", line 133, in empty_cache
    torch._C._cuda_emptyCache()
RuntimeError: CUDA error: the launch timed out and was terminated
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

No results were written

[VRAMEstimator] No stats available, run benchmark first
Traceback (most recent call last):
  File "D:\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 399, in run_predict
    output = await app.get_blocks().process_api(
  File "D:\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1302, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "D:\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1206, in postprocess_data
    self.validate_outputs(fn_index, predictions)  # type: ignore
  File "D:\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1181, in validate_outputs
    raise ValueError(
ValueError: An event handler (load_curve) didn't receive enough output values (needed: 4, received: 2).
Wanted outputs:
    [plot, plot, plot, plot]
Received outputs:
    [None, None]
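
Following the hint printed in the log, a rerun with synchronous kernel launches might show which call actually fails, since the stack traces above may point at the wrong place. A minimal sketch, assuming the variable must be set before PyTorch initializes CUDA; `enable_blocking_launches` is a hypothetical helper, not part of the webui or the extension:

```python
import os


def enable_blocking_launches(env=None):
    """Set CUDA_LAUNCH_BLOCKING=1 in the given environment mapping.

    Hypothetical helper for illustration. With this variable set, CUDA
    kernel launches run synchronously, so an error should surface at the
    call that caused it instead of at a later, unrelated API call.
    """
    env = os.environ if env is None else env
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# Call this before anything imports torch and touches the GPU.
enable_blocking_launches()
```

On Windows the same effect should be achievable by adding `set CUDA_LAUNCH_BLOCKING=1` to the batch file that starts the webui. Synchronous launches slow generation down, so this is only meant for the debugging run.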

space-nuko / a1111-stable-diffusion-webui-vram-estimator
