[Issue]: Dev branch CUDA error: device-side assert triggered

zaxwashere commented 1 month ago

Issue Description

On current dev branch for the last few days I've been unable to generate consistently if I change models.

The flow is basically:

1. load sdnext
2. select model
3. generate image
4. change model
5. generate 1 image
6. try to generate 2nd image
7. crash

It's a consistent crash where I end up with a black screen. It happens regardless of which model types I switch between: sdxl/pony to another sdxl/pony model, or sd1.5 to sdxl/pony, both result in the same issue.

This is a fresh pull of the dev branch with all default settings and no added extensions. The issue does not seem to appear on the master branch.

sdnext.log

Version Platform Description

2024-10-01 02:34:09,382 | sd | INFO | loader | Load packages: {'torch': '2.4.1+cu124', 'diffusers': '0.31.0.dev0', 'gradio': '3.43.2', 'transformers': '4.44.2', 'accelerate': '0.34.2'}
2024-10-01 02:34:09,836 | sd | DEBUG | shared | Huggingface cache: folder="C:\Users\zaxof.cache\huggingface\hub"
2024-10-01 02:34:09,925 | sd | INFO | shared | Device detect: memory=12.0 optimization=none
2024-10-01 02:34:09,927 | sd | DEBUG | shared | Read: file="config.json" json=31 bytes=1315 time=0.000
2024-10-01 02:34:09,929 | sd | INFO | shared | Engine: backend=Backend.DIFFUSERS compute=None device=cuda attention="Scaled-Dot-Product" mode=no_grad
2024-10-01 02:34:09,930 | sd | DEBUG | shared | Read: file="html\reference.json" json=52 bytes=29118 time=0.001
2024-10-01 02:34:10,233 | sd | DEBUG | init | ONNX: version=1.19.2 provider=CPUExecutionProvider, available=['AzureExecutionProvider', 'CPUExecutionProvider']
2024-10-01 02:34:10,401 | sd | INFO | shared | Device: device=NVIDIA GeForce RTX 3060 n=1 arch=sm_90 capability=(8, 6) cuda=12.4 cudnn=90100 driver=561.09

Relevant log output

2024-10-01 02:38:16,995 | sd | ERROR | processing_args | Prompt parser encode: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

2024-10-01 02:38:16,997 | sd | ERROR | processing_helpers | Torch generator: seeds=[427954724] device=cuda CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

2024-10-01 02:38:16,999 | sd | DEBUG | processing_args | Diffuser pipeline: StableDiffusionPipeline task=DiffusersTaskType.TEXT_2_IMAGE batch=1/1x1 set={'prompt': 1, 'negative_prompt': 1, 'guidance_scale': 3, 'num_inference_steps': 10, 'eta': 1.0, 'guidance_rescale': 0.7, 'output_type': 'latent', 'width': 1024, 'height': 1024, 'parser': 'Fixed attention'}
2024-10-01 02:38:17,005 | sd | ERROR | processing_diffusers | Processing: args={'prompt': ['score_9,1girl,'], 'negative_prompt': [''], 'guidance_scale': 3, 'generator': None, 'callback_on_step_end': <function diffusers_callback at 0x00000261F86A6680>, 'callback_on_step_end_tensor_inputs': ['latents', 'prompt_embeds', 'negative_prompt_embeds'], 'num_inference_steps': 10, 'eta': 1.0, 'guidance_rescale': 0.7, 'output_type': 'latent', 'width': 1024, 'height': 1024} CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
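
The hint in the log above, `CUDA_LAUNCH_BLOCKING=1`, has to be in the environment before the process initializes CUDA. With it set, kernel launches run synchronously, so the stack trace should point at the call that actually failed rather than at a later API call. Below is a minimal sketch of one way to do that from Python. The helper names `blocking_env` and `run_with_blocking` are hypothetical, and the child command is a stand-in that only echoes the variable; the real server start command is not shown in this thread and would have to be substituted.

```python
import os
import subprocess
import sys


def blocking_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled.

    `base` defaults to the current process environment. (Hypothetical helper.)
    """
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_with_blocking(cmd):
    """Run `cmd` as a child process with CUDA_LAUNCH_BLOCKING=1 and return its stdout.

    (Hypothetical helper; `cmd` is whatever command normally starts the app.)
    """
    result = subprocess.run(
        cmd,
        env=blocking_env(),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in child: prints the variable to confirm it reached the new process.
    probe = [sys.executable, "-c", "import os; print(os.environ['CUDA_LAUNCH_BLOCKING'])"]
    print(run_with_blocking(probe).strip())
```

Setting the variable in the shell before starting the server has the same effect; the point is only that it must be in place before the first CUDA call. Expect generation to be slower while it is enabled.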

Backend

Diffusers

UI

Standard

Branch

Dev

Model

StableDiffusion XL

Acknowledgements

[X] I have read the above and searched for existing issues
[X] I confirm that this is classified correctly and it's not an extension issue

vladmandic commented 1 month ago

should be fixed in latest dev.

zaxwashere commented 1 month ago

The issue persists on my end. I realized the log didn't capture everything that I see in the console, so I've attached the console output separately, as well as the full log.

device-side assert CONSOLE.txt

device-side assert.log

mart-hill commented 1 month ago

On my end it's even worse now than 'two commits ago'. I'm getting similar errors and can't generate anything on diffusers. Right after a fresh SDNext UI boot, I dropped an image from Forge UI (and fixed the samplers and scaler for both passes) to try generating it here (with a different result, of course), and got these errors:

device-side.assertion-CONSOLE.txt

vladmandic commented 1 month ago

just pushed an update, should resolve this.
