light-and-ray / sd-webui-replacer

A tab for sd-webui for replacing objects in pictures or videos using detection prompt

RuntimeError: CUDA error: an illegal memory access was encountered #52

Closed mike2505 closed 5 months ago

mike2505 commented 5 months ago

I am trying to launch several webui instances with Replacer installed, as a workaround for the lack of multi-GPU support. I plan to put a reverse proxy in front of them that automatically forwards each request to a free instance. I have 8 GPUs (RTX 4090), rented from vast.ai.
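A minimal sketch of the "forward to a free instance" idea, assuming one webui instance per GPU on consecutive local ports. The `InstancePool` class, the URLs, and the port numbers are hypothetical and only illustrate the bookkeeping a proxy would need; this is not part of Replacer or sd-webui.

```python
import threading
from typing import Dict, Iterable, Optional


class InstancePool:
    """Tracks which backend instances are currently busy (hypothetical helper)."""

    def __init__(self, urls: Iterable[str]) -> None:
        self._lock = threading.Lock()
        # Maps backend URL -> True while a request is being processed on it.
        self._busy: Dict[str, bool] = {url: False for url in urls}

    def acquire(self) -> Optional[str]:
        """Return the first free backend and mark it busy, or None if all are busy."""
        with self._lock:
            for url, busy in self._busy.items():
                if not busy:
                    self._busy[url] = True
                    return url
        return None

    def release(self, url: str) -> None:
        """Mark a backend as free again once its request has finished."""
        with self._lock:
            self._busy[url] = False


# Assumed layout: 8 instances on ports 7860..7867, one per GPU.
pool = InstancePool(f"http://127.0.0.1:{7860 + i}" for i in range(8))
```

The proxy would call `pool.acquire()` before forwarding a request and `pool.release(url)` in a `finally` block afterwards, so a failed request does not leave an instance marked busy forever.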

Everything works fine with one instance, but when I run several instances, every instance except the first one fails with this error:

torch._C._cuda_emptyCache()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Each GPU has 24 GB of VRAM and usage never even passes the 10 GB mark, so how can this be an OOM?

Attaching the nvidia-smi output and the log: image, out.log

mike2505 commented 5 months ago

I just tested with only one instance running, with device-id=1, and I still get the same error. The same happens with any device id except 0.

light-and-ray commented 5 months ago

I think it's related to the Segment Anything extension. It uses 3 different models that are not part of sd-webui, and maybe they are moved to the wrong device on multi-GPU systems. Ask about it there, but I think in your case you will need to dig into the code yourself.

Also try different SAM models; they use different code. Maybe one of them will work.
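One general workaround for code that implicitly assumes device 0 is to restrict each process to a single GPU with `CUDA_VISIBLE_DEVICES`, so the chosen GPU appears as device 0 inside that process. The sketch below only builds the command and environment for each instance. It assumes the instances are started through sd-webui's `launch.py` with its `--port` flag; the `build_launch` helper and the port layout are hypothetical, and this thread does not confirm that this approach resolves the error.

```python
import os
import sys
from typing import Dict, List, Tuple


def build_launch(
    gpu_index: int, base_port: int = 7860, debug: bool = False
) -> Tuple[List[str], Dict[str, str]]:
    """Build the command and environment for one webui instance (hypothetical helper)."""
    env = dict(os.environ)
    # Only this GPU is visible to the process; inside it, the GPU is device 0.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    if debug:
        # Makes kernel launches synchronous so the stack trace points at the failing call.
        env["CUDA_LAUNCH_BLOCKING"] = "1"
    # Assumption: launch.py is run from the webui directory; no --device-id is passed.
    cmd = [sys.executable, "launch.py", "--port", str(base_port + gpu_index)]
    return cmd, env


if __name__ == "__main__":
    for gpu in range(8):
        cmd, env = build_launch(gpu)
        print(f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']}", " ".join(cmd))
```

Each pair could then be started with `subprocess.Popen(cmd, env=env, cwd=<webui directory>)`. Passing `debug=True` for a single failing instance follows the hint in the error message about `CUDA_LAUNCH_BLOCKING=1`.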

light-and-ray commented 5 months ago

If anyone has the same problem, here is the answer: https://github.com/continue-revolution/sd-webui-segment-anything/issues/201#issuecomment-2025343947