invoke-ai / InvokeAI

InvokeAI is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry-leading WebUI, supports terminal use through a CLI, and serves as the foundation for multiple commercial products.
https://invoke-ai.github.io/InvokeAI/
Apache License 2.0

[bug]: VRAM not being released #6613

Closed: MylesCroft closed this issue 2 months ago

MylesCroft commented 2 months ago

Is there an existing issue for this problem?

Operating system

Windows

GPU vendor

Nvidia (CUDA)

GPU model

P4000

GPU VRAM

8 GB

Version number

v4.2.6a1

Browser

na

Python dependencies

| Package      | Local System  |
|--------------|---------------|
| accelerate   | 0.30.1        |
| compel       | 2.0.2         |
| cuda         | 12.1          |
| diffusers    | 0.27.2        |
| numpy        | 1.26.4        |
| opencv       | 4.9.0.80      |
| onnx         | 1.15.0        |
| pillow       | 10.4.0        |
| python       | 3.11.6        |
| torch        | 2.2.2+cu121   |
| torchvision  | 0.17.2+cu121  |
| transformers | 4.41.1        |
| xformers     | 0.0.25.post1  |

What happened

I had left a batch of around 40 images processing in the queue; when I returned, the whole queue had crashed and I had to redo all settings and prompts. Crash log attached.

[crash_log.txt](https://github.com/user-attachments/files/16222044/crash_log.txt)

What you expected to happen

The queue to finish, or, if one item fails, to continue to the next image.

How to reproduce the problem

No response

Additional context

No response

Discord username

No response

psychedelicious commented 2 months ago

Thanks for reporting. Was the queue a single batch of generations with the same settings? If so, I think this behaviour is expected, because each queue item will require similar amounts of VRAM: either all of them should OOM or none of them should.

The problem to investigate is if a single queue item OOM-ing breaks other queue items that would not, on their own, OOM.

Here's how I'm attempting to reproduce the problem:

The app works as expected for me. The first queue item OOMs and the others generate without issue. The OOM is cleared successfully.
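For illustration only, and not InvokeAI's actual queue code: a minimal sketch of the behaviour being tested here, where an OOM on one queue item is caught, cached VRAM is released, and processing continues with the next item (the names `queue_items` and `run_queue_item` are hypothetical):

```python
import torch

def process_queue(queue_items, run_queue_item):
    """Run each queued generation; an OOM on one item should not break the rest."""
    results = []
    for item in queue_items:
        try:
            results.append(run_queue_item(item))
        except torch.cuda.OutOfMemoryError:
            # Release cached VRAM held by the failed item so that subsequent,
            # smaller queue items can still be generated.
            torch.cuda.empty_cache()
            results.append(None)  # mark this item as failed and keep going
    return results
```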

MylesCroft commented 2 months ago

SDXL workflow, with a standard way of working:

Choose a model: Starlight XL Animated, Zavy Chromax, Copax Timeless, or Wildcard OG XL, for example. I currently use four LoRAs, but I have batched up to six LoRAs in the past with little issue. The LoRAs I am currently using are Midjourney 5.2, add-detail-xl, eldritch candids and vanta black contrast.

My usual workflow is to test-batch a queue of five renders per model with a starting prompt, and then tweak from there to refine the image. I have attached the JSON of one of the renders I was attempting.

mabs.json

I have tested this against previously successful renders. They can still render OK, but there is a higher chance of failure than ever before, and I had never seen the full queue implode like this.

These are LoRAs I have been using for several months now with few issues. I have kicked off another queue to see if I can induce a fault and capture the JSON from a failed render, but it will take a while, as the card is not the fastest on the planet.

The NVIDIA driver in use is R550 U7 (552.74) on Windows 11 with all patches.

psychedelicious commented 2 months ago

Thanks. I have a suspicion about the cause. I've made a dev build of the app that implements a fix. The Python wheel distribution is attached here.

InvokeAI-4.2.7.dev1-py3-none-any.whl.zip

Can you please test this out and see if it fixes the issue? Here's how to install the dev build:
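The exact install commands from the original comment are not preserved here; as a rough sketch, assuming a standard venv-based Invoke install and that the attached zip has been extracted next to it, installing the dev wheel would look something like:

```sh
# Activate the virtual environment of your Invoke install first, e.g. on Windows:
#   .venv\Scripts\activate
# Then install the extracted dev wheel over the currently installed version:
pip install InvokeAI-4.2.7.dev1-py3-none-any.whl --force-reinstall
```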

Then start up Invoke and see if the problem persists.

To revert to the stable version:
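The original revert commands are likewise not preserved; reverting would typically mean reinstalling the stable release from PyPI inside the same virtual environment, for example (the pinned version is an assumption):

```sh
# Reinstall the stable release over the dev build
pip install "InvokeAI==4.2.6" --force-reinstall
```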

MylesCroft commented 2 months ago

OK, from the testing I have done, whatever trick you did seems to have fixed it. It is a lot more stable than before, with no crashes after pushing nearly a hundred images through the pipeline with up to six LoRAs.

Thanks for your assistance.

psychedelicious commented 2 months ago

Thanks for testing. I'm reopening this until the fix is released.