invoke-ai / InvokeAI

InvokeAI is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry-leading WebUI, supports terminal use through a CLI, and serves as the foundation for multiple commercial products.
https://invoke-ai.github.io/InvokeAI/
Apache License 2.0

[bug]: VRAM not being released #6613

Closed: MylesCroft closed this issue 2 months ago

MylesCroft commented 2 months ago

Is there an existing issue for this problem?

Operating system

Windows

GPU vendor

Nvidia (CUDA)

GPU model

P4000

GPU VRAM

8 GB

Version number

v4.2.6a1

Browser

na

Python dependencies

| Package      | Local System  |
|--------------|---------------|
| accelerate   | 0.30.1        |
| compel       | 2.0.2         |
| cuda         | 12.1          |
| diffusers    | 0.27.2        |
| numpy        | 1.26.4        |
| opencv       | 4.9.0.80      |
| onnx         | 1.15.0        |
| pillow       | 10.4.0        |
| python       | 3.11.6        |
| torch        | 2.2.2+cu121   |
| torchvision  | 0.17.2+cu121  |
| transformers | 4.41.1        |
| xformers     | 0.0.25.post1  |

What happened

I had left a batch of around 40 images processing in the queue; when I returned, the whole queue had crashed and I had to redo all settings and prompts. Crash log attached.

[crash_log.txt](https://github.com/user-attachments/files/16222044/crash_log.txt)

What you expected to happen

The queue to finish, or, if one item fails, to continue to the next image.

How to reproduce the problem

No response

Additional context

No response

Discord username

No response

psychedelicious commented 2 months ago

Thanks for reporting. Was the queue a single batch of generations with the same settings? If so, I think this behaviour is expected, because each queue item will require similar amounts of VRAM: either all of them should OOM or none of them should.

The problem to investigate is if a single queue item OOM-ing breaks other queue items that would not, on their own, OOM.

Here's how I'm attempting to reproduce the problem:

The app works as expected for me. The first queue item OOMs and the others generate without issue. The OOM is cleared successfully.
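For illustration only, and not InvokeAI's actual queue code: a minimal sketch of the behaviour being tested here, where an OOM on one queue item is caught, cached VRAM is released, and processing continues with the next item (the names `queue_items` and `run_queue_item` are hypothetical):

```python
import torch

def process_queue(queue_items, run_queue_item):
    """Run each queued generation; an OOM on one item should not break the rest."""
    results = []
    for item in queue_items:
        try:
            results.append(run_queue_item(item))
        except torch.cuda.OutOfMemoryError:
            # Release cached VRAM held by the failed item so that subsequent,
            # smaller queue items can still be generated.
            torch.cuda.empty_cache()
            results.append(None)  # mark this item as failed and keep going
    return results
```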

MylesCroft commented 2 months ago

SDXL workflow, with a standard way of working:

Choose a model: Starlight XL Animated, Zavy Chromax, Copax Timeless, or Wildcard OG XL, for example. I currently use four LoRAs, but I have batched up to six LoRAs in the past with little issue. The LoRAs I am currently using are Midjourney 5.2, add-detail-xl, eldritch candids and vanta black contrast.

My usual workflow is to test-batch a queue of five renders per model with a starting prompt, and then tweak from there to refine the image. I have attached the JSON of one of the renders I was attempting.

mabs.json

I have tested this against previously successful renders. They can still render OK, but there is a higher chance of failure than ever before, and I had never seen the full queue implode like this.

These are LoRAs I have been using for several months now with few issues. I have kicked off another queue to see if I can induce a fault and capture the JSON from a failed render, but it will take a while, as the card is not the fastest on the planet.

The NVIDIA driver in use is R550 U7 (552.74) on Windows 11 with all patches.

psychedelicious commented 2 months ago

Thanks. I have a suspicion about the cause. I've made a dev build of the app that implements a fix. The Python wheel distribution is attached here.

InvokeAI-4.2.7.dev1-py3-none-any.whl.zip

Can you please test this out and see if it fixes the issue? Here's how to install the dev build:
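The exact install commands from the original comment are not preserved here; as a rough sketch, assuming a standard venv-based Invoke install and that the attached zip has been extracted next to it, installing the dev wheel would look something like:

```sh
# Activate the virtual environment of your Invoke install first, e.g. on Windows:
#   .venv\Scripts\activate
# Then install the extracted dev wheel over the currently installed version:
pip install InvokeAI-4.2.7.dev1-py3-none-any.whl --force-reinstall
```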

Then start up Invoke and see if the problem persists.

To revert to the stable version:
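The original revert commands are likewise not preserved; reverting would typically mean reinstalling the stable release from PyPI inside the same virtual environment, for example (the pinned version is an assumption):

```sh
# Reinstall the stable release over the dev build
pip install "InvokeAI==4.2.6" --force-reinstall
```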

MylesCroft commented 2 months ago

OK, from the testing I have done, whatever trick you did seems to have fixed it. It is a lot more stable than before, with no crashes after pushing nearly a hundred images through the pipeline with up to six LoRAs.

Thanks for your assistance.

psychedelicious commented 2 months ago

Thanks for testing. I'm reopening this until the fix is released.