hustvl / Matte-Anything

[Image and Vision Computing (Vol.147 Jul. '24)] Interactive Natural Image Matting with Segment Anything Models
MIT License

CUDA out of memory. 12G VRAM is not enough? #21

Closed wangjia184 closed 2 months ago

wangjia184 commented 3 months ago

CUDA out of memory. Tried to allocate 6.51 GiB. GPU 0 has a total capacity of 11.73 GiB of which 911.38 MiB is free. Process 69810 has 10.82 GiB memory in use. Of the allocated memory 10.39 GiB is allocated by PyTorch, and 194.87 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

It cannot run on my RTX 4070 Ti with 12 GB of VRAM.
How much VRAM is needed to run it?

I see it loads three models, SAM / ViTMatte / GroundingDINO; can they be loaded and unloaded one at a time to reduce memory usage?
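(For reference, the expandable_segments hint from the error message has to take effect before PyTorch initializes its CUDA allocator; a minimal sketch in Python, assuming it is placed at the very top of the demo script:)

import os

# the allocator reads this setting when it is first initialized,
# so it has to be set before "import torch" runs in the demo script
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # noqa: E402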

YeL6 commented 3 months ago


Thank you for your interest in our work! We appreciate your question!

Matte-Anything is an interactive matting tool that works directly on the image you provide. Because of this, the required VRAM is closely tied to the resolution of the image you supply. You can try reducing the image resolution to ensure Matte-Anything runs successfully on your device. I recently tried an image with a resolution of 1280x1280, which required about 15 GB of VRAM; when I reduced the resolution to 720x1280, it required about 11 GB. You can experiment with different resolutions!
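If it helps, here is a minimal sketch of downscaling an image with Pillow before feeding it to the demo (the 1280-pixel cap and file names are just example values):

from PIL import Image

def downscale(path, out_path, max_side=1280):
    # resize so the longer side is at most max_side, keeping the aspect ratio
    img = Image.open(path)
    img.thumbnail((max_side, max_side), Image.LANCZOS)
    img.save(out_path)
    return out_path

downscale("input.jpg", "input_small.jpg")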

Regarding your second question, I think directly unloading the models may not be feasible, but you can try using a more lightweight SAM model instead.
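For example, with the segment-anything API a smaller backbone can be selected roughly like this (a sketch; the checkpoint path is a placeholder and the exact place where Matte-Anything configures its SAM model may differ):

from segment_anything import sam_model_registry, SamPredictor

# "vit_b" has a much smaller checkpoint than "vit_h"; the path is a placeholder
sam = sam_model_registry["vit_b"](checkpoint="pretrained/sam_vit_b_01ec64.pth")
sam.to("cuda")
predictor = SamPredictor(sam)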

wangjia184 commented 3 months ago

@YeL6 Thanks for bringing us this amazing project!

As I understand it, SAM first segments the picture and generates a trimap based on the user input, and the trimap is then used to guide ViTMatte to perform alpha matting. I am not quite sure why GroundingDINO is involved here.

But it seems to me that these models are not used at the same time. Is it possible to load each model in a child process that terminates itself after prediction, so the VRAM is freed?

Here is an example:

import os
import asyncio
import sys

# absolute path of the current python script
current_script = os.path.abspath(__file__)

async def process(image_filename):
    # start a child process by executing this script again with the image as an argument
    proc = await asyncio.create_subprocess_exec(
        sys.executable,
        current_script,
        image_filename,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE
    )

    # read both pipes and wait for the child to exit (avoids pipe deadlocks)
    stdout, stderr = await proc.communicate()

    # determine success or failure and return the result
    print(proc.returncode)
    print(stdout)
    print(stderr)
    return stdout

# entry point when this file is executed as the child process
if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"Usage: python {os.path.basename(current_script)} image_filename")
        sys.exit(1)

    # retrieve the image filename passed by the parent
    image_filename = sys.argv[1]

    # load the model, run prediction on image_filename, and print/save the result here;
    # the process exits afterwards, so all VRAM it used is released
    print("Parameter 1:", image_filename)

wangjia184 commented 3 months ago

I tried the biggest model, vit_h, and it works well. So the OOM issue can be avoided by loading only one model at a time.
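As an aside, an alternative to a separate process is to release each model and return PyTorch's cached blocks before loading the next one; a rough sketch (whether this frees enough VRAM depends on what else still holds references to the model):

import gc
import torch

def run_in_turn(build_model, run):
    # load one model, run it, and free its VRAM before the next one is loaded
    model = build_model()
    try:
        return run(model)
    finally:
        del model                 # drop our reference to the model
        gc.collect()              # collect anything still pointing at it
        torch.cuda.empty_cache()  # hand cached blocks back to the driver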

imperator-maximus commented 2 months ago

I would suggest another approach here using a guided filter: https://github.com/perrying/guided-filter-pytorch. Run the matting at a lower resolution such as 1024x1024 only, and use the guided filter to bring the result up to any high resolution (e.g. 4000x4000).
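A rough sketch of that idea using OpenCV's ximgproc guided filter (from opencv-contrib-python) instead of the linked PyTorch implementation; the radius and eps values are only starting points:

import cv2
import numpy as np

def upsample_alpha(alpha_lowres, image_fullres, radius=16, eps=1e-4):
    # alpha_lowres: float array in [0, 1]; image_fullres: uint8 BGR image
    h, w = image_fullres.shape[:2]
    alpha = cv2.resize(alpha_lowres.astype(np.float32), (w, h),
                       interpolation=cv2.INTER_LINEAR)
    guide = image_fullres.astype(np.float32) / 255.0
    # refine the upscaled alpha using the full-resolution image as the guide
    refined = cv2.ximgproc.guidedFilter(guide, alpha, radius, eps)
    return np.clip(refined, 0.0, 1.0)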

wangjia184 commented 2 months ago

I have implemented my own alpha matting editor.

First, the SAM-HQ (https://github.com/SysCV/SAM-HQ) model is run to get the segmentation masks and let the user select one. The user can also edit the selected mask if they want.


Then the user adjusts parameters to generate a trimap.
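For reference, a common way to build such a trimap from a binary mask is erosion/dilation; a small sketch with OpenCV, where the kernel sizes correspond to the adjustable parameters mentioned above:

import cv2
import numpy as np

def mask_to_trimap(mask, erode_px=10, dilate_px=10):
    # mask: uint8 array, 0 = background, 255 = foreground
    fg = cv2.erode(mask, np.ones((erode_px, erode_px), np.uint8))
    band = cv2.dilate(mask, np.ones((dilate_px, dilate_px), np.uint8))
    trimap = np.zeros_like(mask)
    trimap[band > 0] = 128   # unknown region around the object boundary
    trimap[fg > 0] = 255     # confident foreground
    return trimap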

After the trimap is ready, the user clicks a button to run the AEMatter (https://github.com/aipixel/AEMatter) model to extract the alpha matte.

All the graphics operations are done in the web browser, in background web workers with OpenCV.js, so the user can always preview the intermediate results. The front-end UI is built with Bootstrap 5 and Svelte. The models are wrapped into a Docker image that exposes a Swagger API for the UI to interact with. Each model is started in a new subprocess, which terminates immediately after its work is done.

I cannot open-source this work because it is part of my company's software. I may work on an open-source edition with support for different models (ViTMatte / AEMatter / DiffMatte / etc.).