hkchengrex / MiVOS

[CVPR 2021] Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion. Semi-supervised VOS as well!
https://hkchengrex.com/MiVOS/
MIT License
469 stars, 64 forks

Process killed #22

zdhernandez closed this issue 2 years ago

zdhernandez commented 2 years ago

I tried MiVOS + STCN on a 1.5-minute 4K video that was downsampled to 480p, and the program crashed.

What are the steps to reformat or resample a 4K video so that it works with this tool?

Also, can this tool run on multiple GPUs?

hkchengrex commented 2 years ago

What was the error?

zdhernandez commented 2 years ago

@hkchengrex I attached screenshots: Screenshot from 2021-12-02 15-24-33, Screenshot from 2021-12-02 15-26-55, Screenshot from 2021-12-02 15-27-42

hkchengrex commented 2 years ago

Can you try --mem_profile 2?

hkchengrex commented 2 years ago

There are also a lot of frames, so you may need to set --mem_freq higher, e.g., 30.

zdhernandez commented 2 years ago

@hkchengrex --mem_profile 2 seems to work. Do you think there is a memory leak? I ask because the video I'm trying to use is only 34MB. I see what the option does now, in:

class InferenceCore:
    """
    images - leave them in original dimension (unpadded), but do normalize them. 
            Should be CPU tensors of shape B*T*3*H*W

    mem_profile - How extravagant I can use the GPU memory. 
                Usually more memory -> faster speed but I have not drawn the exact relation
                0 - Use the most memory
                1 - Intermediate, larger buffer 
                2 - Intermediate, small buffer 
                3 - Use the minimal amount of GPU memory
                Note that *none* of the above options will affect the accuracy
                This is a space-time tradeoff, not a space-performance one

    mem_freq - Period at which new memory are put in the bank
                Higher number -> less memory usage
                Unlike the last option, this *is* a space-performance tradeoff
    """
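
To make the mem_freq tradeoff in the docstring concrete, here is a rough sketch of how many frames would end up in the memory bank for a given period. It assumes that every mem_freq-th frame is stored (frames 0, mem_freq, 2*mem_freq, ...) and that the 1.5-minute video runs at 30 FPS; neither is stated in the thread, the helper name is hypothetical, and the actual scheduling inside InferenceCore may differ (for example, interacted frames may be stored as well).

```python
import math


def count_memory_frames(num_frames: int, mem_freq: int) -> int:
    """Estimate how many frames are stored if every mem_freq-th frame is kept.

    Assumption: frames 0, mem_freq, 2*mem_freq, ... go into the bank.
    This is an illustration, not the repository's actual scheduling logic.
    """
    if mem_freq <= 0:
        raise ValueError("mem_freq must be a positive integer")
    return math.ceil(num_frames / mem_freq)


# 1.5 minutes of video at an assumed 30 FPS.
num_frames = 90 * 30

# 30 is the value suggested in the thread; 5 is only a smaller value for comparison.
for mem_freq in (5, 30):
    stored = count_memory_frames(num_frames, mem_freq)
    print(f"mem_freq={mem_freq}: about {stored} memory frames")
```

Under these assumptions, going from a period of 5 to a period of 30 cuts the number of stored frames by a factor of six, which is why a higher mem_freq lowers memory usage at some cost in accuracy.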

(Attached: Screenshot from 2021-12-02 15-32-26)

hkchengrex commented 2 years ago

34MB is the size after compression. Raw pixels occupy much more space. I believe the model in this branch https://github.com/hkchengrex/MiVOS/tree/MiVOS-STCN performs better and uses somewhat less memory.
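
As a back-of-the-envelope check on the gap between compressed and raw size, the sketch below estimates the footprint of the decoded frames. The numbers are assumptions, not values from the thread: 30 FPS, a 16:9 480p frame of 854x480, three channels, and 4 bytes per value (float32 after normalization). The function name is hypothetical.

```python
def raw_video_bytes(num_frames: int, height: int, width: int,
                    channels: int = 3, bytes_per_value: int = 4) -> int:
    """Size in bytes of all decoded frames held as one dense array."""
    return num_frames * height * width * channels * bytes_per_value


# 1.5 minutes at an assumed 30 FPS, at an assumed 854x480 resolution.
num_frames = 90 * 30
size = raw_video_bytes(num_frames, height=480, width=854)

print(f"{num_frames} frames -> {size / 1e9:.2f} GB uncompressed")  # about 13.28 GB
```

Under these assumptions the decoded video is roughly 13 GB, several hundred times the 34MB file. The InferenceCore docstring quoted earlier says the images are CPU tensors, so this would mostly count against host RAM rather than GPU memory.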

zdhernandez commented 2 years ago

@hkchengrex That's the branch I'm using. So what do we need to do to control the memory usage? I see the YouTube videos, which are long and high resolution. What's different there? How was a high-resolution input video adapted to the tool without exhausting GPU memory or causing other memory issues? Was much more powerful hardware used, or multiple GPUs? I would like to process 30 seconds of 4K video at 30 FPS. It doesn't have to be 4K; it can be 1080p, but still 30 seconds at 30 FPS.

hkchengrex commented 2 years ago

None of the YouTube videos that we show are that long (<500 frames). We just downsample high-res video to 480p (probably like what you did). We used a 2080Ti (11GB) and that should be sufficient for all the examples shown.
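
One common way to do the 480p downsampling is with ffmpeg's scale filter. The sketch below only builds and prints the command; the file names are placeholders and the helper name is hypothetical. The thread does not say which tool the authors used, so treat this as one possible approach.

```python
import shlex


def build_downsample_cmd(src: str, dst: str, height: int = 480) -> list:
    """Build an ffmpeg command that rescales a video to the given height.

    scale=-2:<height> keeps the aspect ratio and picks a width that is
    divisible by 2, which many encoders require.
    """
    return ["ffmpeg", "-i", src, "-vf", f"scale=-2:{height}", dst]


# Placeholder file names, not paths from the thread.
cmd = build_downsample_cmd("input_4k.mp4", "input_480p.mp4")
print(shlex.join(cmd))

# To actually run it (requires ffmpeg on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```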

For your setting, I would suggest using a higher mem_freq and generating the masks on 480p videos (upsample the masks afterward).
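
The "upsample afterward" step can be sketched as a nearest-neighbor resize of each mask. Nearest-neighbor is used here because it never invents new label values, which matters when mask pixels are object IDs. This is an illustrative, dependency-free version on nested lists with a hypothetical function name; in practice an image library's nearest-neighbor resize (for example OpenCV's cv2.resize with cv2.INTER_NEAREST) would be much faster.

```python
def upsample_mask_nearest(mask, out_h, out_w):
    """Resize a 2D label mask (list of rows) to out_h x out_w.

    Each output pixel copies the source pixel it maps back to, so the
    set of label values never changes.
    """
    in_h, in_w = len(mask), len(mask[0])
    # Source row/column index for every output row/column.
    rows = [y * in_h // out_h for y in range(out_h)]
    cols = [x * in_w // out_w for x in range(out_w)]
    return [[mask[r][c] for c in cols] for r in rows]


# Tiny example: a 2x2 mask with object IDs 1 and 2, upsampled to 4x4.
small = [[0, 1],
         [2, 0]]
for row in upsample_mask_nearest(small, 4, 4):
    print(row)
```

For real footage the target size would be the original frame size, e.g. 1080 rows by 1920 columns for a 1080p source.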

zdhernandez commented 2 years ago

@hkchengrex I see. So: use a higher mem_freq with the video downsampled for MiVOS + STCN processing, and once that finishes, upsample the masks as post-processing to match the dimensions of the original 1080p or 4K video. Does that sound right?

hkchengrex commented 2 years ago

Yes.

zdhernandez commented 2 years ago

@hkchengrex got it! Thank you very much!