hkchengrex / Cutie

[CVPR 2024 Highlight] Putting the Object Back Into Video Object Segmentation
https://hkchengrex.com/Cutie/
MIT License

VRAM usage with amp #47

Closed Zarxrax closed 7 months ago

Zarxrax commented 7 months ago

I have been testing the amp setting, and I am a little confused by the results I am seeing. With Cutie's default settings, I see less VRAM usage when amp is disabled. Only when I increase max_internal_size do I get any VRAM benefit from enabling it. Each test run was conducted after a fresh restart of the application.

For a short clip with only 79 frames, it used 2.5 GB with amp: True and only 1.8 GB with amp: False.

For a clip that is 1888 frames long, I left all memory settings at their defaults except that I increased the long-term memory size, so I could measure memory usage without anything getting purged. With amp: True, the entire clip completed, but usage ended up right at 12 GB, which is the limit of my GPU. With amp: False, the entire clip processed and ended up using 11 GB.

With the longer clip again, I increased max_internal_size to 720. This time I did see a huge benefit from amp: True: it was able to process 160 frames before grinding to a halt from running out of VRAM, whereas with amp: False it only managed about 65 frames.

Taking max_internal_size back down a bit to 540: with amp: True I was able to process 1055 frames, and with amp: False only 755.

So basically, what I am seeing is that at a max_internal_size of 480 or lower, amp actually increases VRAM usage, and the more you increase max_internal_size, the more benefit you gain from it. Can you confirm whether this result makes sense? I am not sure whether it is something peculiar to my own system or whether this is expected.
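
(For reference, this is my rough understanding of what the amp flag maps to on the PyTorch side; just a sketch with placeholder names, not Cutie's actual code:)

```python
import torch

# Placeholder model and input, only to illustrate autocast behaviour.
model = torch.nn.Conv2d(3, 16, 3, padding=1).cuda().eval()
frame = torch.randn(1, 3, 480, 853, device="cuda")

use_amp = True  # the `amp` setting being discussed
with torch.no_grad(), torch.autocast("cuda", enabled=use_amp):
    out = model(frame)  # intermediate activations run in float16 when enabled

print(out.dtype)  # float16 with amp enabled, float32 otherwise
```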

hkchengrex commented 7 months ago

Where are you viewing the VRAM usage from?

Zarxrax commented 7 months ago

The gauge on the right panel, gpu mem, all proc, w/caching.

hkchengrex commented 7 months ago

Yeah, that's not an accurate measure of how much memory the program actually "needs". PyTorch caches aggressively, i.e., it reserves more memory than it strictly needs. The "torch, w/o caching" gauge is the more important one.
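
If it helps, I believe the two gauges roughly correspond to PyTorch's own counters (a minimal sketch, assuming a CUDA device is available):

```python
import torch

# "torch, w/o caching" ~ memory occupied by live tensors;
# "w/ caching" ~ memory reserved by PyTorch's caching allocator.
allocated_gib = torch.cuda.memory_allocated() / 2**30
reserved_gib = torch.cuda.memory_reserved() / 2**30
print(f"allocated: {allocated_gib:.2f} GiB, reserved (with cache): {reserved_gib:.2f} GiB")
```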

Zarxrax commented 7 months ago

Alright thanks, I will review it some more.

Zarxrax commented 7 months ago

I guess I am having trouble understanding why the one w/o caching matters. When the one w/ caching fills up, processing slows to a crawl; I believe CPU mode runs faster at that point. The one w/o caching shows such a ridiculously small number that I assumed Cutie must really be using more VRAM than it displays. Can this cache be cleared with something like torch.cuda.empty_cache(), or would that also clear useful data out of memory?

hkchengrex commented 7 months ago

Hmm, I don't think I have seen that happen before. The program only has access to, and is only using, the GPU memory shown in the "w/o caching" portion. I cannot think of any reason for a significant slowdown... In any case, it should either crash or continue running at normal speed (swapping shouldn't be possible).

You can try torch.cuda.empty_cache() -- it is not going to purge any useful data. However, I don't think it would help unless there is a bug in PyTorch.
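
For example (a minimal sketch; the effect is visible by comparing the counters before and after):

```python
import torch

before = torch.cuda.memory_reserved()
torch.cuda.empty_cache()  # releases cached blocks back to the driver
after = torch.cuda.memory_reserved()

# memory_allocated() is unchanged: tensors that are still referenced
# (e.g. the segmentation memory bank) are never freed by empty_cache().
print(f"reserved: {before / 2**30:.2f} GiB -> {after / 2**30:.2f} GiB, "
      f"allocated: {torch.cuda.memory_allocated() / 2**30:.2f} GiB")
```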

Zarxrax commented 7 months ago

The only thing I can think of is that I am on Windows, so maybe the cache is handled differently than on Linux. I am using PyTorch 2.2.

I tried calling torch.cuda.empty_cache() whenever the VRAM got full, and it seemed to work well. There was a short pause while the cache was cleared, then it continued processing the next frames. With max_internal_size at 720, it initially had to do this every couple hundred frames, but after clearing the cache a few times the VRAM usage stopped increasing and it processed the remainder of the video without stopping again. The gauge displaying gpu mem w/o caching never went above 2 GB.
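
Roughly what I did (a sketch, not the exact code I added; the 10 GiB threshold and the loop names are just placeholders for my 12 GB card):

```python
import torch

def maybe_empty_cache(threshold_gib: float = 10.0) -> None:
    # Clear PyTorch's allocator cache once the reserved ("w/ caching")
    # memory crosses the threshold; allocated tensors are left alone.
    if torch.cuda.memory_reserved() > threshold_gib * 2**30:
        torch.cuda.empty_cache()

# Called after each processed frame in the propagation loop, e.g.:
# for frame in frames:        # placeholder loop, not Cutie's actual code
#     process(frame)          # hypothetical per-frame step
#     maybe_empty_cache()
```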

Back to my original question about AMP: I had someone else who is also on Windows test it as well. They did not get the same results that I did; for them, AMP consistently used less of their total VRAM. So I guess I will just leave AMP turned on. With the cache being emptied, I no longer have any concerns about VRAM usage.

hkchengrex commented 7 months ago

Glad to see that you have a working solution. Unfortunately, I am still not sure what is causing this problem. Thank you for the detailed report and description -- future users with the same problem should find this issue of great help :smile: