mitmul / pynvvl

A Python wrapper of NVIDIA Video Loader (NVVL) with CuPy for fast video loading with Python
MIT License

Hi, how to release the GPU memory after calling 'NVVLVideoLoader'? #9

Open blankWorld opened 6 years ago

blankWorld commented 6 years ago

loader = pynvvl.NVVLVideoLoader(device_id=0, log_level='error')
video = loader.read_sequence(video_root).get()

With my dataloader code, the GPU memory usage increases progressively. What should I do to release the GPU memory?

HanaanY commented 6 years ago

@blankWorld I think this is supposed to get cleaned up automatically by the __dealloc__ method, but I'm also seeing the same progressive memory increase. It might be that I'm creating new VideoLoaders faster than their memory is released?
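If that's the cause, one thing to try is constructing the loader once and reusing it across videos instead of creating a new one per call. A minimal sketch (constructor arguments copied from the first comment; video_paths is a hypothetical list of file paths, not part of pynvvl):

import pynvvl

# Create one loader up front; each NVVLVideoLoader holds its own GPU-side state.
loader = pynvvl.NVVLVideoLoader(device_id=0, log_level='error')

for path in video_paths:  # video_paths: hypothetical list of video files
    video = loader.read_sequence(path)
    # ... consume `video` ...
    del video  # drop the reference so CuPy's pool can reuse the block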

yuyay commented 6 years ago

I'm facing the same problem. I temporarily avoid it by adding the following lines after loading a video.

import cupy

mempool = cupy.get_default_memory_pool()
mempool.free_all_blocks()  # return cached, unused blocks to the GPU driver

However, I don't think this is an appropriate fix, because the overhead of freeing GPU memory is so large.
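One way to reduce that overhead is to amortize it: free the pool every N videos instead of after each one. A rough sketch (the interval and loop names are made up, not part of pynvvl; loader as in the first comment):

import cupy

mempool = cupy.get_default_memory_pool()
FREE_EVERY = 32  # hypothetical interval; tune to your GPU memory budget

for i, path in enumerate(video_paths):  # video_paths: hypothetical list of files
    video = loader.read_sequence(path).get()  # copy to host, as in the first comment
    # ... consume `video` ...
    del video
    if (i + 1) % FREE_EVERY == 0:
        mempool.free_all_blocks()  # pay the release cost once every FREE_EVERY videos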
@mitmul, do you have a better idea for solving this properly?

aBlueDragon commented 5 years ago

@yuyay Your solution works just fine. However, I still hope a more efficient way can be found, probably by making the DataLoader aware of the memory pynvvl allocates on the GPU. That would help a lot when using the library in a dataloader for training deep networks.

aBlueDragon commented 5 years ago

Right now I find it hard to manage memory when using pynvvl in a PyTorch dataloader. The problem is that when you export a CuPy array via DLPack and use from_dlpack to convert it into a tensor, PyTorch does not recognize the array as memory it allocated. Meanwhile, since you can still access the array (from_dlpack does not copy it; the data stays in its original location), the memory cannot be freed by calling cupy.get_default_memory_pool().free_all_blocks().
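Concretely, the pattern I mean looks roughly like this (cupy.ndarray.toDlpack and torch.utils.dlpack.from_dlpack are real APIs; loader and video_path are assumed from earlier in the thread):

import cupy
from torch.utils.dlpack import from_dlpack

arr = loader.read_sequence(video_path)  # CuPy array, backed by CuPy's memory pool
t = from_dlpack(arr.toDlpack())         # zero-copy: the tensor aliases the same buffer
del arr
# `t` still references the block, so the pool cannot release it,
# and PyTorch's caching allocator doesn't track it either:
cupy.get_default_memory_pool().free_all_blocks()  # no effect on that block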

As a result, a memory leak occurs: the DLPack-exported CuPy array can neither be freed with CuPy's functions nor be handled automatically by PyTorch after a training iteration ends. Memory consumption grows progressively until it hits the maximum available and an out-of-memory error is raised.
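The only mitigation I can think of (my own sketch, not something pynvvl or PyTorch provides for this) is to pay for an explicit device-to-device copy, so PyTorch owns the result and the CuPy buffer can actually be released:

import cupy
from torch.utils.dlpack import from_dlpack

arr = loader.read_sequence(video_path)       # CuPy array on the GPU
t = from_dlpack(arr.toDlpack()).clone()      # D2D copy into PyTorch-owned memory
del arr                                      # drop the CuPy-side reference
cupy.get_default_memory_pool().free_all_blocks()  # the original block can now be freed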

Has anyone come up with a solution for this yet?