hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All
https://hpcaitech.github.io/Open-Sora/
Apache License 2.0

Fix/memory leak #526

Closed · zhengzangw closed this 1 week ago

zhengzangw commented 1 week ago

This PR mitigates the memory leak in the dataloader. When tested on 8 GPUs with 8 dataloader workers (360p, local batch size 5), memory usage drops from ~450 GB to under 300 GB.

(Screenshot, 2024-06-22: memory usage before and after the fix.)

Memory usage calculation

A batch of 51 frames at 360p with local batch size 5 is about 1.3 GB. With the default prefetch_factor of 2 and 8 workers, each GPU holds 2 × 8 = 16 preloaded batches, so prefetching consumes 16 × 1.3 GB × 8 GPUs ≈ 166 GB. Decoding a 1080p video can take more than 10 GB, so with 8 dataloaders decoding at once, loading consumes ~100 GB. Adding other memory usage, ~300 GB in total is acceptable.
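
For reference, a quick arithmetic check of these figures (reading "8 dataloaders" as 8 workers per GPU is an assumption):

```python
# Rough check of the numbers above (values taken from this PR).
batch_gb = 1.3        # one 51-frame 360p batch, local batch size 5
prefetch_factor = 2   # PyTorch DataLoader default
num_workers = 8       # assumed meaning of "8 dataloaders"
num_gpus = 8

# Each worker keeps prefetch_factor batches preloaded.
prefetch_gb = prefetch_factor * num_workers * batch_gb * num_gpus
print(f"prefetched batches: {prefetch_gb:.1f} GB")  # 166.4 GB

decode_gb = 100  # ">10 GB" per in-flight 1080p decode, ~100 GB across loaders
print(f"total: ~{prefetch_gb + decode_gb:.0f} GB plus other usage")  # ~266 GB
```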

Memory leak reasons

  1. torchvision.io.read calls PyAV, which is the main source of the leak. Allocating memory while iterating the PyAV container (iter(container.decode(video=0))) leaks. The root cause is not fully identified, but iterating likely spawns multiple decoder threads, and memory allocated during iteration that cannot be deallocated immediately (e.g., frames appended to a Python list) is never reclaimed; see the sketch after this list.
  2. torchvision.io.read does not call container.close(), and gc.collect() is not run frequently enough.
  3. Some objects in the dataloader need to be deleted explicitly to prevent leaks.
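
A minimal reproduction of the pattern in point 1, assuming direct PyAV usage (an illustration of the leak, not code from the PR; the input path is hypothetical):

```python
import av

container = av.open("sample.mp4")  # hypothetical input file
frames = []
for frame in container.decode(video=0):
    # Allocating while the decoder's threads iterate: these arrays are what
    # cannot be reclaimed promptly, so resident memory keeps growing.
    frames.append(frame.to_ndarray(format="rgb24"))
# The container is never closed and gc never runs here, which also
# reproduces points 2 and 3.
```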

Memory leak solution

For the latter two, the fix is straightforward: close the container, trigger garbage collection, and delete objects explicitly. For the first, we rewrite torchvision.io.read to decode into a NumPy buffer allocated in advance, which avoids the leak.
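
A sketch of the buffer-based rewrite, assuming fixed-size frames and the standard PyAV API (the function name and signature are illustrative, not the PR's exact code):

```python
import gc

import av
import numpy as np
import torch

def read_video_into_buffer(path, num_frames, height, width):
    # Allocate the output once, before touching the PyAV container, so no
    # Python-side allocations happen while the decode threads iterate.
    buf = np.empty((num_frames, height, width, 3), dtype=np.uint8)
    container = av.open(path)
    try:
        for i, frame in enumerate(container.decode(video=0)):
            if i >= num_frames:
                break
            # Assumes frames already match (height, width); real code would
            # resize or validate here.
            buf[i] = frame.to_ndarray(format="rgb24")
    finally:
        container.close()  # point 2: release FFmpeg resources explicitly
    gc.collect()           # point 2: collect promptly instead of waiting
    return torch.from_numpy(buf)
```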

Other known memory leaks (won't be fixed for now)

  1. Creating models leaks memory (~4 GB)
  2. PyAV still leaks some memory