Open ed1d1a8d opened 1 year ago
The only explanation I could find is that tqdm somehow keeps a reference to the iterator, which in FFCV also extends Thread. When you drive tqdm manually, you never hand it the iterator, so it cannot hold on to it and the problem does not appear. It must therefore be a problem with tqdm.
What happens if you call .join() on the iterator after the iteration: does it block forever? If it doesn't block, the thread has completed and tqdm is just keeping a reference to it for some odd reason.
You can also inspect the garbage collector to see what is actually holding a reference to the object and blocking its collection.
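A minimal sketch of both checks, using only the standard library. `WorkerIterator` is a hypothetical stand-in for an iterator that also extends Thread, not FFCV's actual class, and `holder` simulates whatever might be keeping the reference alive.

```python
import gc
import threading


class WorkerIterator(threading.Thread):
    """Hypothetical stand-in for an iterator that is also a Thread."""

    def __init__(self, n):
        super().__init__(daemon=True)
        self.n = n

    def run(self):
        # Placeholder for background work; returns immediately here.
        pass

    def __iter__(self):
        return iter(range(self.n))


def diagnose(it, timeout=1.0):
    """Return (thread_finished, referrer_type_names) for `it`."""
    # Bounded join: waits at most `timeout` seconds instead of forever.
    it.join(timeout=timeout)
    finished = not it.is_alive()
    # Ask the garbage collector which tracked objects refer to `it`.
    referrers = gc.get_referrers(it)
    names = sorted({type(r).__name__ for r in referrers})
    return finished, names


it = WorkerIterator(3)
it.start()
holder = [it]  # simulates some object keeping a reference to the iterator
for _ in it:
    pass

finished, names = diagnose(it)
print("thread finished:", finished)
print("referrer types:", names)
```

If `finished` is true but unexpected referrer types show up, the thread itself is done and something else is merely keeping the object alive.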
I used FFCV with tqdm for a long-running job and noticed that it crashed from using too many threads (~4000 threads had built up linearly over 30 hours, and the OS eventually stepped in and forbade my process from creating any new threads).
Upon further investigation, I found that when FFCV is used with tqdm, there seems to be a thread leak (i.e. new threads are created that never get deleted).
Here's a reproduction of the issue: https://gist.github.com/ed1d1a8d/424e5bc83325c93037cfe2de9e457a68
I'm curious if this is an issue with FFCV or an issue with tqdm, and if it is a known problem.
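For anyone who wants to check for this kind of leak without FFCV or tqdm in the picture, here is a rough sketch that records the live thread count after each epoch. `leaky_epoch` and `count_threads_per_epoch` are hypothetical helpers written for illustration; the leak is simulated with threads that wait on an event.

```python
import threading

stop_event = threading.Event()
leaked = []


def leaky_epoch():
    # Simulates an epoch whose worker thread is never cleaned up:
    # the thread blocks until stop_event is set.
    t = threading.Thread(target=stop_event.wait, daemon=True)
    t.start()
    leaked.append(t)


def count_threads_per_epoch(run_epoch, epochs):
    """Run `run_epoch` repeatedly and record the live thread count each time."""
    counts = []
    for _ in range(epochs):
        run_epoch()
        counts.append(threading.active_count())
    return counts


counts = count_threads_per_epoch(leaky_epoch, 5)
growth = counts[-1] - counts[0]
print("threads per epoch:", counts)
print("growth over run:", growth)

# Clean up the simulated leak so the threads can exit.
stop_event.set()
for t in leaked:
    t.join()
```

A count that climbs by a fixed amount every epoch matches the linear build-up described above; a flat count suggests the threads are being reclaimed.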
TL;DR: It seems like the following ways of using FFCV with tqdm are broken:
but the following methods are OK: