joblib / joblib

Computing with Python functions.
http://joblib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

Is it possible to remove the input files from memmapping in Parallel() when these have been used? #1298

Open jordi-torrents opened 2 years ago

jordi-torrents commented 2 years ago

I am processing videos in parallel using joblib. I have something like

small_output = Parallel(n_jobs=n_jobs)(
    delayed(process_frame)(frame) for frame in give_me_a_frame
)

where give_me_a_frame is an object wrapping a cv2.VideoCapture that keeps reading frames one by one and distributing them to the parallel pool.
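For reference, the streaming setup described above can be sketched without OpenCV: a plain generator stands in for the cv2.VideoCapture reader (frame_source, process_frame, and the array shapes here are illustrative placeholders, not the original code), showing how joblib consumes frames lazily from a generator:

```python
import numpy as np
from joblib import Parallel, delayed

def frame_source(n_frames, shape=(64, 64)):
    """Hypothetical stand-in for the cv2-based reader: yields frames
    lazily, so only a few live on the dispatching side at a time."""
    for i in range(n_frames):
        yield np.full(shape, i % 256, dtype=np.uint8)

def process_frame(frame):
    # placeholder for the real per-frame processing
    return int(frame.max())

# Frames are pulled from the generator as workers become available;
# joblib pre-dispatches only a bounded number of tasks at once.
results = Parallel(n_jobs=2)(
    delayed(process_frame)(f) for f in frame_source(6)
)
```

Note that joblib only pre-dispatches a bounded number of tasks from the generator (2 * n_jobs by default, tunable via the pre_dispatch parameter), so the dispatching side never materializes all frames at once; the memory growth reported here comes from the memmapped copies left behind for already-processed frames.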

My problem is that I work with chunks of 500 frames from a 5K video, so every frame is around 50 MB. Memmapping is awesome in this case and really speeds up the in/out process, BUT the memory usage is too high. Memory usage keeps increasing with every processed frame until the parallel pool is closed. Every time a frame is created and sent to a worker, that frame is saved somewhere (in physical or virtual memory) and is not removed until all 500 frames are processed and the pool is closed. That is 25 GB of data (500 frames) that you have to put somewhere.

Is there a better option? Is it possible to keep the memmapping clean and lightweight as the frames get processed (since we don't need them anymore)?

ogrisel commented 2 years ago

At the moment we do not have a solution because memmapping was primarily meant to be useful when the delayed function is called many times on the same large array (with optional side parameters).

If each frame is individually unique, then memmapping is pretty useless. Try disabling it entirely with max_nbytes=None.
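Disabling memmapping as suggested is just a matter of passing max_nbytes=None to Parallel. A minimal sketch (process_frame and the array sizes are placeholders, not the original code):

```python
import numpy as np
from joblib import Parallel, delayed

def process_frame(frame):
    # placeholder for the real per-frame processing
    return float(frame.mean())

frames = (np.full((100, 100), i, dtype=np.uint8) for i in range(8))

# max_nbytes=None disables the automatic memmapping of large array
# arguments: inputs are pickled directly to the workers instead of
# being dumped to a shared temporary folder first.
results = Parallel(n_jobs=2, max_nbytes=None)(
    delayed(process_frame)(f) for f in frames
)
```

With memmapping off, each frame is serialized to the worker and garbage-collected after use, at the cost of pickling overhead on every dispatch.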

jordi-torrents commented 2 years ago

Thanks for answering so fast!

I have tried all combinations of Parallel() options and they are all slower than leaving every option at its default, with memmapping running freely. I can understand why memmapping should be useless in my case, but it turns out it speeds up my code (~2x) at the cost of 25 GB of memory.

I'm working with a 36-CPU machine (might this affect it?)

ogrisel commented 2 years ago

Interesting, I would not have expected memmapping to be useful in this situation. But maybe it yields faster pickling / unpickling.

However, I am not interested in investing more effort in changing how memmapping works in joblib. I think the best long term effort for single-machine parallel programming in Python is the experimental nogil variant: https://github.com/colesbury/nogil/

Once this is ready, we will no longer need to spawn worker processes and all of this will be mostly useless, see: https://twitter.com/ogrisel/status/1524789529349173249

ogrisel commented 2 years ago

In particular, #1299 should help move toward making joblib run on the nogil variant, although it alone will not be enough.