luispedro / jug

Parallel programming with Python
https://jug.readthedocs.io
MIT License
412 stars 62 forks

Memory "leak" #75

Closed: justinrporter closed this issue 5 years ago

justinrporter commented 5 years ago

In certain circumstances, when the jug executor runs for long periods of time, memory from previously completed tasks appears not to be released. This eventually leads to an out-of-memory (OOM) failure when many tasks are executed.

For example, if you have a situation like this:

import resource

import numpy as np
from jug import TaskGenerator

@TaskGenerator
def make_array(i):
    # Note: ru_maxrss is in bytes on macOS and kilobytes on Linux;
    # dividing by 1024**2 gives MB on macOS, matching the output below
    print('make_array memory footprint is %0.2f MB' %
          (resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024**2))

    # each array is 128*128*4 int64 values, i.e. ~0.5 MB per task
    a = np.random.randint(0, 512, size=(128, 128, 4))
    return a

@TaskGenerator
def process_array(a):
    print('process_array memory footprint is %0.2f MB' %
          (resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024**2))

    a = a * 10
    return a

list_of_arrays = [make_array(i) for i in range(1000)]
processed_arrays = [process_array(a) for a in list_of_arrays]

Then you get the following output:

make_array memory footprint is 28.96 MB
make_array memory footprint is 29.64 MB
make_array memory footprint is 30.14 MB
[...]
make_array memory footprint is 529.29 MB
make_array memory footprint is 529.79 MB
process_array memory footprint is 530.30 MB
process_array memory footprint is 530.81 MB
[...]
process_array memory footprint is 1029.42 MB
process_array memory footprint is 1029.93 MB
process_array memory footprint is 1030.43 MB
process_array memory footprint is 1030.93 MB
    Executed      Loaded  Task name
------------------------------------------------
        1000           0  jugfile.make_array
        1000           0  jugfile.process_array
.............................................................................
        2000           0  Total

I imagine this has something to do with the way finished tasks are serialized, and that a reference to their results is hanging around somewhere?

luispedro commented 5 years ago

There is an option (--aggressive-unload) which deals with this.

Currently, by default, jug keeps the results of all the tasks it has run in memory. The reasoning is that this avoids loading and re-loading results from the store, which in some instances could be terribly slow. With --aggressive-unload, it removes everything from memory unless it is needed for the very next task. Neither is a very fancy method, though.
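
For reference, the flag goes on the jug command line like this (assuming the jugfile above is saved as jugfile.py):

jug execute --aggressive-unload jugfile.py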

justinrporter commented 5 years ago

Ahhhh I think I remember seeing that somewhere and it didn't quite click. Thanks!

Can I propose that when --target is passed, --aggressive-unload should be active by default? My reasoning is that, unless the task is recursive, you are guaranteed not to need the results of the task in memory if you're only running one stage of your pipeline.
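
Something like this hypothetical invocation, where --target is the existing option and the implicit unloading would be new (assuming the target name matches the task names in the status output above):

jug execute --target jugfile.make_array jugfile.py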

Another thought might be to automatically start unloading results when the process's memory footprint gets close to the total system RAM?
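
Something along these lines, perhaps (a rough sketch only; psutil and the 90% threshold are my assumptions, not anything jug currently does):

import psutil

def should_unload():
    # start unloading cached results once overall system memory
    # usage crosses 90% (sketch of the proposed heuristic)
    return psutil.virtual_memory().percent >= 90.0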

Anyway, thanks for the awesome library.