python / cpython

The Python programming language
https://www.python.org

Potential Memory leak with concurrent.futures.ThreadPoolExecutor's map #85754

Open 4444c7d9-b484-40b0-a5b1-36de99b7016b opened 4 years ago

4444c7d9-b484-40b0-a5b1-36de99b7016b commented 4 years ago
BPO 41588
Nosy @brianquinlan, @pitrou, @aeros

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

```python
assignee = None
closed_at = None
created_at =
labels = ['extension-modules', '3.7', 'performance']
title = "Potential Memory leak with concurrent.futures.ThreadPoolExecutor's map"
updated_at =
user = 'https://bugs.python.org/or12'
```

bugs.python.org fields:

```python
activity =
actor = 'aeros'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Extension Modules']
creation =
creator = 'or12'
dependencies = []
files = []
hgrepos = []
issue_num = 41588
keywords = []
message_count = 1.0
messages = ['375647']
nosy_count = 4.0
nosy_names = ['bquinlan', 'pitrou', 'aeros', 'or12']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'resource usage'
url = 'https://bugs.python.org/issue41588'
versions = ['Python 3.7']
```

4444c7d9-b484-40b0-a5b1-36de99b7016b commented 4 years ago

I've been debugging high memory consumption in one of my scripts and traced it back to concurrent.futures.ThreadPoolExecutor.

While investigating further, I found that when using concurrent.futures.ThreadPoolExecutor's map function and passing a dictionary as the iterable argument, the memory used by the pool is not freed, so total memory consumption keeps rising. (It also seems to happen when passing a list, and possibly other types.)

Here is a code example that reproduces the issue:

```python
#!/usr/bin/env python3

import os
import time
import psutil  # third-party: pip install psutil
import random
import concurrent.futures

from memory_profiler import profile as mem_profile  # third-party: pip install memory_profiler

p = psutil.Process(os.getpid())

def do_magic(values):
    return None

@mem_profile
def foo():
    a = {i: chr(i) for i in range(1024)}
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
        processed_data = pool.map(do_magic, a)

def fooer():
    while True:
        foo()
        time.sleep(1)

fooer()
```
Robert-Lebedeu commented 2 years ago

Still no fix? :( I ran into the same issue and had to work around it by using the submit method and then waiting for the result of each future.
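The submit-based workaround described above can be sketched roughly like this (a minimal sketch, not the original script; `do_magic` and `data` are placeholder names):

```python
import concurrent.futures

def do_magic(value):
    # hypothetical stand-in for the real per-item work
    return value * 2

data = {i: chr(i) for i in range(1024)}

with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
    # submit each key individually and keep the futures explicitly
    futures = [pool.submit(do_magic, key) for key in data]
    # block on every future so each task finishes and its result
    # reference can be dropped before the pool shuts down
    results = [f.result() for f in futures]
```

Unlike a lazy `map()` iterator that is never consumed, this guarantees every result has been collected by the time the `with` block exits.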

Sxderp commented 1 year ago

I ran into this same issue (with a ProcessPool). After a bit of probing, it seems to be more of a generator issue: unused generators (what .map() returns) are not being garbage collected. If I do list(executor.map(...)), which exhausts the generator, I no longer see unbounded memory growth. Of course I'm needlessly slowing down my code, but it's better than having to recreate the ProcessPool or use submit.


Edit: I think my issue may not have been an actual memory leak, but a misunderstanding of the .map function. Since it returns a generator, it doesn't actually wait for all the tasks to finish unless it's iterated. As a result, I was submitting more tasks before the earlier ones could complete.
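To illustrate the point about exhausting the iterator: wrapping the result of `map()` in `list()` drains it, forcing every task's result to be retrieved before moving on. A minimal sketch (`do_magic` is a placeholder for the real work):

```python
import concurrent.futures

def do_magic(value):
    return value + 1

with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
    # list() drains the iterator returned by map(), so all results
    # are collected and their internal references can be released
    results = list(pool.map(do_magic, range(1024)))
```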

CaledoniaProject commented 1 year ago

I can't believe this remains unfixed. Any workarounds?
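One workaround pattern, sketched here under stated assumptions (`bounded_map` and `do_magic` are hypothetical helpers, not part of the stdlib), is to bound the number of in-flight tasks with submit instead of handing the whole iterable to map at once, which also addresses the over-submission problem described in the previous comment:

```python
import concurrent.futures
from itertools import islice

def do_magic(value):
    # hypothetical per-item work
    return value * value

def bounded_map(pool, fn, iterable, max_in_flight=32):
    """Yield results in order while keeping at most max_in_flight tasks queued."""
    it = iter(iterable)
    # prime the pipeline with the first max_in_flight tasks
    futures = [pool.submit(fn, x) for x in islice(it, max_in_flight)]
    while futures:
        # wait for the oldest task, then top the queue back up by one
        done = futures.pop(0)
        yield done.result()
        for x in islice(it, 1):
            futures.append(pool.submit(fn, x))

with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
    results = list(bounded_map(pool, do_magic, range(1024)))
```

Because at most `max_in_flight` futures exist at any time, memory stays bounded even for very large inputs.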

Robert-Lebedeu commented 1 year ago

@vstinner @iritkatriel

Any ideas on how to fix this issue?

vstinner commented 1 year ago

I don't have the bandwidth to dig into multiprocessing issues.