Open, opened 4 years ago
I've been debugging high memory consumption in one of my scripts and traced it back to concurrent.futures.ThreadPoolExecutor.
While investigating further, I found that when using concurrent.futures.ThreadPoolExecutor with the map function, and passing a dictionary to map as the iterable, the memory used by the pool is not freed, so total memory consumption keeps rising. (It also seems to happen when passing a list, and possibly other types.)
Here is an example of code that reproduces the issue:
```python
#!/usr/bin/env python3
import os
import time
import psutil
import random
import concurrent.futures
from memory_profiler import profile as mem_profile

p = psutil.Process(os.getpid())

def do_magic(values):
    return None

@mem_profile
def foo():
    a = {i: chr(i) for i in range(1024)}
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
        processed_data = pool.map(do_magic, a)

def fooer():
    while True:
        foo()
        time.sleep(1)

fooer()
Still no fix? :(
I had the same issue and I've been forced to work around it by using the submit method and then waiting for the result of each future.
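For reference, the submit-based workaround described above might look like the following sketch (the `do_magic` function and the sample dictionary are borrowed from the reproduction script; the exact code the commenter used is not shown in the thread):

```python
import concurrent.futures

def do_magic(value):
    return None

data = {i: chr(i) for i in range(1024)}

with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
    # Submit each item individually instead of using map(), then block on
    # every future's result so all work items are completed and released.
    futures = [pool.submit(do_magic, key) for key in data]
    results = [f.result() for f in futures]
```

Waiting on each result also means no new batch of tasks is queued before the previous batch has finished.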
I ran into this same issue (with a ProcessPoolExecutor). After a bit of probing, it seems to be more of a generator issue: unused generators (which .map() returns) are not garbage collected. If I do list(executor.map(..)), which exhausts the generator, I no longer see unbounded memory growth. Of course I'm needlessly slowing down my code, but it's better than having to recreate the ProcessPoolExecutor / use submit.
Edit:
I think my issue may not have been an actual memory leak, but a misunderstanding of the .map function. I realized that since it returns a generator, it doesn't actually wait for all the tasks to finish unless it is iterated. So I was submitting more tasks before the previous ones could complete.
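A minimal sketch of the list() workaround described in this comment (using a trivial stand-in function, since the original workload isn't shown):

```python
import concurrent.futures

def do_magic(value):
    return value * 2

data = list(range(100))

with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
    # map() returns a lazy iterator; wrapping it in list() forces all
    # submitted tasks to run to completion before the pool is reused,
    # so work items are consumed instead of piling up unconsumed.
    results = list(pool.map(do_magic, data))
```

The same pattern applies to ProcessPoolExecutor, since both share the Executor.map interface.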
I can't believe this remains unfixed. Any workarounds?
@vstinner @iritkatriel
Any ideas on how to fix this issue?
I don't have the bandwidth to dig into multiprocessing issues.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['extension-modules', '3.7', 'performance']
title = "Potential Memory leak with concurrent.futures.ThreadPoolExecutor's map"
updated_at =
user = 'https://bugs.python.org/or12'
```
bugs.python.org fields:
```python
activity =
actor = 'aeros'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Extension Modules']
creation =
creator = 'or12'
dependencies = []
files = []
hgrepos = []
issue_num = 41588
keywords = []
message_count = 1.0
messages = ['375647']
nosy_count = 4.0
nosy_names = ['bquinlan', 'pitrou', 'aeros', 'or12']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'resource usage'
url = 'https://bugs.python.org/issue41588'
versions = ['Python 3.7']
```