Open e6b5f146-fc58-407c-9b22-efe2601bb44a opened 7 years ago
Hi!
I think ThreadPoolExecutor should allow setting the maximum size of its underlying work queue.
The situation I ran into recently was that I used ThreadPoolExecutor to parallelize AWS API calls; I had to move data from one S3 bucket to another (~150M objects). Contrary to what I expected, the underlying queue is unbounded by default. Thus my process ended up consuming gigabytes of memory, because it put items into the queue faster than the threads could work them off: the queue just kept growing. (It ran on K8s, and the pod was rightfully killed eventually.)
Of course, there are ways to work around this. One could use more threads, to some extent. Or one could use a custom queue with a defined maximum size. But I think that's more work for users of Python than necessary.
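To illustrate the "custom queue" workaround: the sketch below swaps the executor's work queue for a bounded `queue.Queue`, so that `submit()` blocks once the backlog reaches the chosen size. Note that `_work_queue` is a private, CPython-specific implementation detail, and a full bounded queue can stall shutdown (the pool's sentinel `put` may block), so this is a hypothetical stopgap rather than a supported API.

```python
import concurrent.futures
import queue

# Workaround sketch (relies on the private _work_queue attribute, an
# implementation detail of CPython's ThreadPoolExecutor; may break or
# deadlock at shutdown if the queue is still full).
executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)
executor._work_queue = queue.Queue(maxsize=100)  # submit() now blocks when 100 items are pending

futures = [executor.submit(pow, 2, n) for n in range(10)]
results = [f.result() for f in futures]
executor.shutdown()
```

Replacing the queue before the first `submit()` matters, because worker threads capture a reference to the queue when they are spawned.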
Hello again. There's a reviewed PR open for this issue, and it hasn't received authoritative feedback yet (i.e., whether or not you intend to support this feature at all). I would be very happy if a core dev could look over this change before everyone forgets about it :)
Ping. It's really a two-line change and can easily be reviewed in 15 minutes :)
prayerslayer, please don't shove. Your PR was responded to by Mariatta, so it wasn't ignored.
Making decisions about API expansions takes a while (making sure the feature fits the intended use, that it isn't a bug factory itself, that it is broadly useful, that it is the best solution to the problem, that it doesn't complicate the implementation or limit future opportunities, and that there are no unforeseen problems). Among the core developers, there are only a couple of part-time contributors who are qualified to make these assessments for the multi-processing module (those devs don't include me).
In my project we're reaching into the underlying _work_queue and blocking the addition of more elements based on unfinished_tasks to accomplish this; bubbling this up to the API would be a welcome addition.
Please note the PR here has some review comments that need addressing. Also, it needs its conflicts with git master resolved.
I'm cc'ing Thomas Moreau, who has done a lot of work recently on the concurrent.futures internals.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['3.8', 'type-feature', 'library']
title = 'Expose max_queue_size in ThreadPoolExecutor'
updated_at =
user = 'https://github.com/prayerslayer'
```
bugs.python.org fields:
```python
activity =
actor = 'iforapsy'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'prayerslayer'
dependencies = []
files = []
hgrepos = []
issue_num = 29595
keywords = ['patch']
message_count = 6.0
messages = ['288043', '289082', '290253', '290949', '314699', '314743']
nosy_count = 11.0
nosy_names = ['rhettinger', 'pitrou', 'python-dev', 'davin', 'xiang.zhang', 'tomMoral', 'Jim Fasarakis-Hilliard', 'prayerslayer', 'stephen.oneal.04', 'iforapsy', 'tianc777']
pr_nums = ['143', '23864', '23865']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue29595'
versions = ['Python 3.8']
```