python / cpython

The Python programming language
https://www.python.org

Expose max_queue_size in ThreadPoolExecutor #73781

Open e6b5f146-fc58-407c-9b22-efe2601bb44a opened 7 years ago

e6b5f146-fc58-407c-9b22-efe2601bb44a commented 7 years ago
BPO 29595
Nosy @rhettinger, @pitrou, @applio, @zhangyangyu, @tomMoral, @DimitrisJim, @prayerslayer, @stephenoneal, @iforapsy, @tianc777
PRs
  • python/cpython#143
  • python/cpython#23864
  • python/cpython#23865
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    ```python
    assignee = None
    closed_at = None
    created_at =
    labels = ['3.8', 'type-feature', 'library']
    title = 'Expose max_queue_size in ThreadPoolExecutor'
    updated_at =
    user = 'https://github.com/prayerslayer'
    ```

    bugs.python.org fields:

    ```python
    activity =
    actor = 'iforapsy'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation =
    creator = 'prayerslayer'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 29595
    keywords = ['patch']
    message_count = 6.0
    messages = ['288043', '289082', '290253', '290949', '314699', '314743']
    nosy_count = 11.0
    nosy_names = ['rhettinger', 'pitrou', 'python-dev', 'davin', 'xiang.zhang', 'tomMoral', 'Jim Fasarakis-Hilliard', 'prayerslayer', 'stephen.oneal.04', 'iforapsy', 'tianc777']
    pr_nums = ['143', '23864', '23865']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue29595'
    versions = ['Python 3.8']
    ```

    e6b5f146-fc58-407c-9b22-efe2601bb44a commented 7 years ago

    Hi!

    I think ThreadPoolExecutor should allow setting the maximum size of its underlying work queue.

    The situation I ran into recently was that I used ThreadPoolExecutor to parallelize AWS API calls; I had to move data from one S3 bucket to another (~150M objects). Contrary to what I expected, the underlying queue is unbounded by default. My process therefore ended up consuming gigabytes of memory, because it put items into the queue faster than the threads could work them off: the queue just kept growing. (It ran on Kubernetes, and the pod was rightfully killed eventually.)

    Of course there are ways to work around this. One could use more threads, to some extent. Or you could use your own queue with a defined maximum size. But I think that's more work for users of Python than necessary.
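    The "use your own queue" workaround mentioned above can be done without touching executor internals, by gating `submit()` with a semaphore. This is only an illustrative sketch (the `BoundedExecutor` name and `max_pending` parameter are made up, not stdlib API), assuming you want submission to block once a fixed number of tasks are outstanding:

    ```python
    import concurrent.futures
    import threading

    class BoundedExecutor:
        """Illustrative wrapper (not stdlib API): submit() blocks once
        max_pending tasks have been submitted but not yet finished."""

        def __init__(self, max_workers, max_pending):
            self._executor = concurrent.futures.ThreadPoolExecutor(max_workers)
            self._semaphore = threading.Semaphore(max_pending)

        def submit(self, fn, *args, **kwargs):
            self._semaphore.acquire()  # blocks the producer when max_pending is reached
            try:
                future = self._executor.submit(fn, *args, **kwargs)
            except BaseException:
                self._semaphore.release()
                raise
            # Release one slot whenever a task finishes (or is cancelled).
            future.add_done_callback(lambda _: self._semaphore.release())
            return future

        def shutdown(self, wait=True):
            self._executor.shutdown(wait=wait)
    ```

    With a wrapper like this, a producer loop submitting millions of items holds at most `max_pending` work items in memory at a time, instead of the whole backlog.
    
    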

    e6b5f146-fc58-407c-9b22-efe2601bb44a commented 7 years ago

    Hello again, there's a reviewed PR open for this issue and it hasn't even received authoritative feedback yet (i.e. whether or not you intend to support this feature at all). I would be very happy if a core dev could look over this change before everyone forgets about it :)

    e6b5f146-fc58-407c-9b22-efe2601bb44a commented 7 years ago

    Ping. It's really a two-line change; it can easily be reviewed in 15 minutes :)

    rhettinger commented 7 years ago

    Prayerslayer, please don't shove. Your PR was responded to by Mariatta, so it wasn't ignored.

    Making decisions about API expansions takes a while (making sure it fits the intended use, that it isn't a bug factory itself, that it is broadly useful, that it is the best solution to the problem, that it doesn't complicate the implementation or limit future opportunities, and that there are no unforeseen problems). Among the core developers, there are only a couple of part-time contributors who are qualified to make these assessments for the multiprocessing module (those devs don't include me).

    2029f036-3446-4b82-a64d-d1f5ebbb8cb5 commented 6 years ago

    In my project we're reaching into the underlying _work_queue and blocking further submissions based on unfinished_tasks to accomplish this; bubbling this up to the API would be a welcome addition.
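    The private-attribute approach described above can be sketched as follows. Note this relies on `_work_queue`, an internal CPython implementation detail that may change between versions (current CPython uses an unbounded `queue.SimpleQueue` here), so this is a fragile workaround, not a supported API; `QUEUE_LIMIT` is an illustrative name:

    ```python
    import queue
    from concurrent.futures import ThreadPoolExecutor

    # Fragile workaround sketch: swap the executor's private, unbounded
    # _work_queue for a bounded queue.Queue, so submit() blocks once
    # QUEUE_LIMIT items are pending. Done before the first submit(),
    # while no worker threads have been spawned yet.
    QUEUE_LIMIT = 100

    executor = ThreadPoolExecutor(max_workers=4)
    executor._work_queue = queue.Queue(maxsize=QUEUE_LIMIT)

    futures = [executor.submit(pow, 2, n) for n in range(8)]
    results = [f.result() for f in futures]
    executor.shutdown()
    ```

    Exposing a max_queue_size parameter, as this issue proposes, would give the same backpressure without depending on a private attribute.
    
    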

    pitrou commented 6 years ago

    Please note the PR here has some review comments that need addressing. Also, it needs its conflicts with git master resolved.

    I'm cc'ing Thomas Moreau, who has done a lot of work recently on the concurrent.futures internals.