python / cpython

The Python programming language
https://www.python.org

Avoid releasing the GIL in nonblocking socket operations #89977

Open 2f5601e0-9cbf-47ae-a123-6b01e62b2a28 opened 2 years ago

2f5601e0-9cbf-47ae-a123-6b01e62b2a28 commented 2 years ago
BPO 45819
Nosy @rhettinger, @tiran, @asvetlov, @1st1, @jakirkham, @jcrist
PRs
  • python/cpython#29579
Files
  • bench.py

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    ```python
    assignee = None
    closed_at = None
    created_at =
    labels = ['3.11', 'expert-asyncio', 'expert-IO', 'performance']
    title = 'Avoid releasing the GIL in nonblocking socket operations'
    updated_at =
    user = 'https://github.com/jcrist'
    ```

    bugs.python.org fields:

    ```python
    activity =
    actor = 'jakirkham'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['IO', 'asyncio']
    creation =
    creator = 'jcristharif'
    dependencies = []
    files = ['50443']
    hgrepos = []
    issue_num = 45819
    keywords = ['patch']
    message_count = 3.0
    messages = ['406422', '406431', '406436']
    nosy_count = 6.0
    nosy_names = ['rhettinger', 'christian.heimes', 'asvetlov', 'yselivanov', 'jakirkham', 'jcristharif']
    pr_nums = ['29579']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue45819'
    versions = ['Python 3.11']
    ```

    2f5601e0-9cbf-47ae-a123-6b01e62b2a28 commented 2 years ago

    https://bugs.python.org/issue7946 describes an issue with how the current GIL behaves when IO-bound and CPU-bound work are mixed. Quoting that issue:

    when an I/O bound thread executes an I/O call, it always releases the GIL. Since the GIL is released, a CPU bound thread is now free to acquire the GIL and run. However, if the I/O call completes immediately (which is common), the I/O bound thread immediately stalls upon return from the system call. To get the GIL back, it now has to go through the timeout process to force the CPU-bound thread to release the GIL again.

    This issue can come up in any application where IO and CPU bound work are mixed (we've found it to be a cause of performance issues in https://dask.org for example). Fixing the general problem is tricky and likely requires changes to the GIL's internals, but in the specific case of mixing asyncio running in one thread and CPU work happening in background threads, there may be a simpler fix - don't release the GIL if we don't have to.
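The stall described in the quote can be observed with a small standard-library experiment. The sketch below is illustrative only and is not taken from the issue or the patch: `count_round_trips` and its parameters are hypothetical names. It counts how many send/recv round trips one thread completes on a local socket pair while optional spinner threads compete for the GIL.

```python
import socket
import threading
import time


def count_round_trips(duration, spinners):
    """Count send/recv round trips completed in `duration` seconds.

    `spinners` is the number of CPU-bound background threads (hypothetical
    parameter, chosen for this illustration).
    """
    stop = threading.Event()

    def spin():
        # Pure-Python busy loop: keeps the GIL until it is asked to yield.
        while not stop.is_set():
            pass

    threads = [threading.Thread(target=spin, daemon=True) for _ in range(spinners)]
    for t in threads:
        t.start()

    a, b = socket.socketpair()
    count = 0
    deadline = time.perf_counter() + duration
    try:
        while time.perf_counter() < deadline:
            # Both calls complete immediately at the OS level, but each one
            # releases the GIL around the syscall. With a spinner running,
            # this thread then has to wait to get the GIL back.
            a.send(b"x")
            b.recv(1)
            count += 1
    finally:
        stop.set()
        for t in threads:
            t.join()
        a.close()
        b.close()
    return count
```

Comparing `count_round_trips(1.0, 0)` with `count_round_trips(1.0, 1)` on a regular (GIL-enabled) build should show a large drop once a spinner is present. The wait for the GIL is governed by the interpreter's switch interval, which defaults to 5 ms and can be read with `sys.getswitchinterval()`.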

    Asyncio relies on nonblocking socket operations, which by definition shouldn't block. As such, releasing the GIL shouldn't be needed for many operations (send, recv, ...) on socket.socket objects provided they're in nonblocking mode (as suggested in https://bugs.python.org/issue7946#msg99477). Likewise, dropping the GIL can be avoided when calling select on selectors.BaseSelector objects with a timeout of 0 (making it a non-blocking call).
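For reference, here is a minimal sketch of the two call patterns discussed above, using only documented `socket` and `selectors` APIs. It assumes a Unix-like platform, where data written to a `socket.socketpair()` is readable as soon as `send` returns. It shows the Python-level behaviour only and says nothing about how the patch changes the C implementation.

```python
import selectors
import socket

# A connected pair of sockets stands in for a client/server connection.
a, b = socket.socketpair()
a.setblocking(False)
b.setblocking(False)

# Nothing has been sent yet, so a nonblocking recv returns immediately
# by raising BlockingIOError instead of waiting for data.
try:
    b.recv(100)
    would_block = False
except BlockingIOError:
    would_block = True

sel = selectors.DefaultSelector()
sel.register(b, selectors.EVENT_READ)

# timeout=0 turns select() into a poll: it reports readiness and returns
# without waiting.
ready_before = sel.select(timeout=0)

a.send(b"ping")
ready_after = sel.select(timeout=0)
data = b.recv(100)

sel.close()
a.close()
b.close()
```

In both cases the underlying system call is expected to return right away, which is the argument for not releasing the GIL around it.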

    I've made a patch (https://github.com/jcrist/cpython/tree/keep-gil-for-fast-syscalls) with these two changes, and run a benchmark (attached) to evaluate the effect of background threads with/without the patch. The benchmark starts an asyncio server in one process, and a number of clients in a separate process. A number of background threads that just spin are started in the server process (configurable by the -t flag, defaults to 0), then the server is loaded to measure the RPS.
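The attached bench.py is not reproduced here. As a rough idea of the setup described above, the following is a simplified sketch that runs an asyncio echo server, its clients, and the spinner threads in a single process. That is an assumption made for brevity: the real benchmark runs the clients in a separate process, so numbers from this sketch are not comparable to the results below. All function and parameter names are hypothetical.

```python
import asyncio
import threading
import time


async def handle(reader, writer):
    # Echo whatever arrives until the client disconnects.
    try:
        while True:
            data = await reader.read(65536)
            if not data:
                break
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()


async def client(port, msg, deadline):
    # Send one message, wait for the echo, repeat until the deadline.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    count = 0
    while time.perf_counter() < deadline:
        writer.write(msg)
        await writer.drain()
        await reader.readexactly(len(msg))
        count += 1
    writer.close()
    await writer.wait_closed()
    return count


async def bench(clients, msg_size, duration):
    # Port 0 lets the OS pick a free port.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    msg = b"x" * msg_size
    async with server:
        deadline = time.perf_counter() + duration
        counts = await asyncio.gather(
            *(client(port, msg, deadline) for _ in range(clients))
        )
    return sum(counts) / duration


def run(clients=1, msg_size=100, background_threads=0, duration=1.0):
    """Return requests per second with the given number of spinner threads."""
    stop = threading.Event()

    def spin():
        # CPU-bound background work that competes for the GIL.
        while not stop.is_set():
            pass

    threads = [threading.Thread(target=spin, daemon=True)
               for _ in range(background_threads)]
    for t in threads:
        t.start()
    try:
        return asyncio.run(bench(clients, msg_size, duration))
    finally:
        stop.set()
        for t in threads:
            t.join()


if __name__ == "__main__":
    for n_threads in (0, 1):
        rps = run(clients=1, background_threads=n_threads)
        print(f"background-threads = {n_threads}: {rps:.1f} RPS")
```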

    Here are the results:

    ```
    # Main branch
    $ python bench.py -c1 -t0
    Benchmark: clients = 1, msg-size = 100, background-threads = 0
    16324.2 RPS
    $ python bench.py -c1 -t1
    Benchmark: clients = 1, msg-size = 100, background-threads = 1
    Spinner spun 1.52e+07 cycles/second
    97.6 RPS
    $ python bench.py -c2 -t0
    Benchmark: clients = 2, msg-size = 100, background-threads = 0
    31308.0 RPS
    $ python bench.py -c2 -t1
    Benchmark: clients = 2, msg-size = 100, background-threads = 1
    Spinner spun 1.52e+07 cycles/second
    96.2 RPS
    $ python bench.py -c10 -t0
    Benchmark: clients = 10, msg-size = 100, background-threads = 0
    47169.6 RPS
    $ python bench.py -c10 -t1
    Benchmark: clients = 10, msg-size = 100, background-threads = 1
    Spinner spun 1.54e+07 cycles/second
    95.4 RPS

    # With this patch
    $ ./python bench.py -c1 -t0
    Benchmark: clients = 1, msg-size = 100, background-threads = 0
    18201.8 RPS
    $ ./python bench.py -c1 -t1
    Benchmark: clients = 1, msg-size = 100, background-threads = 1
    Spinner spun 9.03e+06 cycles/second
    194.6 RPS
    $ ./python bench.py -c2 -t0
    Benchmark: clients = 2, msg-size = 100, background-threads = 0
    34151.8 RPS
    $ ./python bench.py -c2 -t1
    Benchmark: clients = 2, msg-size = 100, background-threads = 1
    Spinner spun 8.72e+06 cycles/second
    729.6 RPS
    $ ./python bench.py -c10 -t0
    Benchmark: clients = 10, msg-size = 100, background-threads = 0
    53666.6 RPS
    $ ./python bench.py -c10 -t1
    Benchmark: clients = 10, msg-size = 100, background-threads = 1
    Spinner spun 5e+06 cycles/second
    21838.2 RPS
    ```
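To make the two runs easier to compare, the RPS figures above can be reduced to per-configuration ratios. The snippet below only does arithmetic on the numbers copied from the output; the dictionary layout is an arbitrary choice for illustration.

```python
# RPS figures copied from the benchmark output above,
# keyed by (clients, background_threads).
main = {
    (1, 0): 16324.2, (1, 1): 97.6,
    (2, 0): 31308.0, (2, 1): 96.2,
    (10, 0): 47169.6, (10, 1): 95.4,
}
patched = {
    (1, 0): 18201.8, (1, 1): 194.6,
    (2, 0): 34151.8, (2, 1): 729.6,
    (10, 0): 53666.6, (10, 1): 21838.2,
}

# Ratio of patched RPS to main-branch RPS for each configuration.
speedup = {key: patched[key] / main[key] for key in main}

for (clients, threads), ratio in sorted(speedup.items()):
    print(f"-c{clients} -t{threads}: {ratio:.2f}x")
```

With one background thread the ratio grows with the number of clients, from about 2x with one client to about 7.6x with two and about 229x with ten. Without background threads the patched build is roughly 9% to 14% faster.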

    A few comments on the results:

    I have also tested this patch on a Dask cluster running some real-world problems and found that it did improve performance where IO was throttled due to GIL contention.

    rhettinger commented 2 years ago

    +1 There is almost no upside for the current behavior.

    tiran commented 2 years ago

    Do POSIX and Windows APIs guarantee that operations on a nonblocking socket can never ever block?

    We cannot safely keep the GIL unless you can turn "nonblocking socket operations by definition shouldn't block" into "standard guarantees that nonblocking socket operations never block".

    gvanrossum commented 1 year ago

    It does seem interesting. Maybe this idea can be implemented in a 3rd party library like uvloop to see if there are hidden catches? @1st1