python / cpython

The Python programming language
https://www.python.org
Other
63.2k stars 30.26k forks source link

use AI_NUMERICHOST | AI_NUMERICSERV to skip getaddrinfo thread in asyncio #90980

Open f82f2f79-bfff-44c7-86e7-c93f7a9fd1fe opened 2 years ago

f82f2f79-bfff-44c7-86e7-c93f7a9fd1fe commented 2 years ago
BPO 46824
Nosy @asvetlov, @bdarnell, @1st1, @graingert
PRs
  • python/cpython#31497
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['type-feature', 'expert-asyncio'] title = 'use AI_NUMERICHOST | AI_NUMERICSERV to skip getaddrinfo thread in asyncio' updated_at = user = 'https://github.com/graingert' ``` bugs.python.org fields: ```python activity = actor = 'Ben.Darnell' assignee = 'none' closed = False closed_date = None closer = None components = ['asyncio'] creation = creator = 'graingert' dependencies = [] files = [] hgrepos = [] issue_num = 46824 keywords = ['patch'] message_count = 5.0 messages = ['413699', '413701', '413702', '415510', '415511'] nosy_count = 4.0 nosy_names = ['asvetlov', 'Ben.Darnell', 'yselivanov', 'graingert'] pr_nums = ['31497'] priority = 'normal' resolution = None stage = 'patch review' status = 'open' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue46824' versions = [] ```

    f82f2f79-bfff-44c7-86e7-c93f7a9fd1fe commented 2 years ago

    now that the getaddrinfo lock has been removed on all platforms the numeric only host resolve in asyncio could be moved back into BaseEventLoop.getaddrinfo

    asvetlov commented 2 years ago

    Could you provide more context for the proposed change?

    f82f2f79-bfff-44c7-86e7-c93f7a9fd1fe commented 2 years ago

    hello, it's actually a bit of a round about context, but it was brought up on a tornado issue where I was attempting to port the asyncio optimization to tornado: https://github.com/tornadoweb/tornado/issues/3113#issuecomment-1041019287

    I think it would be better to use this AI_NUMERICHOST | AI_NUMERICSERV optimization from trio everywhere instead

    5c0d0cd9-bbdc-4a00-8830-a38a1efc5f8a commented 2 years ago

    To summarize the justification, this patch does two things: it moves an optimization from create_connection to getaddrinfo, which makes it apply to more callers (including Tornado), and it makes the code simpler and less redundant (net reduction of 47 non-test lines in the patch).

    As far as we can tell, the reason it wasn't done this way in the first place is that at the time getaddrinfo held a global lock on some platforms, but this is no longer true. If there's still some locking in or around getaddrinfo on some platforms (or some libc implementations), this patch would be a bad idea. Is there a good way to test for that? I suppose we could set up a deliberately-slow DNS server and try to call getaddrinfo with AI_NUMERICHOST while another thread is blocked talking to that server, but that seems like a lot of test infrastructure to build out.

    5c0d0cd9-bbdc-4a00-8830-a38a1efc5f8a commented 2 years ago

    On MacOS in 2015, getaddrinfo was found to be much slower than inet_pton. Unless that's changed, this patch would be a performance regression on that platform. Data and benchmark script in https://groups.google.com/g/python-tulip/c/-SFI8kkQEj4/m/m1-oCMSABgAJ

    graingert commented 2 years ago

    running the benchmark script https://gist.github.com/graingert/5097c6be997ab2a201b48b858524e163 on my Linux machine I get

    before:

    /home/graingert/projects/cpython-main/../demo/demo.py:28: DeprecationWarning: There is no current event loop
      loop = get_event_loop()
          host       port     family       type      proto    secs
       1.2.3.4          1          2          1          6     9.4
       1.2.3.4          1          0          1          6     9.7
       1.2.3.4          1          0          2         17    10.9
       1.2.3.4          1          0          1          0    14.5
       1.2.3.4          1          0          2          0    15.9
       1.2.3.4          1          0          0          0    15.8
           ::3          1         10          1          6    13.0
           ::3          1          0          1          6    13.3
    Traceback (most recent call last):
      File "/home/graingert/projects/cpython-main/../demo/demo.py", line 58, in <module>
        lookup(*info)
        ^^^^^^^^^^^^^
      File "/home/graingert/projects/cpython-main/../demo/demo.py", line 32, in lookup
        return loop.run_until_complete(loop.getaddrinfo(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/graingert/projects/cpython-main/Lib/asyncio/base_events.py", line 650, in run_until_complete
        return future.result()
               ^^^^^^^^^^^^^^^
      File "/home/graingert/projects/cpython-main/Lib/asyncio/base_events.py", line 864, in getaddrinfo
        return await self.run_in_executor(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/graingert/projects/cpython-main/Lib/concurrent/futures/thread.py", line 58, in run
        result = self.fn(*self.args, **self.kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/graingert/projects/cpython-main/Lib/socket.py", line 961, in getaddrinfo
        for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    socket.gaierror: [Errno -2] Name or service not known

    after:

    /home/graingert/projects/cpython/../demo/demo.py:28: DeprecationWarning: There is no current event loop
      loop = get_event_loop()
          host       port     family       type      proto    secs
       1.2.3.4          1          2          1          6     2.4
       1.2.3.4          1          0          1          6     2.4
       1.2.3.4          1          0          2         17     2.4
       1.2.3.4          1          0          1          0     2.4
       1.2.3.4          1          0          2          0     2.3
       1.2.3.4          1          0          0          0     2.7
           ::3          1         10          1          6     2.3
           ::3          1          0          1          6     2.3
    Traceback (most recent call last):
      File "/home/graingert/projects/cpython/../demo/demo.py", line 58, in <module>
        lookup(*info)
        ^^^^^^^^^^^^^
      File "/home/graingert/projects/cpython/../demo/demo.py", line 32, in lookup
        return loop.run_until_complete(loop.getaddrinfo(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/graingert/projects/cpython/Lib/asyncio/base_events.py", line 591, in run_until_complete
        return future.result()
               ^^^^^^^^^^^^^^^
      File "/home/graingert/projects/cpython/Lib/asyncio/base_events.py", line 814, in getaddrinfo
        return await self.run_in_executor(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/graingert/projects/cpython/Lib/concurrent/futures/thread.py", line 58, in run
        result = self.fn(*self.args, **self.kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/graingert/projects/cpython/Lib/socket.py", line 962, in getaddrinfo
        for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    socket.gaierror: [Errno -2] Name or service not known
    graingert commented 2 years ago

    on macos:

    before https://github.com/graingert/cpython/runs/6528334512?check_suite_focus=true#step:7:1

    Run ./python.exe demo.py
    /Users/runner/work/cpython/cpython/demo.py:28: DeprecationWarning: There is no current event loop
      loop = get_event_loop()
          host       port     family       type      proto    secs
       1.2.3.4          1          2          1          6    54.6
       1.2.3.4          1          0          1          6    55.0
       1.2.3.4          1          0          2         17    52.8
       1.2.3.4          1          0          1          0    52.0
       1.2.3.4          1          0          2          0    51.2
       1.2.3.4          1          0          0          0    54.7
           ::3          1         30          1          6    52.6
           ::3          1          0          1          6    52.6
       ::3%lo0          1         30          1          6    55.5

    after https://github.com/graingert/cpython/runs/6528182360?check_suite_focus=true#step:7:11

    Run ./python.exe demo.py
    /Users/runner/work/cpython/cpython/demo.py:28: DeprecationWarning: There is no current event loop
      loop = get_event_loop()
          host       port     family       type      proto    secs
       1.2.3.4          1          2          1          6    21.3
       1.2.3.4          1          0          1          6    21.1
       1.2.3.4          1          0          2         17    21.1
       1.2.3.4          1          0          1          0    21.1
       1.2.3.4          1          0          2          0    21.0
       1.2.3.4          1          0          0          0    22.1
           ::3          1         30          1          6    20.9
           ::3          1          0          1          6    20.9
       ::3%lo0          1         30          1          6    23.8
    graingert commented 2 years ago

    https://bugzilla.mozilla.org/show_bug.cgi?id=404399#c16 and https://bugzilla.mozilla.org/show_bug.cgi?id=344809#c21 seem relevant

    graingert commented 2 years ago

    benchmarking _ipaddr_info https://github.com/graingert/cpython/commit/dee834cb3c05c1b57948c961700ce3c5ade102d3 seems to show that it's about the same speed as getaddrinfo with AI_NUMERICHOST:

    https://github.com/graingert/cpython/runs/6528895181?check_suite_focus=true

    Run ./python.exe demo.py
    /Users/runner/work/cpython/cpython/demo.py:28: DeprecationWarning: There is no current event loop
      loop = get_event_loop()
          host       port     family       type      proto    secs
       1.2.3.4          1          2          1          6    19.2
       1.2.3.4          1          0          1          6    19.1
       1.2.3.4          1          0          2         17    19.2
       1.2.3.4          1          0          1          0    19.0
       1.2.3.4          1          0          2          0    18.9
       1.2.3.4          1          0          0          0    18.1
           ::3          1         30          1          6    19.0
           ::3          1          0          1          6    20.2
       ::3%lo0          1         30          1          6    19.1

    @bdarnell

    willingc commented 4 months ago

    Next action: Review the existing PR and determine next steps.