python / cpython

Use sendfile where possible in httplib #57768

Open benjaminp opened 12 years ago

benjaminp commented 12 years ago
BPO 13559
Nosy @orsenthil, @giampaolo, @tiran, @benjaminp, @merwok, @vadmium, @moreati
Dependencies
  • bpo-17552: Add a new socket.sendfile() method
  • bpo-23740: http.client request and send method have some datatype issues
Files
  • httplib-sendfile.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:
    ```python
    assignee = None
    closed_at = None
    created_at =
    labels = ['library', '3.10', 'performance']
    title = 'Use sendfile where possible in httplib'
    updated_at =
    user = 'https://github.com/benjaminp'
    ```
    bugs.python.org fields:
    ```python
    activity =
    actor = 'christian.heimes'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation =
    creator = 'benjamin.peterson'
    dependencies = ['17552', '23740']
    files = ['35569']
    hgrepos = []
    issue_num = 13559
    keywords = ['patch']
    message_count = 13.0
    messages = ['149052', '149073', '149075', '149077', '149078', '220262', '250435', '348634', '387699', '387729', '387751', '387754', '388346']
    nosy_count = 9.0
    nosy_names = ['orsenthil', 'giampaolo.rodola', 'christian.heimes', 'benjamin.peterson', 'eric.araujo', 'rosslagerwall', 'kasun', 'martin.panter', 'Alex.Willmer']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue13559'
    versions = ['Python 3.10']
    ```

    benjaminp commented 12 years ago

    HTTPConnection.send() should use os.sendfile when possible to avoid copying data into userspace and back.

    giampaolo commented 12 years ago

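    This is not possible for two reasons:

    • on most POSIX systems, sendfile() works with mmap-like ("regular") files only, while HTTPConnection.send() accepts any file-like object as long as it provides a read() method

    • after read()ing a chunk of data from the file and before send()ing it over the socket, the data can be subject to an intermediate conversion (datablock.encode("iso-8859-1")): http://hg.python.org/cpython/file/87c6be1e393a/Lib/http/client.py#l839 ...whereas sendfile() can only be used to send a binary file "as-is"

    I think we can use sendfile() in ftplib.py, though. I'll open a ticket for that.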

    benjaminp commented 12 years ago

    2011/12/8 Giampaolo Rodola' <report@bugs.python.org>:

    > Giampaolo Rodola' <g.rodola@gmail.com> added the comment:
    >
    > This is not possible for two reasons:
    >
    > • on most POSIX systems, sendfile() works with mmap-like ("regular") files only, while HTTPConnection.send() accepts any file-like object as long as it provides a read() method
    >
    > • after read()ing a chunk of data from the file and before send()ing it over the socket, the data can be subject to an intermediate conversion (datablock.encode("iso-8859-1")): http://hg.python.org/cpython/file/87c6be1e393a/Lib/http/client.py#l839 ...whereas sendfile() can only be used to send a binary file "as-is"

    I presume you could check for a binary mode, though? Also, you can catch EINVAL on invalid fds.
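The check-and-fall-back idea above could look roughly like the sketch below. It is only an illustration, assuming a blocking socket and a platform where os.sendfile() exists. The helper names `looks_like_regular_binary_file` and `send_body` are hypothetical and are not part of http.client or of any attached patch.

```python
import errno
import io
import os
import stat


def looks_like_regular_binary_file(fileobj):
    """Heuristic: True if fileobj is a binary-mode object backed by a regular file."""
    # Text-mode files return str from read() and need encoding before sending.
    if isinstance(fileobj, io.TextIOBase):
        return False
    try:
        fd = fileobj.fileno()
    except (AttributeError, OSError):
        # No fileno() at all, or io.UnsupportedOperation (e.g. io.BytesIO).
        return False
    return stat.S_ISREG(os.fstat(fd).st_mode)


def send_body(sock, fileobj, blocksize=8192):
    """Send fileobj over sock; try os.sendfile() first, else read() and sendall()."""
    total = 0
    if hasattr(os, "sendfile") and looks_like_regular_binary_file(fileobj):
        offset = fileobj.tell()
        try:
            while True:
                sent = os.sendfile(sock.fileno(), fileobj.fileno(), offset, blocksize)
                if sent == 0:  # end of file reached
                    fileobj.seek(offset)  # keep the file position in sync
                    return total
                offset += sent
                total += sent
        except OSError as exc:
            # Fall back only if the kernel rejected the descriptors up front.
            if exc.errno != errno.EINVAL or total:
                raise
    while True:
        data = fileobj.read(blocksize)
        if not data:
            return total
        if isinstance(data, str):
            # The intermediate conversion mentioned in the quoted comment.
            data = data.encode("iso-8859-1")
        sock.sendall(data)
        total += len(data)
```

With this shape, in-memory and text-mode bodies never reach os.sendfile(), and an EINVAL raised before any byte is sent drops back to the plain read() loop.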

    giampaolo commented 12 years ago

    ftplib's sendfile support is now tracked as bpo-13559. Considerations I made there should apply here as well.

    giampaolo commented 12 years ago

    Oops! I meant bpo-13564.

    giampaolo commented 10 years ago

    Patch in attachment uses the newly added socket.sendfile() method (bpo-17552).

    vadmium commented 9 years ago

    The multiple personalities of HTTPConnection.send() and friends are a bit of a can of worms. I suggest working on bpo-23740 to get an idea of what kinds of file objects are meant to be supported, and what things may work by accident and be used in the real world.

    For instance, is it possible to manually set Content-Length, and say supply a GzipFile reader, or file object positioned halfway through the file? How does this interact with the socket.sendfile() call?
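For the "file object positioned halfway through the file" case, one possible approach is sketched below. It passes the current position explicitly, because socket.sendfile() takes an `offset` argument that defaults to 0. The helper name `send_from_current_position` and its `content_length` parameter are hypothetical, and the sketch assumes a blocking stream socket and a seekable binary file. Wrappers such as GzipFile are a separate question and are not covered here.

```python
def send_from_current_position(sock, fileobj, content_length=None):
    """Send from fileobj's current position, at most content_length bytes if given."""
    if content_length == 0:
        # socket.sendfile() rejects a non-positive count, so handle it here.
        return 0
    start = fileobj.tell()
    # Pass the position explicitly instead of relying on the default offset of 0.
    return sock.sendfile(fileobj, offset=start, count=content_length)
```

A caller that sets Content-Length by hand would pass the same number as `content_length`, so the bytes on the wire cannot exceed the declared length.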

    vstinner commented 5 years ago

    This issue is not newcomer-friendly, so I am removing the "easy" keyword.

    16fd8109-37d2-41b2-9312-b984e55c9837 commented 3 years ago

    I would like to take a stab at this. Giampaolo, would it be okay if I made a pull request updated from your patch? With the appropriate "Co-authored-by: Author Name <email_address>" line.

    orsenthil commented 3 years ago

    Alex, https://bugs.python.org/issue23740 is identified as a dependency on this issue. We will have to resolve that first, and come back to this. And yes, if you contribute on other's patch, both the contributions will be included and appropriately credited.

    16fd8109-37d2-41b2-9312-b984e55c9837 commented 3 years ago

    To check my understanding

    Is the motivation for the dependency closer to:

    1. using sendfile() will break $X, and we know X
    2. there's high probability sendfile() will break something
    3. there's unknown probability sendfile() will break something
    4. there's low probability sendfile() will break something, but it is still too high
    5. any non-trivial change here is too risky, regardless of sendfile()
    6. something else?

    My guess is 5.

    orsenthil commented 3 years ago

    Yes, point number 5. We will have to evaluate whether sendfile side-steps the issues noted in bpo-23740.

    tiran commented 3 years ago

    sendfile() only works for plain HTTP. For technical reasons it does not work for HTTPS (*). These days the majority of services use HTTPS, so the usefulness of a sendfile() patch is minimal.

    (*) It is possible to use sendfile() for TLS connections, but that requires a kernel module that provides the kTLS offloading feature, https://www.kernel.org/doc/html/latest/networking/tls-offload.html . In user space it requires OpenSSL 3.0.0 with kTLS support. 3.0.0 is currently under development.