Open benjaminp opened 12 years ago
HTTPConnection.send() should use os.sendfile when possible to avoid copying data into userspace and back.
This is not possible for two reasons:
1. on most POSIX systems, sendfile() works with mmap-like ("regular") files only, while HTTPConnection.send() accepts any file-like object as long as it provides a read() method
2. after read()ing a chunk of data from the file and before send()ing it over the socket, the data can be subject to an intermediate conversion (datablock.encode("iso-8859-1")): http://hg.python.org/cpython/file/87c6be1e393a/Lib/http/client.py#l839 ...whereas sendfile() can only be used to send a binary file "as-is"
I think we can use sendfile() in ftplib.py, though. I'll open a ticket for that.
2011/12/8 Giampaolo Rodola' <report@bugs.python.org>:
> Giampaolo Rodola' <g.rodola@gmail.com> added the comment:
>
> This is not possible for two reasons:
> 1. on most POSIX systems, sendfile() works with mmap-like ("regular") files only, while HTTPConnection.send() accepts any file-like object as long as it provides a read() method
> 2. after read()ing a chunk of data from the file and before send()ing it over the socket, the data can be subject to an intermediate conversion (datablock.encode("iso-8859-1")): http://hg.python.org/cpython/file/87c6be1e393a/Lib/http/client.py#l839 ...whereas sendfile() can only be used to send a binary file "as-is"

I presume you could check for a binary mode, though? Also, you can catch EINVAL on invalid fds.
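The suggestion above (check for binary mode, catch EINVAL) can be sketched roughly as follows. This is only an illustration, not the patch attached to this issue: `is_sendfile_candidate` and `send_file_body` are hypothetical names, the `isinstance` check is a deliberately conservative assumption about which file objects count as "plain binary files", and a blocking socket is assumed.

```python
import errno
import io
import os
import stat


def is_sendfile_candidate(fileobj):
    """Return True if fileobj looks like a regular file opened in binary mode."""
    # Conservative assumption: only plain binary file objects from open() qualify.
    if not isinstance(fileobj, (io.FileIO, io.BufferedReader, io.BufferedRandom)):
        return False
    try:
        mode = os.fstat(fileobj.fileno()).st_mode
    except (OSError, ValueError):
        return False
    return stat.S_ISREG(mode)


def send_file_body(sock, fileobj, blocksize=8192):
    """Send fileobj over a blocking socket; return the name of the last path used."""
    if hasattr(os, "sendfile") and is_sendfile_candidate(fileobj):
        offset = fileobj.tell()
        try:
            while True:
                sent = os.sendfile(sock.fileno(), fileobj.fileno(), offset, blocksize)
                if sent == 0:  # end of file reached
                    fileobj.seek(offset)
                    return "sendfile"
                offset += sent
        except OSError as exc:
            if exc.errno != errno.EINVAL:
                raise
            # The kernel rejected these descriptors: resume with read()/sendall().
            fileobj.seek(offset)
    while True:
        datablock = fileobj.read(blocksize)
        if not datablock:
            return "read/sendall"
        if isinstance(datablock, str):
            # Same intermediate conversion that http.client applies to text data.
            datablock = datablock.encode("iso-8859-1")
        sock.sendall(datablock)
```

Text-mode files, BytesIO objects and objects that only provide read() all take the fallback loop, so the encoding step from the second objection is never skipped.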
ftplib's sendfile support is now tracked as bpo-13559. Considerations I made there should apply here as well.
Oops! I meant bpo-13564.
Patch in attachment uses the newly added socket.sendfile() method (bpo-17552).
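For readers unfamiliar with socket.sendfile(), here is a minimal usage sketch. It is not taken from the attached patch; `send_body_via_socket_sendfile` and `demo` are hypothetical names. The method returns the total number of bytes sent and falls back to plain send() when os.sendfile() cannot be used, which is why an in-memory BytesIO works here.

```python
import io
import socket


def send_body_via_socket_sendfile(sock, fileobj):
    """Send a binary file object with socket.sendfile(); return the byte count."""
    return sock.sendfile(fileobj)


def demo():
    """Round-trip a small in-memory body over a local socket pair."""
    left, right = socket.socketpair()
    with left, right:
        # BytesIO has no file descriptor, so socket.sendfile() uses send() here.
        sent = send_body_via_socket_sendfile(left, io.BytesIO(b"hello"))
        left.shutdown(socket.SHUT_WR)
        received = right.recv(1024)
    return sent, received
```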
The multiple personalities of HTTPConnection.send() and friends are a bit of a can of worms. I suggest working on bpo-23740 to get an idea of what kinds of file objects are meant to be supported, and what things may work by accident and be used in the real world.
For instance, is it possible to manually set Content-Length, and say supply a GzipFile reader, or file object positioned halfway through the file? How does this interact with the socket.sendfile() call?
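One way to think about the "file object positioned halfway" case: socket.sendfile() accepts explicit `offset` and `count` arguments, with `offset` defaulting to 0, so a caller that wants the body to start at the current position can pass that position rather than rely on it implicitly. The sketch below assumes a real file with a usable fileno(); `send_from_current_position` is a hypothetical helper, and the computed length is what a caller would put in Content-Length. It does not address wrappers such as GzipFile, where the descriptor refers to the compressed data.

```python
import os
import socket


def send_from_current_position(sock, fileobj):
    """Send the bytes between fileobj's current position and end of file.

    Returns (content_length, bytes_sent).
    """
    start = fileobj.tell()
    content_length = os.fstat(fileobj.fileno()).st_size - start
    if content_length <= 0:
        return 0, 0  # socket.sendfile() rejects a non-positive count
    sent = sock.sendfile(fileobj, offset=start, count=content_length)
    return content_length, sent
```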
This issue is not newcomer friendly; I am removing the "easy" keyword.
I would like to take a stab at this. Giampaolo, would it be okay if I made a pull request updated from your patch? With the appropriate "Co-authored-by: Author Name \<email_address>" line.
Alex, https://bugs.python.org/issue23740 is identified as a dependency of this issue. We will have to resolve that first, and then come back to this. And yes, if you build on someone else's patch, both contributions will be included and appropriately credited.
To check my understanding
Is the motivation for the closer to
My guess is 5.
Yes, point number 5. We will have to evaluate whether sendfile side-steps the issues noted in bpo-23740.
sendfile() only works for plain HTTP. For technical reasons it does not work for HTTPS (*). These days the majority of services use HTTPS. Therefore the usefulness of a sendfile() patch is minimal.
(*) It is possible to use sendfile() for TLS connections, but the feature requires a kernel module that provides the kTLS offloading feature, https://www.kernel.org/doc/html/latest/networking/tls-offload.html . In user space it requires OpenSSL 3.0.0 with kTLS support. 3.0.0 is currently under development.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['library', '3.10', 'performance']
title = 'Use sendfile where possible in httplib'
updated_at =
user = 'https://github.com/benjaminp'
```
bugs.python.org fields:
```python
activity =
actor = 'christian.heimes'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'benjamin.peterson'
dependencies = ['17552', '23740']
files = ['35569']
hgrepos = []
issue_num = 13559
keywords = ['patch']
message_count = 13.0
messages = ['149052', '149073', '149075', '149077', '149078', '220262', '250435', '348634', '387699', '387729', '387751', '387754', '388346']
nosy_count = 9.0
nosy_names = ['orsenthil', 'giampaolo.rodola', 'christian.heimes', 'benjamin.peterson', 'eric.araujo', 'rosslagerwall', 'kasun', 'martin.panter', 'Alex.Willmer']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue13559'
versions = ['Python 3.10']
```