Open bitdancer opened 9 years ago
While committing the patch for bpo-23539 I decided to rewrite the 'request' docs for clarity. I doing so I found that http.client isn't as consistent as it could be about how it handles bytes and strings. Two points specifically: it will only take the length of a bytes-like object (to supply a default Content-Length header) if isinstance(x, bytes) is true (that is, it doesn't take the length of eg array or memoryview objects), and (2) if an iterable is passed in, it must be an iterable of bytes-like objects. Since it already automatically encodes string objects and text files, for consistency it should probably also encode strings if they are passed in via an iterator.
See also bpo-23350 and bpo-23360.
Summary of the main supported types as I see them, whether documented, undocumented, or only working by accident:
Arbitrary bytes-like objects are not properly supported by SSLSocket.sendall(). After all, the non-SSL socket.sendall() documentation does not explicitly mention supporting arbitrary bytes-like objects either, though it does seem to support them in practice.
Suggested documentation fixes and additions:
I would support encoding iterables of str() objects for consistency. The patch for bpo-23350 already does this, although I am starting to question the wisdom of special-casing lists and tuples in that patch.
As well as encoding iterables of str(), text files could also be handled more consistently by checking the read() return type. That would eliminate the complication of checking for a "b" mode.
FWIW, I've done some additional work to request/send in issue bpo-12319 where I've added support for chunked request encoding.
@David
if an iterable is passed in, it must be an iterable of bytes-like objects
This specific issue is addressed in the patch in bpo-23350.
@Martin
text files could also be handled more consistently by checking the read() return type
I wouldn't be opposed to this at all. In fact, I was going to initially make that change in bpo-12319, but wanted to keep the change surface minimal and realistically, peeking at the data type rather than checking for 'b' in mode doesn't /really/ make that much of a difference.
Yeah, if we're going to check the type for iterables to convert strings, we might as well check the type for read() as well.
The bit about the len not being set except for str and bytes was me mis-remembering what I read in the code. (The isinstance check is about whether _send_output sends the body in the same packet.)
Note that for file-like objects we have also the same issue as bpo-22468. Content-Length is not determined correctly for GzipFile and like.
The priority of file-like objects versus bytes-like and iterable objects should be well defined. See bpo-5038: mmap objects seem to satisfy all three interfaces, but the result may be different depending on the file position.
I’ve decided I would prefer deprecating the Latin-1 encoding, rather than adding more encoding support for iterables and text files. If not deprecating it altogether, at least prefer just ASCII encoding (like Python 2). The urlopen() function already rejects text str, and in bpo-26045 people often want or expect UTF-8.
Here is a summary I made of the different data types handled for the body object, from highest priority to lowest priority.
HTTPConnection HTTPConn. default
body _send_output() Content-Length urlopen()
\=========== =============== ================= \==================
None No body 0 or unset bpo-23539 C-L unset
Has read() read() bpo-1065257 * bpo-5038
Text file encode() bpo-5314
str encode() r56128 len() TypeError bpo-11082
Byte seq. sendall() len() C-L: nbytes
Bytes-like bpo-27340 C-L: nbytes bpo-3243
Iterable iter() bpo-3243 C-L required bpo-3243
Sized len()† bpo-23350
fstat() st_size bpo-1065257
Otherwise TypeError Unset bpo-1065257 C-L optional†
Possible bugs † Degenerate cases not worth supporting IMO C-L = Content-Length header field, set by default or required to be specified in various cases Byte seq. = Sequence of bytes, including “bytes” type, bytearray, array("B"), etc Bytes-like = Any C-contiguous buffer, including zero- and multi-dimensional arrays, and with items other than bytes Iterable = Iterable of bytes or bytes-like objects (depending on bpo-27340 about SSL)
A data point found while I researched this
MyPy typeshed [1] currently declares
_DataType = Union[bytes, IO[Any], Iterable[bytes], str]
class HTTPConnection:
def send(self, data: _DataType) -> None: ...
First stab at characterising http.client.HTTPConnnection.send().
https://github.com/moreati/bpo-23740
This uses a webserver that returns request details, in the body of the response. Raw (TCP level) content is included. It shares a similar purpose to HTTP TRACE command. In principal the bytes that HTTPConection.send() writes will match to the bytes returned (after they're decapsulated from the JSON). I've not tested that aspect yet.
TODO
http_dump.py now covers CPython 3.6-3.10 (via Tox), and HTTPSConnection https://github.com/moreati/bpo-23740
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = None closed_at = None created_at =
labels = ['easy', 'type-feature']
title = 'http.client request and send method have some datatype issues'
updated_at =
user = 'https://github.com/bitdancer'
```
bugs.python.org fields:
```python
activity =
actor = 'Alex.Willmer'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = []
creation =
creator = 'r.david.murray'
dependencies = ['27340']
files = []
hgrepos = []
issue_num = 23740
keywords = ['easy']
message_count = 12.0
messages = ['238928', '238935', '238952', '238960', '238964', '238969', '239023', '241190', '272118', '387967', '387975', '388165']
nosy_count = 7.0
nosy_names = ['r.david.murray', 'axitkhurana', 'martin.panter', 'serhiy.storchaka', 'demian.brecht', 'Alex.Willmer', 'Mariatta']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue23740'
versions = ['Python 3.5']
```