Open · opened 8 years ago
It looks like, when doing a PUT with a file-like object as the body, http.client will try to send the whole file before analysing the response from the server.
If you do the following:

    $ dd if=/dev/zero of=/tmp/300mb.zero bs=1M count=300

and then run the following code in Python 3.4 (tested 3.4.3 on Linux and FreeBSD):

    import http.client
    conn = http.client.HTTPSConnection('api.onedrive.com')
    conn.request(
        method='PUT',
        url='https://api.onedrive.com/v1.0/drives/me/root:/test.file:/content',
        body=open("/tmp/300mb.zero", "rb"))
    resp = conn.getresponse()
You'll notice the hang. After some time, it aborts with BrokenPipeError: [Errno 32] Broken pipe. If you run this code under the pdb debugger and interrupt it, you'll probably land somewhere inside the write loop. You can call self.read() there and see that an HTTP 413 response is waiting to be interpreted.
Doing the same with curl:

    $ curl -v -T /tmp/300mb.zero https://api.onedrive.com/v1.0/drives/me/root:/test.file:/content

gives an immediate HTTP 413 error. Can we have the same behaviour in the http.client library?
Perhaps by doing non-blocking reads between the writes[*]? I suppose it is possible, but it might complicate the code considerably.
[*] or re-write it using asyncio, but that is definitely out of scope :)
Maybe something like this? Doesn't look too complicated and I haven't noticed any breakage yet.
There is a test suite which can be run to test for breakages:
./python -m test test_httplib
If you do that, you'll notice that some things are broken; I got errors in 5 different tests, related to your select.select() call:
TypeError: argument must be an int, or have a fileno() method.
Your test code seems to be running fine now, though.
That was a testing issue, apparently test.test_httplib.FakeSocket is not fake enough.
Here is revised patch. Also covers changes to tests.
All the tests pass now. I'm not sure why your patch doesn't get an associated Rietveld link; it applies cleanly using hg patch --no-commit.
Curl uses “Expect: 100-continue”. This is the relevant output from it:
$ curl -v -T 300mb --no-fail https://api.onedrive.com/v1.0/drives/me/root:/test.file:/content
[. . .]
> PUT /v1.0/drives/me/root:/test.file:/content HTTP/1.1
> Host: api.onedrive.com
> Content-Range: bytes 0-314572799/314572800
> User-Agent: curl/7.43.0
> Accept: */*
> Connection: TE
> TE: gzip
> Content-Length: 314572800
> Expect: 100-continue
>
< HTTP/1.1 413 Request Entity Too Large
< Content-Length: 0
< Server: Microsoft-HTTPAPI/2.0
< P3P: CP="BUS CUR CONo FIN IVDo ONL OUR PHY SAMo TELo"
< X-MSNSERVER: DM2302____PAP179
< Strict-Transport-Security: max-age=31536000; includeSubDomains
< X-ThrowSite: 7c11.e25d
< Date: Sun, 03 Jan 2016 20:04:27 GMT
* HTTP error before end of send, stop sending
<
* Closing connection 0
See bpo-1346874 about support for 100-continue mode.
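For reference, the curl-style "Expect: 100-continue" handshake can be sketched at the socket level like this. This is a rough illustration of the mechanism, not what http.client does today; `put_with_expect` and its parameters are invented for the example, and real code would need proper response parsing and error handling:

```python
import select
import socket

def put_with_expect(sock, path, host, body):
    """Send PUT headers with Expect: 100-continue, then wait briefly.

    Returns the early final response (e.g. a 413 status line) if one
    arrived instead of "100 Continue", otherwise sends the body and
    returns None so the caller can read the final response.
    """
    head = ("PUT {} HTTP/1.1\r\n"
            "Host: {}\r\n"
            "Content-Length: {}\r\n"
            "Expect: 100-continue\r\n\r\n").format(path, host, len(body))
    sock.sendall(head.encode("ascii"))
    # RFC 7231 allows the client to proceed if no interim response
    # arrives within a reasonable time.
    rlist, _, _ = select.select([sock], [], [], 3.0)
    if rlist:
        status = sock.recv(4096).decode("latin-1")
        if not status.startswith("HTTP/1.1 100"):
            return status  # final response already here; don't send body
    sock.sendall(body)
    return None
```

This is essentially what the curl transcript above shows: the server answered 413 before the body was sent, and curl stopped.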
When I tried your test code on Linux, it hung for two minutes and then raised ConnectionResetError. I guess the server refused to receive data and then timed the connection out. The patch should handle this in most cases, but I think there could still be a race between select() returning an empty rlist and the server then sending a response, causing the upload to hang. Also, there is nothing to protect against sendall() partially uploading a chunk and then hanging.
It should be more robust to check for both send and receive sides of the socket with select(), and only use send(), not sendall(), to avoid blocking.
I would consider this a new feature, so I don’t think it should go into 3.5. It would be good to add documentation and test cases as well.
Wiktor’s problem is similar to the situations in bpo-5542. There, the original poster seemed happy to wait for a broken pipe error, and then inspect a 401 Unauthorized response. This apparently was made to work with plain HTTP over TCP, but not HTTPS.
I wonder if having request() silently return is the best API design. Is there a possibility that the caller would want to distinguish a successful upload from an aborted one? I can’t think of one, but it seems plausible.
Another related enhancement I would like is the ability to detect an error response (e.g. 408 Request Timeout) or dropped connection _before_ even sending a request. This is useful with persistent connections; see items 1 and 2 in https://bugs.python.org/issue9740#msg240459.
For non-blocking SSL sockets, the Python documentation suggests the results of select() do not directly indicate if data can be sent or received: https://docs.python.org/dev/library/ssl.html#ssl-nonblocking. I guess this would also apply to blocking SSL sockets.
I've checked how it behaves when I do:

    $ openssl s_client -host api.onedrive.com -port 443

The server then works like that (using curl debug log style):

    PUT /v1.0/drives/me/root:/test.file:/content HTTP/1.1
    Host: api.onedrive.com
    Content-Range: bytes 0-314572799/314572800
    User-Agent: curl/7.43.0
    Accept: */*
    Content-Length: 314572800

    < HTTP/1.1 413 Request Entity Too Large
    < Content-Length: 0
    < Server: Microsoft-HTTPAPI/2.0
    < P3P: CP="BUS CUR CONo FIN IVDo ONL OUR PHY SAMo TELo"
    < X-MSNSERVER: DM2304____PAP130
    < Strict-Transport-Security: max-age=31536000; includeSubDomains
    < X-ThrowSite: 7c11.e25d
    < Date: Mon, 04 Jan 2016 18:44:28 GMT

    0000000000000000000000000000000000000000000000000000
    [...]

And I may continue sending the data.
As for 100-continue: it looks like the Microsoft server does not support the Expect: 100-continue header, and it just waits for the data when the content is shorter (e.g. 100 bytes).

I do not see clear indications of how the client should behave in such a situation (response received before the complete request is sent).

I do not understand where you see the race condition. As we are sending in blocks of 8 KiB, if we miss the select() on the first block, we will catch something in the next select(). If the server closed the socket immediately, we should get a failure from sock.sendall(), since we would be writing to a socket that the remote party has closed for writing.
As I understand it, right now my patch breaks uploads when the server returns an HTTP 100 response. Contrary to bpo-1346874, I'd recommend checking: if the server returned a 100 response, then continue the upload of the file; otherwise prepare the error response for getresponse(). I do not see why this behaviour should be implemented in code using http.client.

The status of the upload should be (and currently is) checked using getresponse() anyway, as any upload request might end in an HTTP error response, so I vouch for an empty return value; it is compliant with the current interface.

I'm in the process of preparing test cases for such a scenario using socketserver, to keep it close to how the network behaves, but maybe I'll manage to prepare the test cases using FakeSocket.
As I see it, if the client receives an early response (when 100-continue is not used) but needs to make more requests, the only choices are:

1. Complete sending the request if possible, after which the connection may still be usable for a second request. But I'm not sure how many servers would support this; for your Microsoft server this would not be useful.

2. Abort the connection and start a new one.
Perhaps the race condition would be more obvious if you sent the upload as a single 300 MiB bytes() object rather than a file. Then the sendall() call will be a 300 MiB chunk rather than 8 KiB. If the response is received after sendall() starts, I expect you will see the original two minute delay again. If this were plain TCP, you could replace sendall() with fine-grained select() and send() calls in a loop.
Here is a demonstration of the SSL renegotiation issue. I used separate client and server terminal windows. I don't know if there is a way to force renegotiation purely in Python's SSL module, so I used an external OpenSSL server.
1: In the server window, start an SSL server on port 4433:

    $ openssl s_server -cert Lib/test/keycert.pem
2: In the client window, start a request in the Python interpreter. It will pause to read stdin:
>>> import http.client, ssl, sys
>>> context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
>>> context.load_verify_locations("/usr/lib/python3.5/test/keycert.pem")
>>> conn = http.client.HTTPSConnection('localhost', 4433, context=context)
>>> conn.request("PUT", "/", headers={"Content-Length": "4"}, body=sys.stdin.buffer)
3: In the server, the request header is received. Type lowercase "r" to force a renegotiation:

    Secure Renegotiation IS supported
    PUT / HTTP/1.1
    Host: localhost:4433
    Accept-Encoding: identity
    Content-Length: 8
    r
    SSL_do_handshake -> 1
4: In Python, type in upload data and press Ctrl+D repeatedly to signal EOF until the request() call stops reading from stdin and you get the Python prompt back:

    abc    <== 3 letters + newline + Ctrl+D
    >>>

5: Notice that no body has been uploaded to the server.
Here are the test cases that I've come up with.
test_response_after_headers - tests for the case that I'm missing
test_ssl_renegotiation - tests for the case that Martin pointed out
As there is no way to force renegotiation in the stock ssl library, I've just split the receiving of a standard HTTP status message into a few parts and checked whether http.client is still sending the contents.

So: without my patch the first case fails, and with my patch the second case fails. I'll try to find a better solution here which covers both cases.
Sorry for making a seemingly easy problem harder :)
Your version of test_ssl_renegotiation() would actually be harder to fix IMO, since that would require the client to receive response data (the “H”) and continue to transmit request data concurrently. It would probably require sending and receiving in separate coroutines or threads, like asyncio.
There may still be a way to avoid that if we don’t need to send and receive HTTP data at the same time. We only need the low-level SSL messages to work properly. The following might work:
    self.sock.setblocking(False)
    try:
        ...  # Get chunks of request body to send
        while datablock:
            select((self._reader,), (self.sock,), (), self._timeout)
            if self._reader.peek():
                ...  # Early response received
            try:
                amount = self.sock.send(datablock)
            except (BlockingIOError, ssl.SSLWantReadError, ssl.SSLWantWriteError):
                continue
            datablock = datablock[amount:]
    finally:
        self.sock.settimeout(self.timeout)
This might require the following problems to be investigated and fixed first:
1. Need to maintain the buffered socket reader of self.sock.makefile() before the response is actually read. Could use my patch for bpo-23377.

2. Allow a buffered socket reader to be non-blocking without upsetting its buffer. The documentation https://docs.python.org/release/3.4.3/library/socket.html#socket.socket.makefile disallows non-blocking mode, but I suspect that is just a Python 2 relic and there is no restriction in the Python 3 implementation. I thought there was a bug open for this but I can only find mentions in other bugs.

3. Allow using BufferedReader.peek() to see if data is ready to be read. The current documentation does not guarantee it will return any bytes (bpo-5811), but I understand the implementation normally does return ≥ 1 byte.

4. Clarify what peek() should do if no non-blocking data is available. Currently I think it returns b"", but that is inconsistent with other APIs, and does not differentiate from EOF. I proposed changing peek() to return None in bpo-13322.
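The peek() behaviour discussed above can be seen with an in-memory stream. A real non-blocking socket reader may behave differently, which is exactly the ambiguity the bpo issues above are about; this just shows the current CPython BufferedReader behaviour:

```python
import io

# peek() returns buffered data without advancing the stream position.
reader = io.BufferedReader(io.BytesIO(b"abc"))
print(reader.peek())   # b'abc'
print(reader.read(3))  # b'abc'
# At EOF, peek() currently returns b"", the same as "no data available",
# which is the ambiguity bpo-13322 proposes to resolve.
print(reader.peek())   # b''
```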
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['type-bug', 'library']
title = 'http.client PUT method ignores error responses sent immediatly after headers'
updated_at =
user = 'https://bugs.python.org/WiktorNiesiobedzki'
```
bugs.python.org fields:
```python
activity =
actor = 'r.david.murray'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'Wiktor Niesiobedzki'
dependencies = []
files = ['41390', '41392', '41573']
hgrepos = []
issue_num = 25919
keywords = ['patch']
message_count = 13.0
messages = ['256808', '256809', '256859', '256862', '256863', '256878', '256906', '257430', '257431', '257486', '257508', '257929', '257945']
nosy_count = 3.0
nosy_names = ['SilentGhost', 'martin.panter', 'Wiktor Niesiobedzki']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue25919'
versions = ['Python 3.5', 'Python 3.6']
```