I agree, we should pick one of those two options. =)
So commenting here as I should have done >6 hours ago:
If requests insists on doing chunked transfer encoding, it should disregard the content length and actually chunk the content (as it does when no Content-Length header is given).
You're implying that in this instance we didn't actually chunk the data. Do you have actual evidence of that? The output alone is not nearly enough to demonstrate that.
Furthermore, running your example above on 2.0.0 does not work: I get this traceback:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/icordasc/virtualenv/vcr/lib/python2.7/site-packages/requests/api.py", line 88, in post
    return request('post', url, data=data, **kwargs)
  File "/Users/icordasc/virtualenv/vcr/lib/python2.7/site-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Users/icordasc/virtualenv/vcr/lib/python2.7/site-packages/requests/sessions.py", line 357, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/icordasc/virtualenv/vcr/lib/python2.7/site-packages/requests/sessions.py", line 460, in send
    r = adapter.send(request, **kwargs)
  File "/Users/icordasc/virtualenv/vcr/lib/python2.7/site-packages/requests/adapters.py", line 319, in send
    timeout=timeout
  File "/Users/icordasc/virtualenv/vcr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 541, in urlopen
    body=body, headers=headers)
  File "/Users/icordasc/virtualenv/vcr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 366, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 955, in request
    self._send_request(method, url, body, headers)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 989, in _send_request
    self.endheaders(body)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 951, in endheaders
    self._send_output(message_body)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 815, in _send_output
    self.send(message_body)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 787, in send
    self.sock.sendall(data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
TypeError: must be string or buffer, not generator
This isn't even an actual issue as far as I can tell, at least not on 2.0.0, which is the only currently supported version of requests around.
Did not expect "Transfer-Encoding: chunked" since I provided the Content-Length
What gave you the impression that this would be the case?
What gave me the impression was the fact that Requests did not chunk (even though it set the header) once I set Content-Length.
The example works with Python 3.3.
Trying it on Python 3.3, it does not blow up; on the other hand, it did chunk the content. The chunking is not platform dependent, so I am thoroughly confused by that.
POST / HTTP/1.1
Host: 127.0.0.1:8801
Content-Length: 6
User-Agent: python-requests/1.2.3 CPython/3.3.2 Darwin/12.5.0
Transfer-Encoding: chunked
Accept: */*
Accept-Encoding: gzip, deflate, compress
3
lol
3
man
0
Secondly, I'd still like to know what gave you the impression that setting the Content-Length header would prevent chunked encoding of data.
@sigmavirus24 You're not using Requests 2.0.0 on your Python 3.3 installation.
@ysangkok good point. :-) With 2.0 it's definitely broken. Regardless, my opinion is definitely in the court of us removing the Content-Length header.
You're probably right @sigmavirus24, we should probably strip the Content-Length header.
Hm, I'm not entirely convinced of my own argument anymore. I went looking for the old discussions surrounding users setting the Content-Length header themselves and I must be remembering old IRC conversations.
I'm still of the opinion that users should really not be setting those headers themselves and that we should remove them. However, I think this issue points out a far more important problem: the vastly different behaviour of requests on two different versions of Python.
On 2.7 (as I demonstrated) setting the Content-Length header does nothing to change how requests uploads the data. On 3.3, however, @ysangkok is correct that setting it sends everything as soon as it can (it still uses the generator but does not send it in an actually chunked manner).
One easy way to fix this is to remove the header when using a generator (or always, to provide consistent behaviour); the flaw is that this is backwards-incompatible behaviour.
The other easy way is to break the consistency of the API by not always using chunked transfer encoding with a generator. @Lukasa this definitely needs some deeper thought since I can't find the old conversations where users were admonished for setting that header themselves.
To be honest, though, I would never expect that setting a header would change the behaviour of using a generator.
This certainly is an extremely sticky situation.
Bleh. I'll ponder.
Also, this isn't a breaking API change that I would hesitate to make, because I have to wonder how many people are actually relying on this behaviour.
I've run into a situation where the length of a generator (for example, a streamed file from another server, or a really large file on disk) is known. For example:
import requests

response = requests.get('http://example.com/mybig.iso', stream=True)
length = response.headers.get('Content-Length')

def exhaust(response):
    while True:
        out = response.raw.read(1024 * 1024)
        if not out:
            break
        yield out

response = requests.post('http://example.com/upload',
                         data=exhaust(response),
                         headers={'Content-Length': length})
That might be a valid reason to keep Content-Length around. The only workaround right now (that I know of) is to exhaust the generator in memory and pass the bytes directly in.
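For comparison, the in-memory workaround mentioned above might look like the following sketch (reusing the exhaust generator and URL from the example; this is only viable when the body fits in RAM):

data = b''.join(exhaust(response))  # exhausts the generator in memory
# With bytes, requests measures the length itself: no chunking occurs.
response = requests.post('http://example.com/upload', data=data)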
@bryanhelmig does the server you're uploading to require you to send the Content-Length header?
@bryanhelmig did you see the comments in the linked pull request?
Anyway, I don't understand why Transfer-Encoding isn't just a flag. There's no need to delete any headers (or do any other kind of hand-holding); it was never a question of the Content-Length being right or wrong. The actual issue is that the presence of Content-Length semi-disables Transfer-Encoding, which makes no sense at all. But just making Requests ignore Content-Length does not solve the real problem, which is that Requests uses Transfer-Encoding when it feels like it (sounds like that is supposed to be when reading from a generator), even though many web servers don't even support it.
Ignoring Content-Length will confuse people who supply it. If you (@sigmavirus24) insist on not transmitting it, why not just throw an exception? As you said, this functionality is probably not used widely.
In the pull request, you said "The use case in the issue is just an example of the behaviour. There is no justification there for why you're doing what you're doing." I disagree: I think the original code in this issue is perfectly normal behaviour. In fact, I think that streaming POST data is a huge use case, and it's ridiculous if one is forced to use Transfer-Encoding or resort to lower-level libraries when streaming/using generators.
So to summarize: Transfer-Encoding should be a flag, illegal parameter combinations should provoke exceptions, and user-supplied flags should be sent when possible. And of course, it shouldn't be possible to semi-disable Transfer-Encoding.
Stop stop stop.
Everyone take a breather.
@ysangkok You can do streaming uploads without generators just fine. Provide Requests a file-like object in the data parameter and that will work. Yes, it's not as simple as using a generator, but that's ok because it's still not very hard.
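As a sketch of that file-like approach (the filename and URL are illustrative): a binary file object has a measurable length, so Requests streams it with a Content-Length header instead of chunking it.

import requests

with open('mybig.iso', 'rb') as f:
    # Requests can measure the file's length, so it streams the body
    # with Content-Length rather than Transfer-Encoding: chunked.
    response = requests.post('http://example.com/upload', data=f)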
In the meantime, Requests should not suggest that it is chunking data when it does not. We are all agreed on that. The question is what we should do in your specific case: namely, providing a generator and a Content-Length. You and @sigmavirus24 legitimately disagree on this issue, which is fine. However, can we all please acknowledge that both camps have rational reasons to expect their position?
@ysangkok You've said that "Ignoring Content-Length will confuse people who supply it." @sigmavirus24 contends that ignoring the very clear documentation when provided with a generator will confuse people who do that. You are both right.
(As a side note, the fact that many servers don't understand Transfer-Encoding is just a wild assertion based on no evidence that I've seen. Until any evidence is provided, I'm choosing to ignore it.)
One way or another we're going to have to pick what we do here. It's possible that the correct decision is to throw an exception when both a generator and Content-Length are provided. That's viable. It doesn't even make @bryanhelmig's case worse, because he should just be passing response.raw straight through rather than wrapping it in a decorator (for your benefit, Bryan).
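A minimal sketch of passing response.raw straight through, reusing the URLs from @bryanhelmig's example:

import requests

source = requests.get('http://example.com/mybig.iso', stream=True)
# response.raw is a file-like object that can be handed to data directly,
# with no wrapping generator required.
upload = requests.post('http://example.com/upload', data=source.raw)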
I'm naturally inclined to sit on the fence here and throw a YoureACrazyPerson exception, but I can see why both of you believe what you believe. In particular, making decisions based on user-supplied headers is lame and confusing, and we should try not to do that. The following, however, are hard lines:
Transfer-Encoding will not be a flag. Not now, not ever. Requests does not do tiny special case flags like this.

does the server you're uploading to require you to send the Content-Length header?
@sigmavirus24 it does. :-( 411 for attempts without a Content-Length. Quite annoying IMO.
did you see the comments in the linked pull request?
@ysangkok I did.
he should just be passing response.raw straight through rather than wrapping it in a decorator
@Lukasa That was my original edit actually, but I'm not sure what that nets us in this case besides a chance to link to the docs. Unless I am mistaken, a file object isn't guaranteed to give you a length; generators are just guaranteed to not give a length. I noticed in testing that small files didn't get chunked, while a generator forced it.
Thanks for all the thorough replies everyone. Really though, it is fine if users in exotic situations have to swap to a different lib for one out of 100 requests. Life is much easier for the 99 other cases.
Really though, it is fine if users in exotic situations have to swap to a different lib for one out of 100 requests.
I'd prefer Requests Make Easy Things Easy & Hard Things Possible :)
@piotr-dobrogost Agreed, but if making a hard thing possible requires making an easy thing harder, we'd rather leave the easy thing easy. =)
A few things:
As far as I'm concerned, this has been (and will continue to be, until we come to a decision) undefined behaviour.
As a side note, the fact that many servers don't understand Transfer-Encoding is just a wild assertion based on no evidence that I've seen. Until any evidence is provided, I'm choosing to ignore it.
There's a difference between servers not understanding a chunked Transfer-Encoding and servers not wanting to respect it. I suspect the latter is the case in @bryanhelmig's case. Alternatively, the application could have been written by someone who doesn't understand or know about Transfer-Encoding and so requires a Content-Length.
It's possible that the correct decision is to throw an exception when both a generator and Content-Length are provided. That's viable.
Exceptions might be too extreme in this case. We don't generally raise exceptions for anything other than invalid URLs. The way we process the data and files parameters can raise exceptions but we don't special case anything there. That said, we've been talking about how poor an idea it is when users specify their own Content-Length and Host headers (among others which I'm probably forgetting). Since these aren't technically invalid practices, but rather practices we advise against, I suggest that we instead trigger a warning and then do the right thing in certain well documented situations.
Using warnings is a more gentle way to inform the user what they're doing is not advisable. It also covers the fact that so many of our users do not seem to bother to read the documentation and so the library becomes self-documenting of sorts. This also gives us the ability to change the behaviour in the future. And even better, if users want to disable the warnings, they can because python provides a way of silencing warnings.
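As an illustration of that idea (the warning message and category here are hypothetical, not anything requests actually emits):

import warnings

# What the library might emit when discarding a user-supplied header:
warnings.warn(
    "The user-supplied Content-Length header was removed because the "
    "body is a generator and will be sent with chunked transfer encoding.",
    UserWarning,
)

# Users who accept the behaviour can silence it:
warnings.simplefilter('ignore', UserWarning)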
There's a difference between servers not understanding a chunked Transfer-Encoding and servers not wanting to respect it. I suspect the latter is the case in @bryanhelmig's case. Alternatively, the application could have been written by someone who doesn't understand or know about Transfer-Encoding and so requires a Content-Length.
I actually think this is very likely the case in most of my examples. We (@zapier) attempt to upload files to over a dozen different APIs and the few of them that require Content-Length (seem to) timeout with chunked Transfer-Encoding.
As a side note, the fact that many servers don't understand Transfer-Encoding is just a wild assertion based on no evidence that I've seen. Until any evidence is provided, I'm choosing to ignore it.
I could put together a test suite of how various services respond to Content-Length/Transfer-Encoding, but I feel like even if it is incorrectly implemented APIs, that shouldn't really advise the design of python-requests. Easier still, I could just name names based on my experience fighting this for the last week, but again, if they are API/server bugs, what is the use of such information to python-requests?
I suggest that we instead trigger a warning and then do the right thing in certain well documented situations.
Agree on default behavior, but sometimes reality trumps the "right thing" (especially when you have no control over a potentially broken server). It might be nice to document a technique to override (even if it advocates for the user to do a lot of work, like writing a custom Adapter).
Just to clarify: by "the right thing" I do not necessarily mean the RFC thing. (In case that's what you think I meant)
I could put together a test suite of how various services respond to Content-Length/Transfer-Encoding, but I feel like even if it is incorrectly implemented APIs, that shouldn't really advise the design of python-requests.
I agree that misbehaving servers shouldn't inform our design.
Easier still, I could just name names based on my experience fighting this for the last week, but again, if they are API/server bugs, what is the use of such information to python-requests?
We already know how broken the web is. Regardless, knowing whether we should delete the header after warning the user or leave it would be where that data could be useful.
Bleh. This makes me sad.
Ok, I think @sigmavirus24's plan is the best one here, at least for now.
A few things that came up are distracting me from getting you guys some detailed data, but here is a brain dump of what I've seen:
The biggest offenders I've seen are API endpoints that require multipart upload: they usually need a Content-Length or they freak out on Transfer-Encoding: chunked (I'm not sure which). That isn't the case everywhere, but here are the bits I can pull out of our source with a quick grep:
The reason this is fresh in my mind is that we've built a custom multipart body generator that works with "lazy" files and can also calculate the length; this happened to bubble up a lot of the issues we're talking about here. Right now we're just cheating and doing getvalue() for the failing upload endpoints. We'll likely revisit it someday.
The other examples are harder to grep for, but the file companies (Box, Dropbox, SugarSync, etc...) all get it and work perfectly fine with chunked encoding, so no worries there.
Hopefully this shines a little more light on our real world use case. I wish I could get you more information to confirm which headers commonly cause which errors (they are usually timeouts).
This should be easier to test now that we have a few APIs that we can reproduce against, namely Twitter.
I have the same exact use case: uploading data to a server that does not support chunked encoding. The data length is known in advance, but it does not come from a file (it comes from a generator). For now I manage to work around this problem by adding an additional 'len' attribute to my generator object, so that requests' utils.super_len() returns something and chunked encoding is not chosen by requests: this is ugly and brilliant. As a unix user (and like @piotr-dobrogost), I would expect that the programmer knows what he does, and the library should obey: providing a Content-Length header is a clear indication that chunked encoding is not to be used. But this conflicts with your sentence above: "I would never expect that setting a header would change the behaviour of using a generator though". Well, if it is clearly documented I don't see the point. It wouldn't break the API, would it?
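A minimal sketch of that workaround (the class name is illustrative): requests' utils.super_len() falls back to a len attribute, so the body is streamed with a Content-Length instead of being chunked.

class LengthHintedIterable(object):
    def __init__(self, iterator, size):
        self._iterator = iterator
        self.len = size  # picked up by requests' utils.super_len()

    def __iter__(self):
        return iter(self._iterator)

# Hypothetical usage, with a generator whose total size is known:
# requests.post(url, data=LengthHintedIterable(my_generator(), known_size))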
@netheosgithub I'd argue that changing this behaviour absolutely constitutes a change in the API. Consider the current state of the API, from the documentation:
Requests also supports Chunked transfer encoding for outgoing and incoming requests. To send a chunk-encoded request, simply provide a generator (or any iterator without a length) for your body.
Note that the documentation doesn't say "To send a chunk-encoded request, simply provide a generator and do not set the Content-Length header." This makes this an API change in my book. Not just any API change: a bad one. The idea that setting a header changes the body of the request makes me feel deeply uncomfortable. Consider if someone asked us for a similar feature whereby, if they set the header Content-Type: application/json, we would JSON-encode the data parameter instead of form-encoding it! I'd throw that request out in a heartbeat.
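(For reference, a minimal reproduction of the documented behaviour, matching the capture earlier in this thread: a generator body produces a chunked request.)

import requests

def gen():
    yield b'lol'
    yield b'man'

# No length can be measured, so requests sends Transfer-Encoding: chunked.
requests.post('http://127.0.0.1:8801/', data=gen())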
I think we should try to address the root of the problem: why are people using generators in this situation?
Currently, providing a generator and specifying a Content-Length is an error (it generates an invalid HTTP request), so this combination should not be used by anybody. That's why I was thinking it would not break your users' programs. Why generators? Data is not always provided by a file-like object. For example, I'd like to watch upload progress by yielding data chunk by chunk, etc. (otherwise I'd have to override read() methods).
If only your 'invalid HTTP request' argument were valid. I wish I lived in that world. However, many, many servers will happily accept such a combination.
RFC 2616 provides this illuminating section on determining the message length:
The transfer-length of a message is the length of the message-body as it appears in the message; that is, after any transfer-codings have been applied. When a message-body is included with a message, the transfer-length of that body is determined by one of the following (in order of precedence):
- Any response message which "MUST NOT" include a message-body (such as the 1xx, 204, and 304 responses and any response to a HEAD request) is always terminated by the first empty line after the header fields, regardless of the entity-header fields present in the message.
- If a Transfer-Encoding header field (Transfer-Encoding) is present and has any value other than "identity", then the transfer-length is defined by use of the "chunked" transfer-coding (Transfer Codings), unless the message is terminated by closing the connection.
- If a Content-Length header field (Content-Length) is present, its decimal value in OCTETs represents both the entity-length and the transfer-length. The Content-Length header field MUST NOT be sent if these two lengths are different (i.e., if a Transfer-Encoding header field is present). If a message is received with both a Transfer-Encoding header field and a Content-Length header field, the latter MUST be ignored. ... Messages MUST NOT include both a Content-Length header field and a non-identity transfer-coding. If the message does include a non-identity transfer-coding, the Content-Length MUST be ignored.
To clarify: we are in agreement that Requests is doing the wrong thing. However, we are not in agreement that setting a Content-Length header should turn chunking off. That's the bit that is causing contention.
I should also point out that any server that fails with chunked encoding is also in violation of RFC 2616:
All HTTP/1.1 applications MUST be able to receive and decode the "chunked" transfer-coding, and MUST ignore chunk-extension extensions they do not understand.
This is not an attempt to say that Requests shouldn't fix this problem, it's simply to point out that everyone involved is violating RFC 2616.
I think we should try to address the root of the problem: why are people using generators in this situation?
One of our use cases was taking a possibly very large file with a known size from one service and uploading it to a second service, except the binary data was simply one of the parts of a multipart POST (with JSON metadata as the other part) that the second service required.
So we wrapped the multipart builder in a generator, one we could know the size of. It seemed like a very obvious and useful solution, at least until this requests feature/bug stopped us (I believe we maintain a fork now, but the super_len trick might help undo that!).
We now have other use cases of generators with lens, but I'd have to dig them out.
@bryanhelmig Mm, yes. I think the correct solution there is from now on to use the requests toolbelt's streaming multipart encoder.
Right now though, it's unclear to me why there is such strong resistance to using file-like objects here.
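For the record, a sketch of the toolbelt approach (field names, filename, and URL are illustrative):

import requests
from requests_toolbelt import MultipartEncoder

encoder = MultipartEncoder(fields={
    'metadata': '{"name": "mybig.iso"}',
    'file': ('mybig.iso', open('mybig.iso', 'rb'), 'application/octet-stream'),
})
# The encoder knows its total length, so the multipart body is streamed
# with a Content-Length header rather than chunked.
requests.post('http://example.com/upload', data=encoder,
              headers={'Content-Type': encoder.content_type})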
In issue #1895 I encountered this issue even with a file-like object. The request was a PUT, and the file-like object happened to be sys.stdin, which is in fact so file-like that it triggered chunked encoding like it does for a regular file. Providing a Content-Length (which I learned via outside means) in this situation caused HTTPAdapter.send to not do chunking like one might expect, but PreparedRequest.prepare_body still noticed that the file-like object was iterable and added a Transfer-Encoding header anyway. The server (Amazon S3) choked on that header even though the request would have otherwise worked.
@gholms Nope. sys.stdin is so not file-like that it triggered chunked encoding. If you pass us a file object we will stream it, not chunk it. The problem with sys.stdin is that it has no length, which falls under the description provided in the docs as mentioned above: "any iterator without a length".
The general expectation is that users will never explicitly set the Content-Length header. This is because Requests may end up messing with how the request body is set up.
Finally, the idea that 'one might expect' that providing a Content-Length header will disable chunking isn't true either: I'd expect us to remove the header. =)
Anyway, this discussion has gone on long enough. When provided with a lengthless iterator and a user-supplied Content-Length header, we have these options:

1. Remove the user-supplied Content-Length header and chunk the body anyway.
2. Raise an exception when both are provided.
3. Honour the Content-Length header and send the body without chunking it.

Any of these three options brings us into compliance with the RFC. It is clear that everyone raising this issue prefers (3). @sigmavirus24?
I've always been of the opinion that we should blow away the header. In my experience the user thinks they know what they're doing but rarely know enough to actually know what they're doing. I generally like to let the user shoot themselves in the foot, and we've done that until now and it hasn't gotten us much. In fact, we had the opportunity to blow away that header before and specifically didn't do that. That has led to exactly this issue. Let's be logical about this:
Our API and documentation have always maintained that passing an iterable with no way of measuring length means we will use chunked transfer encoding. By the spec, in this case we should be blowing away the Content-Length header because some servers do not obey it. It should be ignored. We're not doing anything wrong by sending it regardless of the encoding; the server is doing the wrong thing by not ignoring it. To protect against bad servers like this we should be deleting it.
If in 3.0.0 we decide to change the API such that providing the header does not chunk, that's a different beast. We're not considering 3.0.0 though, and that's the real issue here: what can we do to solve this issue now? What we can do is follow the spec.
I'm inclined to agree with you @sigmavirus24. I'll see if I can get a few minutes with Kenneth at some point soon to talk this over with him.
I'm in the same boat as @bryanhelmig and @netheosgithub: I have a generator where I know in advance what size the combined chunks will have, and a server that does not support chunked uploads (a WSGI app; according to my research, WSGI does not support chunked encoding at all). The data from the generator is too big to fit into RAM, so combining the chunks beforehand and passing them to requests is out of the question. Has there been any new development regarding this issue?
@jbaiter then what you need is to pass an object that behaves like a file with a __len__ defined. Take, for example, the toolbelt's MultipartEncoder. You want streaming, so in general all you need is something like this which has a read method and a way of determining its length (preferably by implementing __len__). Your object's API could look something like:
class JBaiterStreamer(object):
    def __init__(self, size, iterator):
        self.size = size
        self.iterator = iterator

    def __len__(self):
        return self.size

    def read(self, *args):
        try:
            return next(self.iterator)
        except StopIteration:
            return b''
Granted I haven't tried to see if that code will work, but something along those lines should.
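Hypothetical usage of that sketch, assuming the total size of the chunks is known up front:

import requests

def chunk_generator():  # placeholder for the real data source
    yield b'first chunk'
    yield b'second chunk'

total_size = 23  # sum of the chunk lengths above
body = JBaiterStreamer(total_size, chunk_generator())
requests.post('http://example.com/upload', data=body)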
Thanks @sigmavirus24 :+1:, I did precisely that. I was just wondering if by now there was a more elegant way to go about it.
Perhaps there's a good way of providing this via the toolbelt, eh @Lukasa ?
@sigmavirus24 Absolutely. =)
@Lukasa any feedback on https://gitlab.com/sigmavirus24/toolbelt/merge_requests/2
@sigmavirus24 "We're not doing anything wrong by sending it regardless of the encoding, the server is doing the wrong thing by not ignoring it." is actually now wrong:
RFC 7230: "A sender MUST NOT send a Content-Length header field in any message that contains a Transfer-Encoding header field."
@ztane this is a different case than what that stack overflow question is referring to. In the future, please closely read a discussion before generating notifications for all of its participants.
@sigmavirus24 that was caused by "Transfer-Encoding: chunked" with Content-Length set => body not chunked. And RFC 7230 forbids setting Content-Length with Transfer-Encoding.
@ztane thank you for continuing to not read this discussion. This particular bug is about someone setting a Content-Length header by hand and expecting requests to not use a chunked upload when they provide a generator (which always triggers a chunked upload with requests). The stackoverflow link you provided and what you're talking about is a different bug that is being handled elsewhere. You're muddying this discussion with irrelevant details because we allow users to shoot themselves in the feet. For example, we allow users to specify an incorrect content-length header or invalid host header, or basically whatever they please. We don't police everything nor will we. Please focus your conversation on the correct issue in the future.
I stumbled into this when using boto3 to do a PUT from a stream against AWS S3 (https://github.com/boto/botocore/issues/911). In my case, the size of the stream is known -- it is an object from another provider. When using S3 version 2 signatures, the "Transfer-Encoding: chunked" is not supported and S3 returns the 501 error. The boto3 documentation seems to imply that setting the Content-Length would get around this issue, as they state the following:
ContentLength (integer) -- Size of the body in bytes. This parameter is useful when the size of the body cannot be determined automatically. [1]
While boto3 should do two things -- fix their documentation and ensure that "Transfer-Encoding: chunked" is not set -- for the indirect consumer of requests this behavior is hard to debug. If the Content-Length header is ignored, raising an exception would at least make it obvious what is going on. As is, it was obfuscated by the fact that S3 only returns a 501 error with an explanation that one of the headers is not supported (but not specifying the header).
The suggested workaround to wrap it and provide the length works, but seems ugly. Would expanding the API to allow for toggling chunked encoding (while keeping the current behavior as default) be a palatable way forward (as opposed to using the Content-Length header as the flag)?
[1] http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.put_object
@timuralp I remain opposed to adding a flag for this. It's really just unacceptable to have a HTTP/1.1 implementation that cannot handle chunked transfer encoding in 2016. It's been a specification requirement for so long that the first specification that required it is nearly old enough to vote in the United States of America: I don't think we can keep cutting entities slack for not doing it.
From my perspective the bug here remains that we may incorrectly emit both Content-Length and Transfer-Encoding. Of course, my perspective is non-binding. ;)
Further, if boto3 wants to forcibly override the Content-Length header, they should utilize the prepared request flow so they can remove the Transfer-Encoding header.
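A sketch of that prepared-request flow (the URL and body here are illustrative):

import requests

def body_generator():  # placeholder body with a known total size
    yield b'part one'
    yield b'part two'

req = requests.Request('PUT', 'https://example.com/upload', data=body_generator())
prepped = req.prepare()  # prepare() adds Transfer-Encoding: chunked for a generator
prepped.headers['Content-Length'] = '16'  # sum of the chunk lengths above
prepped.headers.pop('Transfer-Encoding', None)

with requests.Session() as session:
    response = session.send(prepped)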
@Lukasa @sigmavirus24 fair enough -- thanks for the prompt reply. I'll continue to look to fix the boto issue in that project.
Test script
Actual result
Received on the server:
Expected result
Did not expect "Transfer-Encoding: chunked" since I provided the Content-Length. If requests insists on doing chunked transfer encoding, it should disregard the content length and actually chunk the content (as it does when no Content-Length header is given).