Closed: dmckeone closed this issue 9 years ago.
If we ever embed a MultiDict in an object the user can get at, e.g. a PreparedRequest, this becomes a more significant API change. In particular, right now we enforce the contract that getting headers from any object in Requests will give you a string-to-string dictionary. Changing that is a breaking API change. I think we need to think very carefully about whether that's a good plan.
@Lukasa I took a look through and I believe that my changes to params=, data=, and files= are all safe from an API perspective:
PreparedRequest does not expose params, data or files.
Session has a params attribute that I initialize to OrderedMultiDict, but it is hidden because it isn't declared in __attrs__.
Request has params, data, and files exposed, but the user can't get to them because of __attrs__.
In the future, I don't think there is any way around an API change for headers (which is why I shied away from that change), because a dict, or in this case a CaseInsensitiveDict, just can't represent repeated keys. So that implementation will require some thinking about the API, and hopefully a structure more dedicated to handling HTTP headers; maybe just a dict-like class called HTTPHeader that has the behaviour of the earlier mentioned CaseInsensitiveOrderedMultiDict, but is clearly bound to conform to HTTP header behaviours and rules. After all, CaseInsensitiveDict is really only used for headers anyway; perhaps it's best to call a spade a spade?
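A minimal sketch of the HTTPHeader idea, assuming it keeps every (name, value) pair in arrival order, matches names case-insensitively, and still offers string-to-string dict access by folding repeated values with ", ". The class name comes from the comment above, but the methods (add, getlist) and the folding rule are assumptions for illustration, not an agreed design.

```python
from collections.abc import MutableMapping


class HTTPHeader(MutableMapping):
    """Hypothetical case-insensitive, ordered, multi-value header mapping."""

    def __init__(self, pairs=()):
        self._pairs = []  # (name exactly as given, value) in arrival order
        for name, value in pairs:
            self.add(name, value)

    def add(self, name, value):
        """Append another value for name, keeping existing ones."""
        self._pairs.append((name, value))

    def getlist(self, name):
        """Return every value for name, in arrival order."""
        key = name.lower()
        return [v for n, v in self._pairs if n.lower() == key]

    def __getitem__(self, name):
        # Dict-style access folds repeated values into one string.
        values = self.getlist(name)
        if not values:
            raise KeyError(name)
        return ", ".join(values)

    def __setitem__(self, name, value):
        # Dict-style assignment drops every existing value for name,
        # then appends the new one at the end.
        key = name.lower()
        self._pairs = [(n, v) for n, v in self._pairs if n.lower() != key]
        self._pairs.append((name, value))

    def __delitem__(self, name):
        key = name.lower()
        kept = [(n, v) for n, v in self._pairs if n.lower() != key]
        if len(kept) == len(self._pairs):
            raise KeyError(name)
        self._pairs = kept

    def __iter__(self):
        # Each distinct name once, spelled as it first appeared.
        seen = set()
        for n, _ in self._pairs:
            if n.lower() not in seen:
                seen.add(n.lower())
                yield n

    def __len__(self):
        return len({n.lower() for n, _ in self._pairs})


h = HTTPHeader([("Vary", "X-PJAX"), ("vary", "Accept-Encoding")])
print(h["VARY"])          # X-PJAX, Accept-Encoding
print(h.getlist("vary"))  # ['X-PJAX', 'Accept-Encoding']
h["Vary"] = "Cookie"
print(h.getlist("Vary"))  # ['Cookie']
```

Folding on plain lookup is one way to keep the existing string-to-string contract while getlist() exposes the repeats; whether that trade-off is acceptable is exactly the API question raised above.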
@dmckeone I think either you or I am misunderstanding the role of __attrs__. As far as I know, it does not hide any information from the user; it tells libraries like multiprocessing and pickle what they need to properly serialize and reconstruct those objects. So, yes, in fact params is entirely exposed to the user. There would be no other point in having it. :)
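To illustrate the point about __attrs__: in the simplified, hypothetical class below (similar in spirit to how Requests' Session drives __getstate__/__setstate__ from its __attrs__ list), the list only controls what survives pickling. Normal attribute access is unaffected.

```python
import pickle


class Settings:
    """Hypothetical stand-in for a Requests-style object with __attrs__."""

    # Names listed here are what __getstate__ hands to pickle; nothing
    # about this list affects normal attribute access.
    __attrs__ = ["params", "headers"]

    def __init__(self):
        self.params = {}
        self.headers = {}
        self.scratch = "not pickled"  # deliberately not listed in __attrs__

    def __getstate__(self):
        return {attr: getattr(self, attr, None) for attr in self.__attrs__}

    def __setstate__(self, state):
        for attr, value in state.items():
            setattr(self, attr, value)


s = Settings()
s.params["q"] = "requests"  # params is fully visible and mutable
clone = pickle.loads(pickle.dumps(s))
print(clone.params)               # {'q': 'requests'}
print(hasattr(clone, "scratch"))  # False: only __attrs__ names survive
```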
These were all planned and endorsed changes by @kennethreitz. And to a degree they're quite necessary to make requests entirely functional. Leaving headers as is, right now, seems okay to me. I'm just not sure how well having the Session.headers attribute as a MultiDict will work.
@sigmavirus24 is correct, __attrs__ does not limit exposure of those parameters. In particular, Session.params, Request.params, Request.data and Request.files are deliberately exposed parts of the API, intended to be used by users.
I think if we're going to make backward-incompatible changes to the API we should do it in one fell swoop. I'm still remembering the less-than-positive initial reaction to turning Response.json into a method, and this change is far more wide-ranging than that one.
Changing any of these parameters, especially if multi=True is the default (which it probably should be), is a backward-incompatible change. I'm all for doing it, as I think the current behaviour could be improved, but any implementation needs to be very carefully planned, and needs to be clearly documented and accompanied by a decent version number bump.
@sigmavirus24 @Lukasa Thanks guys. Mea culpa about the __attrs__, you're right.
Changing any of these parameters, especially if multi=True is the default (which it probably should be), is a backward-incompatible change
You may be right (and certainly would be if headers= was included), but I believe it's only backwards incompatible with the current change if you tried to assume order or duplicate values, and since that wouldn't be possible in older versions due to the use of dict, I don't think it's incompatible in a breaking way.
I'm thinking of cases like this (where request is an instance of Request):
isinstance(request.params, dict)
request.params['param'] for single and iterable values.
request.params == {'param': 1}
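To make the third expectation concrete, here is a hypothetical multidict (not the implementation from the pull request) that subclasses dict and stores every value in a list under the hood. It passes the isinstance check and the single-value lookup, but equality against a plain dict quietly fails.

```python
class ListBackedMultiDict(dict):
    """Hypothetical multidict: every value for a key lives in a list stored
    in the underlying dict; plain lookup returns the first one."""

    def __init__(self, pairs=()):
        super().__init__()
        for key, value in pairs:
            self.add(key, value)

    def add(self, key, value):
        dict.setdefault(self, key, []).append(value)

    def __getitem__(self, key):
        return dict.__getitem__(self, key)[0]

    def getlist(self, key):
        return list(dict.get(self, key, []))


params = ListBackedMultiDict([("param", 1)])
print(isinstance(params, dict))  # True: the isinstance check still passes
print(params["param"])           # 1: single-value lookup still works
print(params == {"param": 1})    # False: dict.__eq__ compares the stored lists
```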
Have I missed a case here, or is it a reasonable expectation to say that, if you want to use ordered or duplicate params, data, or files, then 1.2X (or whatever version this ends up in) is your baseline? If you don't use ordered or duplicate params, data, or files, then things should work identically for all versions.
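As a quick illustration of why ordered or duplicate params need something other than a dict, the standard library's urllib.parse shows the difference. This is not Requests' own encoding code, just the same idea in miniature.

```python
from urllib.parse import parse_qsl, urlencode

# A plain dict cannot hold the same key twice: the last value wins.
as_dict = {"tag": "a", "tag": "b"}
print(urlencode(as_dict))  # tag=b

# An ordered list of 2-tuples keeps both values, in order.
as_pairs = [("tag", "a"), ("tag", "b"), ("page", "2")]
print(urlencode(as_pairs))  # tag=a&tag=b&page=2

# Round-tripping through parse_qsl recovers the same ordered pairs.
print(parse_qsl(urlencode(as_pairs)) == as_pairs)  # True
```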
So with that in mind, how would you like to proceed with this? Should I be incorporating headers= into my branch and targeting a full version release with docs and all that, or is the pull request I submitted a way to start that works well enough, with headers approached separately? I'm fine with either, or something else entirely, but since I'm new to this project I'm happy to follow your more experienced lead.
Not a worry @dmckeone . I get confused plenty about stuff too. @Lukasa usually corrects me kindly. :)
I think for the current pull request you can leave out headers if you're that uncomfortable with it, but it has to be implemented before the next release.
Part of the documentation will have to include a section about how the MultiDict works with respect to __setattr__ and __getattr__. That (if I remember correctly) is non-obvious behaviour and in this instance I have no criticism of this aspect of the API. :P
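Assuming the non-obvious behaviour meant here is item access (getting and setting keys) rather than attribute access, this hypothetical class mimics the convention Werkzeug's MultiDict documents: d[key] returns only the first value, d[key] = value replaces all values, and add() appends. It is an illustration, not the class from the pull request.

```python
class OrderedMultiDict:
    """Hypothetical ordered multidict illustrating Werkzeug-style item access."""

    def __init__(self):
        self._pairs = []  # (key, value) tuples in insertion order

    def __getitem__(self, key):
        # Plain lookup returns only the FIRST value stored for key.
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)

    def __setitem__(self, key, value):
        # Assignment REPLACES every existing value for key with one value.
        self._pairs = [(k, v) for k, v in self._pairs if k != key]
        self._pairs.append((key, value))

    def add(self, key, value):
        # add() APPENDS another value and leaves existing ones alone.
        self._pairs.append((key, value))

    def getlist(self, key):
        return [v for k, v in self._pairs if k == key]


d = OrderedMultiDict()
d.add("color", "red")
d.add("color", "blue")
print(d["color"])          # red (not blue, and not both)
d["color"] = "green"
print(d.getlist("color"))  # ['green'] (red and blue are gone)
```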
Getting stuff wrong in public is basically the definition of OSS development, as far as I can tell. =P I certainly do it enough.
I think my concern primarily applies to changing headers. We don't expect params, data or files to be mutated by the library, so the user should get out whatever the hell they put in. However, we mutate headers quite a bit. If we change what we do there, it's definitely breaking.
I'm with @sigmavirus24 on this: I want to change it, and I think we have to change it, but changing headers is breaking. We might want to consult with @kennethreitz before we go charging ahead on that part of it, to see if he wants to hold off until some later release.
And in case I've seemed a little blunt in this conversation (reading over it again I sure feel like I did), I want to be clear: your work is excellent, and I want to thank you for taking it on. =)
@sigmavirus24 Fully documenting how MultiDict behaves is probably a good idea when we do headers. As far as headers goes, I'm not necessarily uncomfortable with the actual work, just the implication of the API change and the associated fallout if it were to be done anything less than perfect. In that way I like @Lukasa's idea to get consensus first, and do it as part of a unique version number.
@Lukasa I actually like a direct tone because it's clearer, so I didn't mind the bluntness. Thanks for the compliment as well!
So just to summarize what I think I've read here for everyone's sake (there is quite a long comment thread here):
MultiDict support will come in two stages:
1) #1316 for params, data, and files in something relatively soon
2) Full documentation of OrderedMultiDict, as well as a future class that combines the behaviours of OrderedMultiDict with CaseInsensitiveDict, and an implementation of that future class under the headers kwarg, all of it happening under a unique version that incorporates an anticipated API change.
Whelp, I just found this thread after commenting a bunch on @dmckeone's open PR, so I'm going to spend some time over the next couple of days trying to understand the issues far better before I make additional comments. I mainly wanted to point out that Session.headers is a CaseInsensitiveDict now after the #1339 merge, just in case the headers change is still floating around as an option.
Since there are a number of pull requests in the queue that impact this feature (#1321, #1343, #1338), and the discussion around #1316 created some new requirements, I've withdrawn #1316 in favor of a MultiDict that is derived from Werkzeug's OrderedMultiDict, but behaves more closely to what requests needs:
items() that returns a list of 2-tuples, including duplicates, with no multi=True keyword
merge_kwargs that replaces keys, rather than appending
collections.Mapping (introduced in #1339)
I'll submit this new MultiDict as a single pull request that encompasses params, data, and files and includes a full set of tests for its behaviour. I will not include headers, since some IRC discussions indicated that changing headers to MultiDict was high risk, low reward.
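The first two listed behaviours can be sketched on plain lists of 2-tuples. merge_pairs below is a hypothetical helper written for illustration; it is not the merge_kwargs code from the withdrawn pull request, only the "replace rather than append" rule it describes.

```python
def merge_pairs(session_pairs, request_pairs):
    """Hypothetical merge: request-level keys REPLACE session-level keys.

    Both arguments are ordered lists of (key, value) 2-tuples, so duplicate
    keys are allowed. Every session pair whose key also appears at request
    level is dropped; the request pairs are then appended in their own order.
    """
    overridden = {key for key, _ in request_pairs}
    kept = [(k, v) for k, v in session_pairs if k not in overridden]
    return kept + list(request_pairs)


session_params = [("api_key", "abc"), ("tag", "a"), ("tag", "b")]
request_params = [("tag", "c"), ("tag", "d")]
print(merge_pairs(session_params, request_params))
# [('api_key', 'abc'), ('tag', 'c'), ('tag', 'd')]
```

Appending instead would have produced four tag values, mixing session and request settings.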
Sorry for the delay in this comment. Things have been busy.
Take your time. @kennethreitz is in transit so there's no rush. Also, I wouldn't blame you if you wanted to wait for those other pull requests to be merged before submitting a new PR. That seems like the most sane thing to do at the moment in my humble opinion.
@sigmavirus24 That is what I will likely do. I will begin work on making the required MultiDict changes and tests in the next few days and then just rebase everything once those pull requests go in.
@dmckeone there's no need to close this. You can leave it open until your pull is issued & accepted. I don't think any of us mind.
I'm reopening this, just because it provides evidence that this issue is being worked on. Don't want people to get the impression that we gave up or aren't thinking about it. =)
@Lukasa @sigmavirus24 My mistake. I had closed the PR on purpose, but not this. Thanks for re-opening.
Hi! I got lost and don't know where this is being worked on. I need this feature to move a project from a custom HTTP lib to python-requests. Is there a PR I can use to test this feature?
@dmckeone Sadly there's been no particular progress here, largely because we need to combine this with our current CaseInsensitiveDict to produce some kind of horrible monstrosity like a CaseInsensitiveMultiDict.
Some discussion occurred over on shazow/urllib3#236 about this. The summary was basically:
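Steal Django's MultiDict.
Add key case-insensitivity.
Implement the structure in urllib3, and just have Requests steal it.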
If you fancy working on that, go nuts!
Werkzeug has a multidict too if you need more inspiration
On Dec 6, 2013, at 10:21 AM, Cory Benfield notifications@github.com wrote:
@dmckeone Sadly there's been no particular progress here, largely because we need to combine this with our current CaseInsensitiveDict to produce some kind of horrible monstrosity like a CaseInsensitiveMultiDict.
Some discussion occurred over on shazow/urllib3#236 about this. The summary was basically:
Steal Django's MultiDict. Add key case-insensitivity. Implement the structure in urllib3, and just have Requests steal it. If you fancy working on that, go nuts!
Werkzeug has a multidict too if you need more inspiration
That's where we stole our initial implementation from if I remember correctly.
I think this is pretty low priority now, honestly. Things work pretty well now, and are stable. I'd rather not change things than change things :)
Hey requests folks, new to the project and found my way here pretty quickly after browsing the docs and noticing requests uses a dict-like structure for headers and so can apparently silently drop headers when there are multiple with the same key. Is that documented anywhere? I looked and was surprised to not see that documented in any of these places but maybe I missed it:
http://docs.python-requests.org/en/latest/user/quickstart/#custom-headers
http://docs.python-requests.org/en/latest/user/quickstart/#response-headers
http://docs.python-requests.org/en/latest/api/?highlight=headers#requests.request
http://docs.python-requests.org/en/latest/api/#requests.Response.headers
I say surprised just given Werkzeug's example as well as the prevalence of multiple headers in the wild (e.g. curl -I https://www.google.com/ https://github.com etc.), but maybe I'm just misunderstanding the impact? I just tried requests.get('google.com').headers.get('set-cookie') and it looks like requests is combining the two set-cookie headers into a single one, so not a perfectly faithful representation of the response, but I guess better than dropping the second header completely.
Anyway, just thought I'd ask if this would be a good time to take a look at fixing this again, or at least documenting it? Would it help to split this out into one ticket for headers and another for query params in case fixing one is easier than both at the same time? Thanks for any info/help and for the great work on requests in general.
Hi there!
We actually don't have a bug here. =) Repeated headers are governed by very specific rules from RFC 7230 Section 3.2.2, which I'll reproduce here:
A recipient MAY combine multiple header fields with the same field name into one "field-name: field-value" pair, without changing the semantics of the message, by appending each subsequent field value to the combined field value in order, separated by a comma.
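The folding rule quoted above can be sketched as follows, assuming the raw header fields arrive as an ordered list of (name, value) tuples. fold_headers is a hypothetical helper, not Requests' internal code. The same RFC section also notes that Set-Cookie should be handled as a special case, which this sketch does not do.

```python
def fold_headers(raw_pairs):
    """Combine repeated header fields as RFC 7230 Section 3.2.2 allows.

    raw_pairs is a list of (name, value) tuples in the order received.
    Names are compared case-insensitively; values for the same name are
    joined with ", " in their original order. The first spelling of each
    name is kept. Set-Cookie is NOT special-cased here.
    """
    folded = {}  # lower-cased name -> (display name, [values])
    for name, value in raw_pairs:
        display, values = folded.setdefault(name.lower(), (name, []))
        values.append(value)
    return {display: ", ".join(values) for display, values in folded.values()}


print(fold_headers([("Vary", "X-PJAX"), ("Vary", "Accept-Encoding")]))
# {'Vary': 'X-PJAX, Accept-Encoding'}
```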
If you make a request to https://github.com/ with curl or in a browser you'll see that it repeats the Vary header:
Vary:X-PJAX
Vary:Accept-Encoding
You'll also see that we fold those together:
>>> r = requests.get('https://github.com/')
>>> r.headers['Vary']
'X-PJAX, Accept-Encoding'
This is a perfectly valid representation of this HTTP response.
Set-Cookie is itself a special case because the cookie syntax is a little ambiguous as regards that comma. This is the closest we have here to a bug: we currently join those headers with commas, but really we should join them with semi-colons. If you use our cookie handling logic (e.g. response.cookies, session.cookies) you won't encounter any problems because they parse the header correctly anyway, but if you were parsing cookies yourself you might encounter a problem.
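A small sketch of the comma ambiguity, using the standard library's http.cookies.SimpleCookie purely for illustration (it is not what Requests' cookie handling is built on): cookie expiry dates contain a comma, so splitting a comma-folded Set-Cookie value does not recover the original headers.

```python
from http.cookies import SimpleCookie

# Two separate Set-Cookie header values; the first carries an Expires
# date, and cookie dates contain a comma.
set_cookie_values = [
    "session=abc; Path=/; Expires=Wed, 21 Oct 2015 07:28:00 GMT",
    "theme=dark; Path=/",
]

# Folding them with ", " makes the boundary between cookies ambiguous.
folded = ", ".join(set_cookie_values)
print(len(folded.split(",")))  # 3, not 2: the date's comma is split too

# Parsing each original header value on its own avoids the ambiguity.
jar = SimpleCookie()
for value in set_cookie_values:
    jar.load(value)
print(jar["session"].value, jar["theme"].value)  # abc dark
```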
Otherwise, we don't need a new header dictionary. =)
Thanks for the thorough reply, @Lukasa. Great that requests' headers representation is not as wrong as I was afraid of. I'd think this would be worth documenting though, given that so many other libraries use a data structure that preserves multiple headers closer to how they were sent, which may lead other requests users to ask you the same thing I did.
Also worth logging a separate bug for joining multiple cookie headers with semicolons?
I'm open to a separate bug for the cookies with semicolons. Documenting the joined headers is strictly optional, but I can see a value in having it be a note in one of the sections.
Sorry for the necro-posting...:skull: :computer: ...but...
There are situations, like testing HTTP servers or APIs, where sending repeated headers one per line is necessary (to be sure the server handles them correctly). Additionally, having the option to present repeated headers one per line also makes things much more legible and therefore debuggable. :bug: :mans_shoe:
Any chance this could be revisited?
@BigBlueHat is there a reason a list of tuples is not sufficient for your use-cases? Also what do you mean "individual headers per line"?
I want to be able to send a request like the first example here: https://tools.ietf.org/html/rfc7240#page-5
POST /foo HTTP/1.1
Host: example.org
Prefer: respond-async, wait=100
Prefer: handling=lenient
Date: Tue, 20 Dec 2011 12:34:56 GMT
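Requests' dict-based headers can't emit two Prefer lines, but the standard library's lower-level http.client can, because putheader() writes one header line per call. In this sketch CaptureSocket is a hypothetical stand-in for a network connection so the example runs offline and shows the exact bytes; in real use you would let HTTPConnection connect and then call getresponse(). The Date header from the RFC example is omitted.

```python
import http.client


class CaptureSocket:
    """Hypothetical stand-in for a socket: records bytes instead of sending."""

    def __init__(self):
        self.data = b""

    def sendall(self, data):
        self.data += data

    def close(self):
        pass


conn = http.client.HTTPConnection("example.org")
conn.sock = CaptureSocket()  # send() uses this object instead of connecting
conn.putrequest("POST", "/foo", skip_accept_encoding=True)
# Each putheader() call emits its own line, even for the same field name.
conn.putheader("Prefer", "respond-async, wait=100")
conn.putheader("Prefer", "handling=lenient")
conn.endheaders()
print(conn.sock.data.decode("ascii"))
```

The captured request contains the automatically added Host: example.org line followed by two separate Prefer: lines, matching the RFC example.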
You can put all those on a single line, but sometimes those Prefer lines can get quite long, as in this example from the Web Annotation Protocol:
GET /annotations/ HTTP/1.1
Host: example.org
Accept: application/ld+json; profile="http://www.w3.org/ns/anno.jsonld"
Prefer: return=representation;include="http://www.w3.org/ns/ldp#PreferMinimalContainer"
Prefer: return=representation;include="http://www.w3.org/ns/oa#PreferContainedIRIs"
Concatenated that would look like:
GET /annotations/ HTTP/1.1
Host: example.org
Accept: application/ld+json; profile="http://www.w3.org/ns/anno.jsonld"
Prefer: return=representation;include="http://www.w3.org/ns/ldp#PreferMinimalContainer", return=representation;include="http://www.w3.org/ns/oa#PreferContainedIRIs"
...which is valid...but a pain to understand.
Additionally, RFC7230 deprecates line folding, so you can't make that prettier by wrapping it either.
In my specific case, I'm working on building a compliant Web Annotation Protocol server in Python and I need to verify that Prefer headers sent on multiple lines are all received and don't overwrite each other...as the server needs to support that (re: the above stuff).
It seems Requests and most other Python client and server code build off of rfc822.Message which does support multiple headers internally, but only lets you get at the first one, the last one, or all of them un-parsed.
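On the receiving side in Python 3, http.client.HTTPMessage subclasses email.message.Message (the email package replaced the Python 2 rfc822 module), and its get_all() method returns every occurrence of a field, in order. A minimal check, assuming the header block has already been read off the wire into bytes:

```python
import http.client
import io

raw = (
    b"Host: example.org\r\n"
    b"Prefer: respond-async, wait=100\r\n"
    b"Prefer: handling=lenient\r\n"
    b"\r\n"
)
headers = http.client.parse_headers(io.BytesIO(raw))

print(headers["Prefer"])          # respond-async, wait=100 (first one only)
print(headers.get_all("prefer"))  # every occurrence; lookup ignores case
```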
In short, there are ways to do it, but it doesn't bode well for HTTP accuracy in client and server code written in Python. My hope is that Requests (and by extension, things like httpbin.org) will support repeated headers on individual lines to make the future just that much more inviting. :sunrise:
Generally speaking, Requests aims to have a good API in part by restricting what is possible. In this case, you can try to achieve it by hacking around the internals, but at a certain point you're going for low-level enough access that Requests may not be the best choice and you may need to consider something like urllib3.
Thanks @Lukasa. I do like what I'm seeing wrt the HTTPHeaderDict class. Guess I'll do some digging to see if that would handle sending repeats one per line.
Cheers. :tophat:
Comment by Kenneth Reitz: