Add MultiDict Support (psf/requests #1155)

dmckeone commented 11 years ago

We need to support MultiDict. This is long overdue.

Lukasa commented 11 years ago

If we ever embed a MultiDict in an object the user can get at, e.g. a PreparedRequest, this becomes a more significant API change. In particular, right now we enforce the contract that getting headers from any object in Requests will give you a string-to-string dictionary. Changing that is a breaking API change, so I think we need to think very carefully about whether it's a good plan.

dmckeone commented 11 years ago

@Lukasa I took a look through and I believe that my changes to params=, data=, and files= are all safe from an API perspective:

PreparedRequest does not expose params, data or files
Session has a params attribute that I initialize to OrderedMultiDict, but it is hidden because it isn't declared in the __attrs__
Request has params, data, and files exposed, but the user can't get to that because of __attrs__

In the future, I don't think there is any way around an API change for headers (which is why I shied away from that change), because a dict, or in this case a CaseInsensitiveDict, just can't represent duplicate keys. So that implementation will require some thinking about the API, and hopefully a structure dedicated to handling HTTP headers: maybe a dict-like class called HTTPHeader that has the behaviour of the CaseInsensitiveOrderedMultiDict mentioned earlier, but is explicitly bound to HTTP header behaviours and rules. After all, CaseInsensitiveDict is really only used for headers anyway; perhaps it's best to call a spade a spade?
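A minimal sketch of the kind of header structure described above: a case-insensitive, ordered multi-dict. The class name `HTTPHeaderMultiDict` and the methods `add`, `getall`, and `items_all` are hypothetical and are not part of Requests. Here plain `[]` access returns the first value and assignment replaces every field of that name, which is only one of several reasonable choices. Note that, unlike Werkzeug's MultiDict, this is not a `dict` subclass, so an `isinstance(x, dict)` check would fail on it.

```python
from collections.abc import MutableMapping


class HTTPHeaderMultiDict(MutableMapping):
    """Hypothetical case-insensitive, ordered multi-dict for HTTP headers.

    Every (name, value) pair is kept in arrival order with its original
    spelling; lookups compare names case-insensitively.
    """

    def __init__(self, pairs=()):
        self._pairs = []  # list of (name as given, value)
        for name, value in pairs:
            self.add(name, value)

    def add(self, name, value):
        """Append a field without discarding earlier ones of the same name."""
        self._pairs.append((name, value))

    def getall(self, name):
        """Return every value for `name`, in order (empty list if absent)."""
        key = name.lower()
        return [v for n, v in self._pairs if n.lower() == key]

    def items_all(self):
        """Return all (name, value) pairs, duplicates included."""
        return list(self._pairs)

    def __getitem__(self, name):
        values = self.getall(name)
        if not values:
            raise KeyError(name)
        return values[0]  # plain [] access returns the first value

    def __setitem__(self, name, value):
        # Drop every existing field with this name, then append the new one.
        key = name.lower()
        self._pairs = [(n, v) for n, v in self._pairs if n.lower() != key]
        self._pairs.append((name, value))

    def __delitem__(self, name):
        key = name.lower()
        if not any(n.lower() == key for n, _ in self._pairs):
            raise KeyError(name)
        self._pairs = [(n, v) for n, v in self._pairs if n.lower() != key]

    def __iter__(self):
        # Yield each distinct name once, using the first spelling seen.
        seen = set()
        for n, _ in self._pairs:
            if n.lower() not in seen:
                seen.add(n.lower())
                yield n

    def __len__(self):
        return len({n.lower() for n, _ in self._pairs})


headers = HTTPHeaderMultiDict([("Prefer", "respond-async, wait=100"),
                               ("prefer", "handling=lenient")])
print(headers["PREFER"])          # respond-async, wait=100
print(headers.getall("Prefer"))   # ['respond-async, wait=100', 'handling=lenient']
```

Storing a flat list of pairs keeps the wire order of all fields, at the cost of linear-time lookups; a production version might add an index keyed by the lower-cased name.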

sigmavirus24 commented 11 years ago

@dmckeone I think either you or I am misunderstanding the role of __attrs__. As far as I know, it does not hide any information from the user but presents to libraries like multiprocessing and pickle what they need to properly serialize and reconstruct those objects. So, yes, in fact params is entirely exposed to the user. There would be no other point in having it. :)

These were all planned and endorsed changes by @kennethreitz. And to a degree they're quite necessary to make requests entirely functional. Leaving headers as is, right now, seems okay to me. I'm just not sure how well having the Session.headers attribute as a MultiDict will work.

Lukasa commented 11 years ago

@sigmavirus24 is correct, __attrs__ does not limit exposure of those parameters. In particular, Session.params, Request.params, Request.data and Request.files are deliberately exposed parts of the API, intended to be used by users.

I think if we're going to make backward-incompatible changes to the API we should do it in one fell swoop. I'm still remembering the less-than-positive initial reaction to turning Response.json into a method, and this change is far more wide-ranging than that one.

Changing any of these parameters, especially if multi=True is the default (which it probably should be), is a backward-incompatible change. I'm all for doing it, as I think the current behaviour could be improved, but any implementation needs to be very carefully planned, clearly documented, and accompanied by a decent version number bump.

dmckeone commented 11 years ago

@sigmavirus24 @Lukasa Thanks guys. Mea culpa about the __attrs__, you're right.

> Changing any of these parameters, especially if multi=True is the default (which it probably should be), is a backward-incompatible change

You may be right (and certainly would be if headers= were included), but I believe it's only backward-incompatible with the current change if you rely on ordering or duplicate values, and since neither was possible in older versions because of the use of dict, I don't think it's incompatible in a breaking way.

I'm thinking of cases like this (where request is an instance of Request):

isinstance(request.params, dict)
request.params['param'] for single and iterable values.
request.params == {'param': 1}

Have I missed a case here, or is it a reasonable expectation to say that, if you want to use ordered duplicate params, data, or files then 1.2X (or whatever version this ends up in) is your baseline? If you don't use ordered or duplicate params, data, or files then things should work identically for all versions.
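To illustrate the ordering and duplicate-key limitation using the standard library rather than Requests' own code: Python's `urllib.parse.urlencode` accepts either a mapping or a sequence of 2-tuples, and only a sequence of pairs (or a dict of lists with `doseq=True`) can carry repeated keys. This is a sketch of the data-shape issue, not of the #1316 implementation.

```python
from urllib.parse import urlencode

# A dict holds each key once: the second assignment overwrites the first.
as_dict = {"tag": "a"}
as_dict["tag"] = "b"
print(urlencode(as_dict))   # tag=b

# A list of 2-tuples keeps duplicate keys and their order.
as_pairs = [("tag", "a"), ("tag", "b"), ("page", "2")]
print(urlencode(as_pairs))  # tag=a&tag=b&page=2

# A dict of lists can express duplicates too, when doseq=True is passed.
print(urlencode({"tag": ["a", "b"]}, doseq=True))  # tag=a&tag=b
```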

So with that in mind, how would you like to proceed? Should I incorporate headers= into my branch and target a full version release with docs and all that, or is the pull request I submitted a good enough start, with headers approached separately? I'm fine with either, or something else entirely, but since I'm new to this project I'm happy to follow your more experienced lead.

sigmavirus24 commented 11 years ago

Not a worry @dmckeone . I get confused plenty about stuff too. @Lukasa usually corrects me kindly. :)

I think for the current pull request you can leave out headers if you're that uncomfortable with it, but it has to be implemented before the next release.

Part of the documentation will have to include a section about how the MultiDict works with respect to __setattr__ and __getattr__. That (if I remember correctly) is non-obvious behaviour and in this instance I have no criticism of this aspect of the API. :P

Lukasa commented 11 years ago

Getting stuff wrong in public is basically the definition of OSS development, as far as I can tell. =P I certainly do it enough.

I think my concern primarily applies to changing headers. We don't expect params, data or files to be mutated by the library, so the user should get out whatever the hell they put in. However, we mutate headers quite a bit. If we change what we do there, it's definitely breaking.

I'm with @sigmavirus24 on this: I want to change it, and I think we have to change it, but changing headers is breaking. We might want to consult with @kennethreitz before we go charging ahead on that part of it, to see if he wants to hold off until some later release.

And in case I've seemed a little blunt in this conversation (reading over it again I sure feel like I did), I want to be clear: your work is excellent, and I want to thank you for taking it on. =)

dmckeone commented 11 years ago

@sigmavirus24 Fully documenting how MultiDict behaves is probably a good idea when we do headers. As far as headers goes, I'm not necessarily uncomfortable with the actual work, just the implication of the API change and the associated fallout if it were to be done anything less than perfect. In that way I like @Lukasa's idea to get consensus first, and do it as part of a unique version number.

@Lukasa I actually like a direct tone because it's clearer, so I didn't mind the bluntness. Thanks for the compliment as well!

So just to summarize what I think I've read here for everyone's sake (there is quite a long comment thread here):

MultiDict support will come in two stages:
1) #1316, covering params, data, and files, in a release relatively soon.
2) Full documentation of OrderedMultiDict, plus a future class that combines the behaviours of OrderedMultiDict with CaseInsensitiveDict, used to implement the headers kwarg. All of this would ship under its own version number, reflecting the anticipated API change.

cdunklau commented 11 years ago

Whelp, I just found this thread after commenting a bunch on @dmckeone's open PR, so I'm going to spend some time over the next couple days to try to understand the issues far better before I make additional comments. I mainly wanted to point out that Session.headers is a CaseInsensitiveDict now after the #1339 merge, just in case the headers change is still floating around as an option.

dmckeone commented 11 years ago

Since there are a number of pull requests in queue that impact this feature (#1321, #1343, #1338), and the discussion around #1316 created some new requirements, I've withdrawn #1316 in favor of a MultiDict that is derived from Werkzeug's OrderedMultiDict, but behaves more closely to what requests needs:

items() that returns a list of 2-tuples, including duplicates, with no multi=True key-word
merge_kwargs that replaces keys, rather than appending
type detection based on collections.Mapping (introduced in #1339)
simplification of code for Python 2/3 compatibility
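One reading of "merge_kwargs that replaces keys, rather than appending", sketched over plain lists of (key, value) pairs. The function name `merge_replacing` is hypothetical, and the real merge_kwargs in Requests may differ in details (for example, in how it treats None values).

```python
def merge_replacing(session_pairs, request_pairs):
    """Merge two ordered lists of (key, value) pairs.

    Hypothetical helper: any key present in `request_pairs` replaces *all*
    of the session's values for that key instead of being appended to them.
    Keys that only the session has are kept, in their original order.
    """
    request_keys = {key for key, _ in request_pairs}
    kept = [(k, v) for k, v in session_pairs if k not in request_keys]
    return kept + list(request_pairs)


session_params = [("api_key", "s3cret"), ("tag", "x"), ("tag", "y")]
request_params = [("tag", "a"), ("tag", "b")]
print(merge_replacing(session_params, request_params))
# [('api_key', 's3cret'), ('tag', 'a'), ('tag', 'b')]
```

Replacing rather than appending means a per-request value can override a session default without the user having to delete the default first.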

I'll submit this new MultiDict as a single pull request that encompasses params, data, and files and includes a full set of tests for its behaviour. I will not include headers, since some IRC discussions indicated that changing headers to MultiDict was high-risk, low-reward.

Sorry for the delay in this comment. Things have been busy.

sigmavirus24 commented 11 years ago

> Sorry for the delay in this comment. Things have been busy.

Take your time. @kennethreitz is in transit so there's no rush. Also, I wouldn't blame you if you wanted to wait for those other pull requests to be merged before submitting a new PR. That seems like the most sane thing to do at the moment in my humble opinion.

dmckeone commented 11 years ago

@sigmavirus24 That is what I will likely do. I will begin work on making the required MultiDict changes and tests in the next few days and then just rebase everything once those pull requests go in.

sigmavirus24 commented 11 years ago

@dmckeone there's no need to close this. You can leave it open until your pull is issued & accepted. I don't think any of us mind.

Lukasa commented 11 years ago

I'm reopening this, just because it provides evidence that this issue is being worked on. Don't want people to get the impression that we gave up or aren't thinking about it. =)

dmckeone commented 11 years ago

@Lukasa @sigmavirus24 My mistake. I had closed the PR on purpose, but not this. Thanks for re-opening.

martingalloar commented 10 years ago

Hi! I got lost and don't know where this is being worked on. I need this feature to move a project from a custom HTTP library to python-requests. Is there a PR I can use to test it?

Lukasa commented 10 years ago

@dmckeone Sadly there's been no particular progress here, largely because we need to combine this with our current CaseInsensitiveDict to produce some kind of horrible monstrosity like a CaseInsensitiveMultiDict.

Some discussion occurred over on shazow/urllib3#236 about this. The summary was basically:

Steal Django's MultiDict
Add key case-insensitivity.
Implement the structure in urllib3, and just have Requests steal it.

If you fancy working on that, go nuts!

cdunklau commented 10 years ago

Werkzeug has a multidict too if you need more inspiration

(Email reply to Cory Benfield's comment above, posted on Dec 6, 2013, at 10:21 AM.)

sigmavirus24 commented 10 years ago

> Werkzeug has a multidict too if you need more inspiration

That's where we stole our initial implementation from if I remember correctly.

kennethreitz commented 10 years ago

I think this is pretty low priority now, honestly. Things work pretty well now, and are stable. I'd rather not change things than change things :)

requiredfield commented 10 years ago

Hey requests folks, new to the project and found my way here pretty quickly after browsing the docs and noticing requests uses a dict-like structure for headers and so can apparently silently drop headers when there are multiple with the same key. Is that documented anywhere? I looked and was surprised to not see that documented in any of these places but maybe I missed it:

http://docs.python-requests.org/en/latest/user/quickstart/#custom-headers
http://docs.python-requests.org/en/latest/user/quickstart/#response-headers
http://docs.python-requests.org/en/latest/api/?highlight=headers#requests.request
http://docs.python-requests.org/en/latest/api/#requests.Response.headers

I say surprised given Werkzeug's example as well as the prevalence of repeated headers in the wild (e.g. curl -I https://www.google.com/ or https://github.com), but maybe I'm just misunderstanding the impact? I just tried requests.get('https://www.google.com/').headers.get('set-cookie') and it looks like requests combines the two Set-Cookie headers into a single one: not a perfectly faithful representation of the response, but I guess better than dropping the second header completely.

Anyway, just thought I'd ask if this would be a good time to take a look at fixing this again, or at least documenting it? Would it help to split this out into one ticket for headers and another for query params in case fixing one is easier than both at the same time? Thanks for any info/help and for the great work on requests in general.

Lukasa commented 10 years ago

Hi there!

We actually don't have a bug here. =) Repeated headers are governed by very specific rules from RFC 7230 Section 3.2.2, which I'll reproduce here:

> A recipient MAY combine multiple header fields with the same field name into one "field-name: field-value" pair, without changing the semantics of the message, by appending each subsequent field value to the combined field value in order, separated by a comma.

If you make a request to https://github.com/ with curl or in a browser you'll see that it repeats the Vary header:

Vary:X-PJAX
Vary:Accept-Encoding

You'll also see that we fold those together:

>>> r = requests.get('https://github.com/')
>>> r.headers['Vary']
'X-PJAX, Accept-Encoding'

This is a perfectly valid representation of this HTTP response.
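A sketch of the folding rule quoted above, assuming headers arrive as an ordered list of (name, value) pairs. It is a simplification: RFC 7230 only allows a field to appear more than once (and hence be combined) when its value is defined as a comma-separated list, and it notes that Set-Cookie, which often repeats anyway, cannot be combined into one value. So this hypothetical `fold_fields` keeps Set-Cookie values as a list. It is not Requests' or urllib3's actual code.

```python
def fold_fields(fields):
    """Fold repeated header fields into one comma-separated value per name.

    `fields` is a list of (name, value) pairs in the order received. Names are
    compared case-insensitively; the first spelling seen is kept. Set-Cookie
    values are returned as a list instead, because they cannot be combined.
    """
    grouped = {}  # lower-cased name -> (first spelling, [values...])
    for name, value in fields:
        grouped.setdefault(name.lower(), (name, []))[1].append(value)

    folded = {}
    for lowered, (display, values) in grouped.items():
        if lowered == "set-cookie":
            folded[display] = values
        else:
            folded[display] = ", ".join(values)
    return folded


print(fold_fields([("Vary", "X-PJAX"), ("Vary", "Accept-Encoding")]))
# {'Vary': 'X-PJAX, Accept-Encoding'}
```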

Set-Cookie is itself a special case because the cookie syntax is a little ambiguous as regards that comma. This is the closest we have here to a bug: we currently join those headers with commas, but really we should join them with semi-colons. If you use our cookie handling logic (e.g. response cookies, session.cookies) you won't encounter any problems because they parse the header correctly anyway, but if you were parsing cookies yourself you might encounter a problem.
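A small illustration of the comma ambiguity, using made-up cookie values: a cookie's Expires attribute contains a comma of its own, so once two Set-Cookie fields are joined with a comma, splitting on commas no longer recovers them. This does not exercise Requests' cookie parsing. Since semicolons already separate attributes inside a single cookie, keeping each Set-Cookie value separate is arguably safer than joining with either delimiter; RFC 6265 also advises origin servers not to fold multiple Set-Cookie fields into one.

```python
# Two Set-Cookie field values as a server might send them (values are made up).
cookies = [
    "session=abc; Expires=Wed, 21 Oct 2015 07:28:00 GMT; Path=/",
    "theme=dark; Path=/",
]

# Folding them with a comma, as is done for other repeated headers...
folded = ", ".join(cookies)

# ...means a naive split on "," no longer recovers the original two fields,
# because the Expires date contains a comma of its own.
naive = folded.split(",")
print(len(naive))  # 3, not 2
print(naive[0])    # session=abc; Expires=Wed
```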

Otherwise, we don't need a new header dictionary. =)

requiredfield commented 10 years ago

Thanks for the thorough reply, @Lukasa. Great that requests' headers representation is not as wrong as I was afraid of. I'd think this would be worth documenting though, given that so many other libraries use a data structure which preserves multiple headers more closely to how they were sent, leading other requests users to maybe ask you the same thing I did.

Also worth logging a separate bug for joining multiple cookie headers with semicolons?

Lukasa commented 10 years ago

I'm open to a separate bug for the cookies with semicolons. Documenting the joined headers is strictly optional, but I can see value in having it as a note in one of the sections.

BigBlueHat commented 8 years ago

Sorry for the necro-posting...:skull: :computer: ...but...

There are situations, like testing HTTP servers or APIs, where sending each repeated header on its own line is necessary (to be sure the server handles them correctly). Additionally, having the option to present repeated headers one per line makes things much more legible and therefore debuggable. :bug: :mans_shoe:

Any chance this could be revisited?

sigmavirus24 commented 8 years ago

@BigBlueHat is there a reason a list of tuples is not sufficient for your use-cases? Also what do you mean "individual headers per line"?

BigBlueHat commented 8 years ago

I want to be able to send a request like the first example here: https://tools.ietf.org/html/rfc7240#page-5

POST /foo HTTP/1.1
Host: example.org
Prefer: respond-async, wait=100
Prefer: handling=lenient
Date: Tue, 20 Dec 2011 12:34:56 GMT

You can put all those on a single line, but sometimes, those Prefer lines can get quite long as in this example from the Web Annotation Protocol:

GET /annotations/ HTTP/1.1
Host: example.org
Accept: application/ld+json; profile="http://www.w3.org/ns/anno.jsonld"
Prefer: return=representation;include="http://www.w3.org/ns/ldp#PreferMinimalContainer"
Prefer: return=representation;include="http://www.w3.org/ns/oa#PreferContainedIRIs"

Concatenated that would look like:

GET /annotations/ HTTP/1.1
Host: example.org
Accept: application/ld+json; profile="http://www.w3.org/ns/anno.jsonld"
Prefer: return=representation;include="http://www.w3.org/ns/ldp#PreferMinimalContainer", return=representation;include="http://www.w3.org/ns/oa#PreferContainedIRIs"

...which is valid...but a pain to understand.

Additionally, RFC 7230 deprecates line folding, so wrapping that long line to make it prettier isn't an option either.

In my specific case, I'm building a compliant Web Annotation Protocol server in Python, and I need to verify that Prefer headers sent on multiple lines are all received and don't overwrite each other, as the server needs to support that (see the examples above).
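For a server test like the one described, one option is to build the request bytes by hand, so the repeated Prefer fields are guaranteed to go out as separate lines. This is a minimal sketch: `build_request_head` is a hypothetical helper, and the resulting bytes could be sent with something like `socket.create_connection((host, port)).sendall(head)`. The standard library's `http.client.HTTPConnection` also writes one header line per `putheader()` call, so calling it twice with the same name is another route.

```python
def build_request_head(method, path, fields):
    """Serialize an HTTP/1.1 request line and header block.

    Hypothetical helper: `fields` is an ordered list of (name, value) pairs,
    so a repeated name such as Prefer is written as separate lines rather
    than folded into one comma-separated value.
    """
    lines = [f"{method} {path} HTTP/1.1"]
    lines.extend(f"{name}: {value}" for name, value in fields)
    # Each line ends with CRLF, and an empty line terminates the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("latin-1")


head = build_request_head("GET", "/annotations/", [
    ("Host", "example.org"),
    ("Accept", 'application/ld+json; profile="http://www.w3.org/ns/anno.jsonld"'),
    ("Prefer", 'return=representation;include="http://www.w3.org/ns/ldp#PreferMinimalContainer"'),
    ("Prefer", 'return=representation;include="http://www.w3.org/ns/oa#PreferContainedIRIs"'),
])
print(head.decode("latin-1"))
```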

It seems Requests and most other Python client and server code build on rfc822.Message, which does support multiple headers internally but only lets you get at the first one, the last one, or all of them unparsed.
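In Python 3 the `rfc822` module no longer exists; its role is filled by `email.message.Message`, which `http.client.HTTPMessage` subclasses. A quick check with the standard library (not Requests) shows that `get_all()` returns every value of a repeated field in order, while plain item access returns only one of them.

```python
import email

raw = (
    b"Host: example.org\r\n"
    b"Prefer: respond-async, wait=100\r\n"
    b"Prefer: handling=lenient\r\n"
    b"\r\n"
)

# Parse the header block into an email.message.Message.
msg = email.message_from_bytes(raw)

# get_all() keeps every occurrence, in wire order (lookup is case-insensitive).
print(msg.get_all("prefer"))  # ['respond-async, wait=100', 'handling=lenient']

# Item access yields a single value; the docs leave unspecified which one.
print(msg["Prefer"])
```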

In short, there are ways to do it, but it doesn't bode well for HTTP accuracy in client and server code written in Python. My hope is that Requests (and by extension, things like httpbin.org) will support repeated headers on individual lines to make the future just that much more inviting. :sunrise:

Lukasa commented 8 years ago

Generally speaking, Requests aims to have a good API partly by restricting what is possible. In this case you can try to achieve it by hacking around the internals, but at a certain point you're asking for low-level enough access that Requests may not be the best choice, and you may need to consider something like urllib3.

BigBlueHat commented 8 years ago

Thanks @Lukasa. I do like what I'm seeing with regard to urllib3's HTTPHeaderDict class. I guess I'll do some digging to see whether it would handle sending repeats one per line.

Cheers. :tophat:

psf / requests

Add MultiDict Support #1155