Closed dmckeone closed 9 years ago
+1
Should the values be ordered or not? I have no preference, I'm just wondering if others do.
For what? Headers? Query string?
Typically I would say yes either way. Preserving order does not occlude the un-ordered use case whereas the opposite does.
@xealot yeah, that was my opinion. I have something working currently, but it so minimal that I'm not sure it is covers all the corner cases. .... I'll be pushing it to a branch in my repo tonight.
+1 for ordered support. The impetus for this feature was discovered when using werkzeug.datastructures.OrderedMultiDict
with GET/POST parameters, so if ordered support wasn't there it wouldn't hit the use case.
I'm not very familiar with what other MultiDicts are in the Python community, but I think the only thing that is really needed (at least for werkzeug MultiDict) is to pass items(multi=True)
or iteritems(multi=True)
when iterating instead of just items()
or iteritems()
. Because Requests used to work with lists of 2-tuples, it seems to support this behaviour almost everywhere, it just needs to iterate slightly differently when breaking up the dictionary into key-value pairs. If all MultiDicts behaved this way (with the same parameter) then perhaps a solution like:
# requests.utils
def to_key_val_list(value):
...
if isinstance(value, dict):
try:
value = value.items(multi=True)
except TypeError:
value = value.items()
could work to take any dict that behaves like a MultiDict without having to require a separate Requests-only dictionary.
The problem is that this solution would be specific to Werkzeug, so perhaps sigmavirus24's in-progress solution is better; force it into a known type that is specific to Requests, and then know that it's always that type.
I'm not familiar enough with what is out there in the Python community, or what is in Requests, to make that judgement call. However, I do know that the 'behave-as-if' solution mentioned above would be a little more user-friendly because the user would not be required to create a new object instance prior to calling Requests.
https://github.com/sigmavirus24/requests/tree/multidict has the first version of the MultiDict. (I forgot to update this issue a couple days ago). I haven't switched any of the internal logic over to using it, but it observes order. @dmckeone perhaps you could explain the importance of multi=True
. Maybe it's because I haven't had my coffee yet, but this doesn't look at all obvious as to the reasoning to me. Then again, I haven't used Werkzeug's MultiDict and should probably go read those docs.
As a thought, would just using the werkzeug toolkit be too much of a requirement to add to requests? All of their data structures are incredibly useful and it would be sad to have to rewrite them.
Or perhaps with permission some of it could be included.
permission has already been granted, and we've been stealing a number of werkzeug data structures for a while :)
I didn't know we could borrow from them. :) Although this takes all the fun out of it for me :-P
Werkzeug's necessary structures are in requests/structures.py
on the branch I linked above. They're not used anywhere yet, but I also want to clear up how attribution should be done. I have a note in the doc-strings for both classes, but I'm not entirely certain that is enough.
it's enough, I've discussed it with armin on multiple occassions
@sigmavirus24: The multi=True
argument allows not just ordered items in the dictionary, but also multiple keys in the dictionary. If the multi=True
argument is not passed, then all duplicate keys are removed when iterating over the the keys/items/values. It's required for me because I'm working with a legacy system that requires ordered, duplicate GET parmeters. Something like: http://www.example.com/item?a=1&b=steve&a=2&b=mary&a=3&b=bob
Thanks for the concrete example @dmckeone
as I mentioned in the ticket jkbr linked, I think this needs to be case insensitive too for header use ( or a case insensitive version).. HTTP allows for multiple headers with the same name.
Hm, that sounds appropriate. I have some other stuff to do this weekend, but I'll try and knock some if out tonight.
Is this still an issue? I'll work on it if so, but I'd like to make sure this isn't being fixed/worked on elsewhere.
@ihodes I started working on it here but I've been too busy to do much. It needs to be updated with upstream master and more work needs to be done with it. The tricky part is that deciding how best to utilize the multidict in the parameters to the different methods, specifically to things like params
, headers
, etc. The former needs to preserve order which is why we use the OrderedDict, the later isn't so much trouble but may require a different method than what we already have for accepting and merging lists of (key, value)
tuples. Also, don't take the above as discouragement from doing this. You'll probably be able to come up with a much better solution than me and if you want code review on what you do, I'll be happy to help! :)
Reading over this thread, it seems as though we should be using werkzeug's OrderedMultiDict in lieu of MultiDict (though apparently there is something like an order of magnitude slowdown from using this, that probably—and this is testable—wouldn't really affect the user experience much) for both headers and params?
How did I miss werkzeug's OrderedMultiDict
? You sound right, but I'm wary of that slowdown.
@ihodes OrderedMultiDict
is slower, but in practice I'm not sure that it will matter. I may be confused about where this is needed throughout all of Requests, but for input the user will make the performance decision. If I (the user) wish to take the performance hit because I need ordered support, then I will pass requests an OrderedMultiDict
. If I don't, then I'll use regular MultiDict
. All Requests should have to do is iterate through what the user gives it, and put it into the format it needs internally (which I think is an ordered list of 2-tuples for most things). Since both OrderedMultiDict
and MultiDict
can be iterated through with something like [(key, value) for key, value in headers.iteritems(multi=True)]
(where headers
is an instance of OrderedMultiDict
or MultiDict
), Requests should be able to be indifferent about the performance implications.
@dmckeone that'd be (sort of) ideally how it would work were we to go with this additional datastructures. One issue is that there isn't a consistent API to these structures—we'd need to explicitly dispatch on the type (because we have to pass multi=True
if the dict is a MultiDict). That's unideal. Also, for the user, constructing these dictionaries is a PITA (in my opinion) and requires importing another datastructure in order to use.
What if we allowed the user to input a list of (key, val) tuples instead (which works right now, but order/multiples are lost)? That way it's lightweight, and much easier to understand & use than importing and building up an OrderedMultiDict.
E.g.
requests.get("http://example.com", params={"yep": "old way works"})
requests.get("http://example.com",
params=[("yep", "old way works"), ("yep_addn", "123"),("yep", "new way is awesome")])
Note that the second GET request is NOT equivalent to the currently possible request:
requests.get("http://example.com",
params=[("yep", ["old way works", "new way is awesome"]), ("yep_addn", "123")])
…because the "yep_addn" key/val won't be interleaved between the two "yep" params.
I can see in #1103 that this was requested before, but I don't know if it was agreed to be a good/bad idea? Right now, it makes the most sense to me. This almost works now, it just doesn't allow for ordered AND repeated params (c.f. @dmckeone's example in #1103).
So the reason we want the MultiDict
s underneath everything is so that the users don't have to concern themselves with it. They can always pass it in and conversion should would just fine. We currently accept both dicts and ordered pairs as @ihodes demonstrated and both are (as far as we're concerned under the hood) converted to OrderedDict
s. The issue is that this doesn't allow for multiple items (again as @ihodes demonstrated).
What I wonder is if we could subclass our existing OrderedDict
s, and force them to allow multiple entries. Then with that in mind, we could probably also subclass that for the CaseInsensitiveDict
so to return as the headers attribute on PreparedRequest
s and Response
s. Headers can occur several times so this is really the most crucial case where we need the implementation. It would certainly take more thought and work than simply copying flask/werkzeug's implementations, but it would also likely involve less copying (which is bothersome to me even with permission).
To be clear:
To be clear: • We cannot drop the condition that they are ordered (even if sent in as a dict) because that would be a regression • We cannot drop the ability to accept anything like above
Absolutely agree. I'm not sure what you mean when you say that the params remain ordered if sent in a dict, as they don't currently/can't (I'm sure I'm just not understanding you correctly. e.g. requests.get("http://localhost:8000", params={2: 1, "a": 2, 1: 3}) # => "GET /?a=2&1=3&2=1/ HTTP/1.1"
).
I guess I'm just not sure the advantage we get over supporting and/or offering thee alternative datastructures if we properly support ordered & repeated params in an iterable of (key, val) tuples. It seems to be the cleanest and simplest API. If a user wishes to use some subclass of dict, as long as it has a method to an iterable of tuples, it would work. (e.g. requests.get("http://localhost:8000", params=someinstanceofadict.iteritems(multi=True, whatever=True))
.
My proposal is then to just properly support ordered tuples (with repeated keys). Here are a couple examples and expected outcomes with GET requests and query strings, but assume an isomorphic outcomes with headers (WLOG).
requests.get("http://localhost:8000", params={9: [1,2,3], "param_a": "hmm"})
# => ?9=1&9=2&9=3¶m_a=hmmm
requests.get("http://localhost:8000", params=[(9, [1,2,3], ("param_a", "hmm"))])
# => ?9=1&9=2&9=3¶m_a=hmmm
requests.get("http://localhost:8000", params=[(9, [1,2], ("param_a", "hmm")), (9, 3)])
# => ?9=1&9=2¶m_a=hmmm&9=3
And, if someone wanted to use their own class for params or headers, it would be done like so:
requests.get("http://localhost:8000", params=custom_class.to_list_of_tuple_aka_iteritems(flag=True))
# => ?9=1&9=2¶m_a=hmmm&9=3 (or whatever the list of tuples comes out as)
Does this solve all the issues (ordered, repeated and possibly interleaved params, [understandability, succinctness]) we need it to?
I'm not sure what you mean when you say that the params remain ordered if sent in a dict
Yes I mis-spoke. I mean as soon as we receive the dict, we convert it and the order is preserved. Presumably if someone's sending a dict they're either aware it won't hold order and that doesn't matter to them, or they don't know and will find out.
If a user wishes to use some subclass of dict, as long as it has a method to an iterable of tuples, it would work.
Really, when using a MultiDict
or any variant of it, I don't quite get why one has to call the items
or iteritems
method with multi=True
. That should be the default. But I digress. The API should be they can use that so long as .items()
(which is what we rely on) will return the data as they expect it to and in the order they expect it.
My proposal is then to just properly support ordered tuples (with repeated keys)
This will not work in all cases. People expect various items to be dicts, most commonly the headers
attributes on both PreparedRequest
and Response
objects. Changing that would be a severe (breaking) change to the API. I may be misunderstanding your proposal though (which I think I am), in which case, I think we already technically support the example usages you provide except for the fact that they don't actually work that way because of how the OrderedDict
's __init__
works.
Does this solve all the issues (ordered, repeated and possibly interleaved params, [understandability, succinctness]) we need it to?
Storing them behind the scenes as a list of tuples does not. The MultiDict will though. I think just trying to subclass OrderedDict
may be enough. Given how it already works we should only really need to concern ourselves with __init__
, __setitem__
and __getitem__
. I'm going to sketch some of this out on the train this afternoon to see how feasible it should be.
Ah, yeah. Behind the scenes we should use a OrderedMultiDict (but not Wekzeug's, because due to the multi=True issue its API isn't compatible with Requests'). As for how the user can use this feature, either passing params=/headers= an OrderedMultiDict (or dict, or MultiDict, or anything implementing items :: obj -> [(key, val), .. ]
[don't hate]) or array of tuples should work, yes? (Currently it doesn't work as expected; repeated keys are squashed, and only the last tuple with a repeated key in the list is used).
I'll just throw out a couple crazy ideas here...
In Flask Armin handled this scenario by making an assumption that seemed to be something like this (I don't know if this is what he thought, but the design seems to indicate this):
Ordered parameters are less likely, and less performant, than order independent parameters. Therefore, default all parameters to the more performant solution, and allow the user to override the behaviour if they really need it.
This meant that in order to get ordered parameters you had to do:
app_instance.request_class.parameter_storage_class = ImmutableOrderedMultiDict # Default is ImmutableMultiDict
Now requests has a different API, so overriding the storage class is not always practical or desirable for something simple like requests.get("http://localhost:8000", params={'a': 'b', 'c': 'd'})
, but if the response had a flexible storage class then then it could be dictated on the request by additional kwargs: requests.get("http://localhost:8000", params=[('a', 'b'), ('c', 'd')], ordered=True)
Something like that could allow using MultiDict internally, and making ordered a user-specified option. Alternatively, you could just infer ordered if an actual OrderedMultiDict was passed as the parameters, or something else that somehow infers order.
This would then give you request/response pairs where, if you asked for order, you can assume OrderedMultiDict in the parameters/data. And if you didn't ask for order, well you'll get back a regular MultiDict.
As far as this note goes. If multi=True not being the default is the only hang-up, then why not just create an adapter that make multi=True be the default?
class OrderedMultiDictAdapter(OrderedMultiDict):
....
def items(multi=True):
super(OrderedMultiDictAdapter, self).items(multi=multi)
Seems a lot simpler than re-writing, no?
Currently it doesn't work as expected; repeated keys are squashed, and only the last tuple with a repeated key in the list is used
Right, that's why I said it was "technically" supported, i.e., we don't complain it just doesn't work.
Now requests has a different API, so overriding the storage class is not always practical or desirable for something simple like
requests.get("http://localhost:8000", params={'a': 'b', 'c': 'd'})
, but if the response had a flexible storage class then then it could be dictated on the request by additional kwargs:requests.get("http://localhost:8000", params=[('a', 'b'), ('c', 'd')], ordered=True)
Exactly right. Requests has a much different API, specifically since we promise (since 0.13.x or 0.14.x, and carried over to 1.x) that we'll retain the order of the items received through params
, data
, files
, if we're given them in an manner that already preserves order. Adding that extra keyword is a breaking change of the API and not acceptable as far as I'm concerned.
If multi=True not being the default is the only hang-up
It isn't preventing any of this. Other concerns are. This is just a pet peeve of mine because if your user is passing or expecting a MultiDict
, making them do extra just to get all of the values out just seems like very poor API design to me. If it had been a serious issue, I wouldn't have started the work towards this by just copying Armin's approach to the MultiDict
.
@sigmavirus24 Agree about the MultiDict.items(); it is odd, and I don't know why it's like that. Not changing the API makes sense too.
So, currently Requests does this (Please feel free to correct me if some of this is wrong):
OrderedDict
for everything, instantiated by from_key_val_list()
merge_kwargs
in sessions.py
seems to rely on unique, case-independent keys (It wouldn't concat multi-keys)@sigmavirus24 Could you do a quick explanation why OrderedDict
is used in place of dict? Everything I've seen seems to not care all that much about order in the parameters. Is this a relic of when requests used to accept a list of 2-tuples (as the docstring in from_key_val_list()
seems to indicate)?
To me, the ideal would be to use MultiDict
internally in from_key_val_list()
, with some kind of session/response based switch that could use OrderedMultiDict
in it's stead. This would mean:
from_key_val_list()
to interpret any non-MultiDict
dictionaries as MultiDict
(e.g. change OrderedDict
to Dict
MultiDict
(just enabling multi-key) merge_kwargs
in sessions.py would need some kind of concatenation behaviour if order mattered, and MutliDict
would need to be case insensitive.Could you do a quick explanation why
OrderedDict
is used in place of dict?
The quickest explanation is Amazon.
The next quickest explanation is that when doing multipart/form-data
requests in some cases (like when interacting with Amazon) the server absolutely requires order be preserved. You could do that (ostensibly) by passing an OrderedDict
but the old implementation would convert it to a dict and magically all hell would break loose. I see no way in which removing that built-in niceness is acceptable to anyone whether on the session level or on the request level.
To me, the ideal would be to use
MultiDict
internally infrom_key_val_list()
That's exactly what I had in mind with the one reservation about ordering.
Returned and prepared headers
should both use straight MultiDict
s. Those don't care about order in either case. This is why I'm starting to think we should ship that first, then work towards a resolution on how to convert params and other items to a MultiDict
(or variant thereof).
I still don't think (with the current OrderedDict
implementations) that subclassing and forcing it to accept multiple values would be too terrible, but we'd have to see what the performance hit is comparing that to Armin's OrderedMultiDict
.
I don't quite remember what merge_kwargs
does right now but I'll take a closer look in an hour or seven.
Just realized something. I would imagine that OrderedMultiDict
and OrderedDict
pay the same performance penalty. Perhaps it's a non-issue and the first implementation should just go straight to OrderedMultiDict
?
On a side note, I just browsed through merge_kwargs
on GitHub and would like to ask anyone who's reading this thread to please rewrite the case-insensitive logic in there. It will cause a huge performance hit for a large number of original_keys
.
Replying to @dmckeone
Yeah I wouldn't mind going directly to OrderedMultiDict
but like I said, it isn't necessary everywhere. ;)
Yeah, barring any extreme performance hits (unlikely), it seems like OrderedMultiDict should be the default structure. Users of Requests should be able to pass in lists of tuples, or any sort of dictionary that can be iteritems
'd over.
This sound right?
iteritems
Should be items
because iteritems
was deprecated in Python 3, but yes!
Exactly. If either of you do this, I will definitely owe you one.
Er, yeah, items
(old habits…)
Okay! I'll take a stab today or tomorrow. And if anything we owe you & @kennethreitz!
Don't forget @Lukasa. He's done and is doing far more than me.
(+ @Lukasa!)
Had some time to get something started. Other than adding all the werkzeug structures (which will need some Python 3 work), it took relatively few changes to get all the tests to pass. Even added an additional test that is similar to @ihodes example above.
There is probably quite a bit more work to do, but since I'm not really familiar with all the ins and outs of Requests, I thought I'd just put this up as food for thought for another implementation, or for refinements on this one.
Hah we just did the exact same thing, sans the test (though I just fiddled with the OrderedMultiDict, setting the multi param's default to True
in iteritems
and items
…). Unfortunately, I messed up something with the copy-paste or something and have been in traceback hell.
I read your changes, and they look like we all agreed on above; why not submit a PR? @sigmavirus24, could you CR this? Awesomesauce.
If no-one gets around to it before this evening, I might take a look at what Python 3 work needs to be done.
That was a silly promise, I'm at a hackathon this evening. This weekend then.
I looked over @dmckeone changes. I might fork his branch and do some more work on it tonight because I am far better at reviewing code in vim than on GitHub. I tend to miss stuff on the site. But so far it looks fantastic. Thank you @dmckeone :cake: :cake: :beer:
Since everyone seems to like it, I don't mind keeping at it. @sigmavirus24 if you find anything then let me know, and I'll be happy to add it in. I'll go fix that test first. Would you prefer I do a Pull Request after that, or continue changes on this branch as you do CR?
Are there any recommendations for doing performance profiling for Requests? Is there a benchmark that is commonly used?
As far as Python 3 support goes, I tested it under Python 3.3 and everything seems to behave the same as 2.7. I was mostly talking about Werkzeug's lack of Python 3 support. For example, should we keep all the iter...() methods around in MultiDict and OrderedMultiDict? There is a BadRequestKeyError that is currently just assigned to KeyError. In Werkzeug that is used to indicate that a bad request was sent, but since Requests is a little different, perhaps we should change the name or throw a different exception?
I don't mind keeping at it.
I just didn't want to overwork you :)
You can start a pull request and then as you push to it just ping me to do more code review. It'll also get @Lukasa and @kennethreitz's attention.
Are there any recommendations for doing performance profiling for Requests?
No... but let's make one!
For example, should we keep all the iter...() methods around in MultiDict and OrderedMultiDict?
If I remember correctly, those handle the work for .items
on both. I'm 100% for renaming iteritems
to items
though. Also for setting the default for multi
to True
. (Like @ihodes did)
perhaps we should change the name or throw a different exception?
I'm okay with throwing a plain ol' KeyError
. That seems logical and is something we could reasonable expect.
So to review, the following request
parameters need to use an OrderedMultiDict
:
And it would be ideal if both request
s and response
s use a MultiDict
for headers, i.e., convert headers to a MultiDict
when taking them from the user and when returning a response use a MultiDict
to hold the headers present. I doubt any server is using a header more than once, but it's part of the spec that we can send more than one, so we may as well be safe and expect the more horribly written servers to think they can return one.
Thanks again @dmckeone :cake:
I'd add that headers should all be case-insensitive. Whether or not you force lower()
on all keys or maintain a separate case-insensitive key lookup dictionary (as is currently done) is probably worth thinking about (the spec says headers are case-insensitive, but if they're passed with certain capitalization, should we just leave them be?)
ALSO, unfortunately, header order DOES matter (for headers with the same name), per the spec. So, we need a CaseInsentitiveOrderedMultiDict (:cry:).
I'm willing to step in here and work on something, but let's figure out some sort of division of labor so we don't write the same code again :) But if you're happy to continue on this feature, I'm happy to let you & work on a different issue!
I'd add that headers should all be case-insensitive. Whether or not you force
lower()
on all keys or maintain a separate case-insensitive key lookup dictionary (as is currently done) is probably worth thinking about (the spec says headers are case-insensitive, but if they're passed with certain capitalization, should we just leave them be?)
Yep, I had completely forgotten that.
ALSO, unfortunately, header order DOES matter (for headers with the same name), per the spec. So, we need a CaseInsentitiveOrderedMultiDict (:cry:).
And here's why I need to go and read all the specs before ever speaking again.
:-P
I'm willing to step in here and work on something, but let's figure out some sort of division of labor so we don't write the same code again :) But if you're happy to continue on this feature, I'm happy to let you & work on a different issue!
You two can work this out. There are also plenty of issues I'm sure you could help with @ihodes. If you want guidance with any of them, let me know. I'll be happy to point you in the generally correct direction.
I'll continue on with what I have for params, data and files. I'll do the Pull Request and make sure that data and files are supported as well.
Is requests attempting to have any kind of parity with werkzeug's structures? While this was a good proof-of-concept to see if MultiDict would work, I don't think its a good idea to change the internal structure of them too much (flipping the default, for example). Ideally, one day these structures will be in a common place that both Requests and Werkzeug could use (alluded to by @kennethreitz's blog post). I know that from my perspective it would be great if I could pass a Werkzeug MultiDict
and have it work in Requests as-is. That way the meaning of MultiDict
isn't fragmented, and we can end up with a de-facto standard MultiDict
.
When it comes to the implementation of the headers, I'm less sure. I can see the use case for params and data, and I can take a look at files because I can envision it being the same, but I'm not sure I want to change it for headers. Does the current implementation not work in some way that MultiDict
would solve? Implementing this CaseInsensitiveOrderedMultiDict
just seems like we'd be diverting too far from simple, and too far from existing structures, and doing so for very little benefit. So if someone else who knows why this would be important wants to implement it, I'm happy to leave it to them.
Is requests attempting to have any kind of parity with werkzeug's structures?
I haven't seen any explicit mention of this. And yes I read Kenneth's blog post but @core is anything except active. (I'd be happy to work on it with them though. :-P) And really I think forcing a brutally sane API is more important especially since we will be exposing these structures to users through PreparedRequest
and Response
objects.
I know that from my perspective it would be great if I could pass a Werkzeug
MultiDict
and have it work in Requests as-is.
The problem with this is trying to catch every single case. We rely on the object being iterable with two-tuples or being able to be coerced into that format. Certainly MultiDict.items()
returns that, but it doesn't return to us what the user expects. Right now we have to test if the object we have is an instance of MultiDict
and while that seems ok now, that's a slippery slope. What do we next have to test to assure that fractured APIs are allowed?
just seems like we'd be diverting too far from simple,
I agree.
We don't currently use a CaseInsensitiveDict
on PreparedRequests
so using a plain OrderedMultiDict
there is fine. As for returning headers on a response object, perhaps I was too ambitious in saying that it was necessary. I'm okay leaving it as it is for now until a better consensus can be reached.
Certainly MultiDict.items() returns that, but it doesn't return to us what the user expects. Right now we have to test if the object we have is an instance of MultiDict and while that seems ok now, that's a slippery slope. What do we next have to test to assure that fractured APIs are allowed?
I will have to think on this a little. I think most of it can be done with duck typing, so long as the object behaves like the MultiDict we expect. The question becomes, what do we expect and what does the user expect? A werkzeug/flask user may expect to pass multi=True, and a non-werkzeug user may expect the opposite. So either way, someone will lose there.
We can look at that later though. I'll get going on getting something that works for params/data/files first.
A werkzeug/flask user may expect to pass multi=True, and a non-werkzeug user may expect the opposite.
I might be approaching this naively but I think a good 90-95% of our users are expecting that all of what they pass in as data, files or parameters be consumed, so not using multi=True
for a default means we have to do type checking to assert that we consume everything.
@sigmavirus24 I added the pull request and came up with a blend that I think gets both of us what we want. I set all the multi=False
kwargs to be multi=True
. However, I don't think it will be too big of an issue for users, because, unless I missed something, the MultiDict
s are mostly internal to how Requests processes sessions. The only way you can "see" them is by interrogating body in the prepared request afterwards. params=
is the exception, because you can set it on the Session
.
Additionally, in all places that take in a MultiDict
, I removed all explicit checks for isinstance(d, MultiDict)
, and replaced it with a try: except TypeError:
or a check for the needed attribute, iterlists
.
I also added a couple extra tests to prove the ordering in data=
and files=
Comment by Kenneth Reitz: