psf / requests

A simple, yet elegant, HTTP library.
https://requests.readthedocs.io/en/latest/
Apache License 2.0

params no longer work for non-http URLs #1879

Closed ibuildthecloud closed 10 years ago

ibuildthecloud commented 10 years ago

Commit b149be5d specifically removed all processing of non-HTTP(S) URLs. Prior to this commit, if you passed in params, they would be appended to the URL; since this commit that no longer works. Specifically, I ran into this issue when running docker-py against a unix socket.

I understand that for non-standard URLs you don't want to parse them. It seems that if params were passed, then you should still append them. I'll gladly put in a pull request for this if it seems like a valid issue. A simple fix would be the following:

        # Don't do any URL preparation for oddball schemes
        if ':' in url and not url.lower().startswith('http'):
            encoded_params = self._encode_params(params)
            if encoded_params:
                # Append with '?' or '&' depending on whether the URL
                # already carries a query string
                if '?' not in url:
                    url = '%s?%s' % (url, encoded_params)
                else:
                    url = '%s&%s' % (url, encoded_params)
            self.url = url
            return
Lukasa commented 10 years ago

Hmmm.

Hmmm.

I'm very much on the fence here. On the one hand, I can see your argument, and it makes plenty of sense. There's definitely utility in being able to pass 'params' to requests and have it do the right thing.

Unfortunately, it violates the distinction we drew when we accepted that change. The nature of that case was to say "if this URL isn't for HTTP, we have no idea what it means and we shouldn't touch it". That's because the URL syntax is not very heavily specified. In particular, there's no well-defined syntax for how the query string should look. It is generally written as a set of kvps, like in HTTP, but RFC 3986 does not go into detail about how these should look, instead leaving them to be scheme specific.

With that in mind, I think I'm -0.5 on this. We just can't be sure that we're doing the right thing with parameters.

sigmavirus24 commented 10 years ago

@ibuildthecloud can you provide an example of the URL you were requesting with docker-py?

ibuildthecloud commented 10 years ago

Totally understand the perspective, but since the url and params are both specified by the user, it seems like acceptable behaviour. Basically requests is just saying, "I have no clue what this funky URL is, but you told me to add params, so I'll do the documented behaviour of adding ?k=v." If your funky non-standard URL is not compatible with params, then the user shouldn't pass them in.

The problem I specifically have is that between versions 2.0.1 and 2.1.0 this behaviour changed. So either requests should respect the old behaviour, or consumers of requests need to change. Specifically, I need to get docker-py to change. But that change doesn't seem too pretty for them, IMO. docker-py can run against HTTP or a unix socket, so the code is relatively agnostic to which it is using. But asking them to change this means that everywhere they use params they must do "if http: params={} else: urlencode(...)".
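The awkward caller-side branch described above can be sketched with nothing but the stdlib; the build_url helper name is hypothetical, not docker-py code:

```python
from urllib.parse import urlencode

# Hypothetical helper illustrating the branch a caller would need once
# requests stops encoding params for non-HTTP URLs: let requests handle
# params for http(s), pre-encode them by hand for everything else.
def build_url(url, params):
    if url.lower().startswith('http'):
        return url, params              # requests will encode these itself
    query = urlencode(params)
    sep = '&' if '?' in url else '?'
    return url + sep + query, None      # params already baked into the URL

url, params = build_url('unix://var/run/docker.sock/v1.6/containers/json',
                        {'all': 1})
assert url == 'unix://var/run/docker.sock/v1.6/containers/json?all=1'
assert params is None
```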


ibuildthecloud commented 10 years ago

The URL looks like 'unix://var/run/docker.sock/v1.6/containers/json' and the code is passing in params so that the url becomes 'unix://var/run/docker.sock/v1.6/containers/json?all=1&limit=-1&trunc_cmd=0'

Lukasa commented 10 years ago

I absolutely see that problem, but I feel very uncomfortable with the behaviour of assuming a certain logic for params.

What are docker-py doing to route over the unix domain socket? At the very least they'll have to be mounting a Transport adapter.

sigmavirus24 commented 10 years ago

What are docker-py doing to route over the unix domain socket? At the very least they'll have to be mounting a Transport adapter.

I was thinking the same thing, but the Transport adapter doesn't handle params. That can only be handled by a prepared request object. I'm not sure they could pass on the params they need to tack on with the adapter in this case. Unless you had another idea how to go about that @Lukasa, I think that may be a dead end. I agree, though, that the parameter handling should only be for HTTP(S) URLs since those are the only ones we are really positioned or likely to support.

docker-py could use URI templates to bypass having to use the params parameter to requests, but that would likely introduce a new dependency which they might not want.

Lukasa commented 10 years ago

What I was asking, actually, was whether they had a TransportAdapter subclass. If they do, it's not the end of the world to subclass/monkeypatch the PreparedRequest object.

ibuildthecloud commented 10 years ago

This is getting a bit past my knowledge of the internals of requests, but they seem to have extended requests.adapters.HTTPAdapter and created a UNIX socket one. When you create a docker Client object, that extends requests.Session and calls mount() to register the UNIXAdapter. You can look at the specific details at https://github.com/dotcloud/docker-py/blob/master/docker/unixconn/unixconn.py
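For readers unfamiliar with that pattern, here is a minimal sketch of mounting a custom adapter on a Session; the UnixAdapter body below is a placeholder showing only the mount() routing, not docker-py's actual socket handling:

```python
import requests
from requests.adapters import HTTPAdapter

class UnixAdapter(HTTPAdapter):
    """Placeholder; docker-py's real adapter opens a unix domain socket."""

session = requests.Session()
session.mount('unix://', UnixAdapter())

# URLs starting with 'unix://' are now routed to UnixAdapter rather
# than the default HTTPAdapter mounted for 'http://' and 'https://'.
adapter = session.get_adapter('unix://var/run/docker.sock/v1.6/containers/json')
assert isinstance(adapter, UnixAdapter)
```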

So maybe this is a good thing; I wasn't aware that docker-py was doing specific logic to make unix sockets work. Perhaps there's a different way docker-py can go about this. I'll have to look further into docker-py or pull in one of the maintainers. Let me dig a bit into the internals and see what they are doing, because obviously they are ignoring the appended query string when opening the actual socket, but I think the current issue is that the "GET /url" line also doesn't have the query string. So I'm assuming there needs to be some way to get the params passed to requests into the GET line.

Lukasa commented 10 years ago

So, the issue is that as @sigmavirus24 mentioned, Requests handles parsing the URL about two layers higher than the Transport Adapter (in the PreparedRequest object). Certainly docker-py can work around this by playing about with that object (either monkeypatching it or using the explicit PreparedRequest flow), but the real meat of this issue is whether they should have to.

I remain as I was before, at -0.5. It feels like we'd be doing the wrong thing. I am, however, open to being convinced on this issue.

sigmavirus24 commented 10 years ago

I took a look to see how simple that would be for them to implement, but their code seems quite full of indirection. That said, the way they're currently doing everything, I find it hard to believe they'll be up for using the explicit flow @Lukasa linked to.

Lukasa commented 10 years ago

So, million-dollar question: is it our job to un-break docker-py, or should we apologise for not spotting that this could happen but assert that we shouldn't change our current behaviour?

My worry is that the unix: URI scheme doesn't actually seem to specify anything about how to encode the URI: at least, I can't find a doc. This isn't unexpected, since there's no requirement that unix domain sockets use HTTP as the application-layer protocol; they can do whatever the hell they want. This is a rare situation where the application-layer protocol isn't specified by the scheme name. For instance, LDAP appears to define an ldapi: scheme that is for use over unix domain sockets, and has defined semantics. Sadly, HTTP does not.

With that in mind, we can do a number of things.

  1. Assume that whenever the user points Requests to a unix domain socket they want to run HTTP over it, and behave accordingly. That's not totally unreasonable, but it runs the risk of breaking third-party libraries for other application layer protocols that may also want to use bare unix sockets through Requests. This is the reason we stopped parsing the URLs in the first place (see #1717).
  2. Allow users the possibility of shooting themselves in the foot by saying that we will always prepare params as though they're HTTP-like, even when we don't know what the hell you're doing (i.e. a non-HTTP(S) URI).
  3. Don't change our current behaviour: apologise, but say that guessing is bad.

On balance, my order of preference is 3 -> 1 -> 2. I think 2 is dangerous, violates the Zen of Python and undoes the genuinely good change that was in #1717. I think that 1 is OK: it's not unreasonable to do that, but I'm convinced we'll break another third-party library. That might not be the end of the world, but it's not ideal either.

3 continues to be the argument that convinces me most. We made a decision that we should do a bit less guessing about what the user wants from us in situations we don't understand. I think that was a good decision, and I think we should stick with it. I can certainly be convinced towards 1, but right now I'm inclined towards doing 3.

sigmavirus24 commented 10 years ago

I'm in favor of 3. I just looked at docker-py to see if I could give @ibuildthecloud a hand in preparing a PR. That said, 1 is plenty dangerous in my opinion. Just because someone is using requests to talk to something does not mean we can assume it is not translating a PreparedRequest into some other format. I'm not convinced that it is reasonable (or not totally unreasonable) to assume that they're using HTTP/1.1.

ibuildthecloud commented 10 years ago

How about a fourth option. I totally agree that requests should treat non-HTTP URLs as opaque strings, and allowing somebody to shoot themselves in the foot by appending ?k=v might not be desirable either. My hang-up, though, is that if you use a non-HTTP URL, the params argument to requests becomes useless and is ignored. It's nice that you can plug in a TransportAdapter for a non-standard transport, but the side effect now is that if your TransportAdapter works off of non-HTTP URLs, the params argument can't be used anymore. Instead of effectively dropping the params argument, what if it was saved and passed on to the TransportAdapter to do with as it wants? The change I'm thinking of is basically to add the following:

    def prepare_url(self, url, params):
        """Prepares the given HTTP URL."""

        enc_params = self._encode_params(params)

        # Save params for non-HTTP URL based TransportAdapters
        self.params = params
        self.encoded_params = enc_params

So basically just save both the url and the params in the PreparedRequest. Now I already tried to modify docker-py to use this approach but then realized the logic to encode the params is in an internal method on PreparedRequest. So in order to not duplicate the logic (and not call an internal method), I thought prepare_url could save the encoded params. Or move PreparedRequest._encode_params to be some non-internal util method.

sigmavirus24 commented 10 years ago

@ibuildthecloud the short answer is no.

The long answer is this: Transport Adapters only handle the transmission of requests and responses. Prepared Requests are the representation of a request ready to be sent. Sessions encapsulate the logic (and only the logic) of a user-agent session (loosely speaking), including the handling of cookies and determining which transport adapter to use to perform an interaction.

The pattern is this: each object only knows what it has to know about. Parameters are related to requests only; transport adapters should not care what parameters are sent. They should concern themselves only with processing a request and generating a response.

Perhaps the best solution for docker-py is to monkeypatch the PreparedRequest object's prepare_url method to not check the scheme of the URL. With that in mind, you will be duplicating some code unfortunately.
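A rough sketch of that monkeypatch, assuming requests' PreparedRequest API; the duplicated appending logic below is this sketch's own, not code from requests:

```python
import requests
from requests.models import PreparedRequest

_original_prepare_url = PreparedRequest.prepare_url

def _patched_prepare_url(self, url, params):
    # For oddball schemes, duplicate the HTTP-style '?k=v' appending
    # that requests performed before commit b149be5d.
    if ':' in url and not url.lower().startswith('http'):
        encoded = self._encode_params(params)
        if encoded:
            url = '%s%s%s' % (url, '&' if '?' in url else '?', encoded)
        self.url = url
        return
    _original_prepare_url(self, url, params)

PreparedRequest.prepare_url = _patched_prepare_url

req = PreparedRequest()
req.prepare_url('unix://var/run/docker.sock/v1.6/containers/json', {'all': 1})
assert req.url == 'unix://var/run/docker.sock/v1.6/containers/json?all=1'
```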

ibuildthecloud commented 10 years ago

Hmmm...

Whose responsibility is it to generate and write the "GET /x?k=b" line in the HTTP request? Isn't that done by the transport?


Lukasa commented 10 years ago

Yes, but that job is trivial. It's not the transport's job to work out what the URL should be, just to put it in between the word GET and the word HTTP/1.1. It's the job of higher layers to work out what the URL should be.

ibuildthecloud commented 10 years ago

But for non-HTTP URLs you are already saying that requests isn't responsible for understanding the URL, so who is then? Can't there be a hook for this? Again, it doesn't seem right that params is just ignored and dropped. That means the caller to requests is not fully abstracted away from the TransportAdapter, because a non-HTTP URL changes the nature of the params argument.

ibuildthecloud commented 10 years ago

Additionally, it's not trivial to generate "GET /x?k=b". In order to do that you must parse the URL, so it does require specific knowledge of the URL format to create the GET line.

sigmavirus24 commented 10 years ago

Right. The transport adapter expects to be talking only to HTTP services, so it is fair to make assumptions about the URL being passed to it. One of those assumptions is that the URL can be parsed to find the path. No such assumptions can be made by the HTTP adapter about the unix protocol. You'll also note that we are not generating that line in requests or in the adapter; instead, that takes place in urllib3.

Lukasa commented 10 years ago

You've misunderstood my argument, I think. =)

Requests isn't responsible for understanding URLs on non-HTTP schemes because Requests is an HTTP library. The HTTPAdapter is not the only part of Requests that understands HTTP: the whole Requests stack does. However, people want the freedom to use the Requests API on non-HTTP schemes, which Requests allows (I've written an adapter for FTP, for example).

Allow me to break down Requests' design. There are three layers: Request/PreparedRequest/Response, Session, TransportAdapter. Each of these knows about HTTP in a different way.

The first thinks in terms of individual HTTP messages. It builds up the structure for these messages, but in a way that can be mutated. Headers are a dictionary, URLs are components, etc. The PreparedRequest is an intermediate step that does some parsing of data as HTTP, but not much. These layers understand HTTP messages.

The Session layer thinks in terms of HTTP as state: in essence, it acts a bit like a user agent (though it isn't). It knows how to maintain cookies and when to apply them, and it provides room for persistent attributes. It understands the persistence layer of HTTP.

The TransportAdapter layer thinks about HTTP in terms of connections and connection pools. It knows how HTTP messages are structured on the wire, and how to handle chunked data (which is in effect an unusual way of representing HTTP bodies on the wire). This layer also handles TLS and connection maintenance. This layer understands how to serialize HTTP messages into their three basic components: request line, header block, body.

In this model, building URLs from their component structures is a job for the top layer. It knows how HTTP URLs work and provides convenience attributes for working with them. The Session object doesn't touch them at all, and all the TransportAdapter does is work out whether or not it needs to include the full URL in the request URI or whether it can omit everything before the path. Doing that is trivial: look for the first '/' character and split on that. This is not the same as parsing a URL.
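That trivial split might look something like this (a dependency-free sketch; the request_target helper name is hypothetical):

```python
# Hypothetical helper showing the split Lukasa describes: no URL
# parsing, just find the first '/' after the 'scheme://host' part and
# keep everything from there on as the request target.
def request_target(url):
    scheme_end = url.find('://') + 3          # skip past 'scheme://'
    path_start = url.find('/', scheme_end)    # first '/' of the path
    return url[path_start:] if path_start != -1 else '/'

assert request_target('http://example.com/x?k=b') == '/x?k=b'
assert request_target('http://example.com') == '/'
```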

For non HTTP URLs the user of Requests is responsible for understanding how to build the URL, because it's the only party involved that knows how.

ibuildthecloud commented 10 years ago

Okay, let me digest this a bit. Let me point out that what docker-py is doing is HTTP. The only difference is that instead of sending the request over a TCP socket, it is sending it over a UNIX socket. So we're not talking about a non-HTTP service. The problem, though, is that the URL is overloaded in HTTP: it is used both as a means of describing the transport (IP host and TCP port, or SSL) and as the HTTP path to be put in the request line. So for docker-py a unix scheme URL is used because we are not looking for an IP host and TCP port, but instead the file system path to the unix socket.

Would it make sense to have docker-py use the URL format http+unix://socketpath/path?k=v to indicate that this is an HTTP URL but has a custom transport? (I have no clue if the http+unix scheme will work with urllib and friends, but I've seen that format used in other things.)

sigmavirus24 commented 10 years ago

That scheme may work but the Transport Adapter would need to likely strip off the http+.

Also you're in luck because it seems urlparse will handle it well:

>>> import requests
>>> requests.compat
<module 'requests.compat' from '/usr/lib64/python2.7/site-packages/requests/compat.pyc'>
>>> requests.compat.urlparse
<function urlparse at 0x7f3d60b9ec80>
>>> requests.compat.urlparse('http+unix://host/path')
ParseResult(scheme='http+unix', netloc='host', path='/path', params='', query='', fragment='')
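Assuming that scheme, the stripping mentioned above reduces to splitting on the '+' of the parsed scheme (a sketch, not docker-py's code):

```python
from urllib.parse import urlparse

parts = urlparse('http+unix://var/run/docker.sock/v1.6/containers/json')
assert parts.scheme == 'http+unix'

# The adapter can peel off the 'http+' prefix before treating the
# request as plain HTTP over its own transport.
protocol = parts.scheme.partition('+')[0]
assert protocol == 'http'
```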

ibuildthecloud commented 10 years ago

The UnixAdapter in docker-py already messes with the url to determine the correct path_url and socket path. I actually quickly modified docker-py and http+unix:// seemed to work. I'll go down the path of putting in a PR for docker-py to use http+unix. So hopefully that will be accepted and I'll stop bothering you.

I still think it's a gap that for non-HTTP scheme URLs you just toss params in the trash. There should be a nice hook to provide custom logic to deal with params. But I'm pretty satisfied that http+unix seems to work.

sigmavirus24 commented 10 years ago

There should be a nice hook to provide custom logic to deal with params.

I think it's fairly obvious that @Lukasa and I disagree with this sentiment. I'm not sure we can say it enough times, but requests is an HTTP library. While docker-py is using it to perform HTTP over a unix socket, not every client using it on a unix socket will necessarily be doing HTTP. Beyond some extraordinary use cases, there is no reason for us to add a hook or any other simpler means when there are already documented ways around this.

Frankly, it is far from impossible to accomplish but just because we make it possible does not mean we should encourage it by making it simple.

I'm glad you've found one way around it and are working on a pull request to docker-py.

Cheers!

ibuildthecloud commented 10 years ago

Thanks for your help and quick replies. I should have clarified from the beginning that I was talking about HTTP over a custom transport. It never even occurred to me that requests could be used for non-HTTP.


sigmavirus24 commented 10 years ago

@ibuildthecloud we love to help when possible.

It never even occurred to me that requests could be used for non-HTTP.

As @Lukasa mentioned, he wrote an FTP adapter. Users can implement whatever backend they want with Transport Adapters. They do not have to use urllib3. They can also process a Prepared Request however they like. We cannot facilitate all of their needs, even when they're performing HTTP over a different transport.