psf / requests

A simple, yet elegant, HTTP library.
https://requests.readthedocs.io/en/latest/
Apache License 2.0
52.1k stars 9.32k forks source link

"PreparedRequest.prepare_cookies" can only be called once (and "succeeds" silently if called again) #2532

Closed smiley closed 9 years ago

smiley commented 9 years ago

I'm writing an adapter that interfaces with a CDN, and it requires that I "sign" my requests with a cookie, but it doesn't send the cookie back in Set-Cookie.

I manage to send it on the first request, but then the site behind the CDN does send a Set-Cookie header (this is expected), and prepare_cookies called by the Session works its magic. The problem is, due to how cookielib is written, if a Cookie header was already composed, it won't be re-composed: (Even if cookies have changed between calls)

# cookielib.py:CookieJar.add_cookie_header (starts at 1312)
# ...
attrs = self._cookie_attrs(cookies)
if attrs:
    if not request.has_header("Cookie"):
        request.add_unredirected_header(
          "Cookie", "; ".join(attrs))
#...

This causes a problem when any request past the first one on the same Session object renders my adapter worthless, as authentication fails and attempts to properly add cookies fail silently:

def _prepare_request(self, request):
    """
    :type request: requests.PreparedRequest
    """
    request.prepare_cookies({self.COOKIE_NAME: self._get_authorization_cookie(request.url)})

request._cookies is modified properly, but request.headers['Cookie'] is still its former self.

If fixing this isn't possible (it is a bundled library after all), maybe requests could delete the header before calling CookieJar.add_cookie_header?

sigmavirus24 commented 9 years ago

If fixing this isn't possible (it is a bundled library after all), maybe requests could delete the header before calling CookieJar.add_cookie_header?

I assume this is happening because of this line. And that's why you quoted code from cookielib a standard library. You want us to somehow patch the standard library locally?

Here's an alternative fix: Why aren't you constructing a new request, and preparing it with the new cookie header instead of trying to reuse an old prepared request?

smiley commented 9 years ago

No, I'm suggesting that you can report the issue to the people developing cookielib (to get this fixed or documented in future Python releases), and do one of these things:

  1. Reject any future calls to prepare_cookies (have a local boolean for marking this was already called?)
  2. Merge an existing self._cookies value with the new one you're creating, clear self.headers['Cookie'] and call the correct prepare function in cookielib, forcing it to re-generate the header
  3. Allow adapters to receive Request objects before they become PreparedRequest objects, so they could modify them correctly.

If you choose (1), you'll have to provide another way for an adapter to change cookies in a request.

And I'm not constructing a new request, because as an adapter I get a PreparedRequest and (I think) am expected to pass it on.

EDIT: I think (2) is the best option, since it's transparent and requires no public interface changes. I can make that change & open a PR if you agree.

sigmavirus24 commented 9 years ago

I'm suggesting that you can report the issue to the people developing cookielib (to get this fixed or documented in future Python releases)

Why aren't you reporting it to them? bugs.python.org accepts reports from everyone, and you're the first person I've found to encounter this particular problem. As you're the person with the main interest in seeing it fixed, you should be reporting it.

Reject any future calls to prepare_cookies (have a local boolean for marking this was already called?)

I would rather document the fact of this behaviour than break a behaviour (by introducing an exception) that people may be expecting to just work.

Merge an existing self._cookies value with the new one you're creating, clear self.headers['Cookie'] and call the correct prepare function in cookielib, forcing it to re-generate the header

Again, this is backwards incompatible. I suspect it's potentially desirable, but given the fact that we follow SemVer, it isn't plausible until 3.0.0. That said, you could just as easily do:

prepared_request.headers.pop('Cookie', None)  # To delete it unconditionally
prepared_request.prepare_cookies(...)

In your adapter.

Allow adapters to receive Request objects before they become PreparedRequest objects, so they could modify them correctly.

That doesn't fit the workflow that we have had established for 2 major cycles and which no one else has had an issue with. The current flow fits the abstraction well. Adapters are not there to prepare requests but instead to handle requests which are ready to be made. If adapters decide to modify a prepared request, that's fine.

And I'm not constructing a new request, because as an adapter I get a PreparedRequest and (I think) am expected to pass it on.

I don't quite fully understand your use case because, quite frankly, I've never seen someone use transport adapters this way. That said, since you're already toying with the prepared request, you may as well just do what I already described above (destroy the existing Cookie header, and regenerate it).

Nothing should blow up if you do that. It will also cover all the existing versions of Python that have the existing cookielib logic in it (and any future version that may potentially not have that logic).

smiley commented 9 years ago

Why aren't you reporting it to them? bugs.python.org accepts reports from everyone, and you're the first person I've found to encounter this particular problem. As you're the person with the main interest in seeing it fixed, you should be reporting it.

I didn't know bugs.python.org accepted bug reports for cookielib, and frankly, I was searching around for the main repository for it, but couldn't find it. Since requests is so widely-used & well-established, I assumed you found bugs before and reported them directly to the developer, so you might know who that is.

Again, this is backwards incompatible. [...]

It'd break a quirk that prevented a potential feature from ever working. (having more than one module -- not just adapters -- call prepare_cookies with required cookies)

[...] That said, you could just as easily do: [...]

That would break any existing cookies, like a session ID used by the website served by the CDN.

I don't quite fully understand your use case because, quite frankly, I've never seen someone use transport adapters this way. That said, since you're already toying with the prepared request, you may as well just do what I already described above (destroy the existing Cookie header, and regenerate it).

I'm essentially implementing anorov/cloudflare-scrape for a different CDN. That said, he recreates a Request object from the given PreparedRequest semi-manually. Right now I'm working around this like so:

def _add_auth_cookie(self, request):
    auth_cookie = self._get_authorization_cookie(request.url)

    if 'Cookie' in request.headers:
        old_cookies = request.headers['Cookie']
        all_cookies = '; '.join([old_cookies, "{0}={1}".format(self.COOKIE_NAME, auth_cookie)])
        request.headers['Cookie'] = all_cookies
    else:
        request.prepare_cookies({self.COOKIE_NAME: auth_cookie})

But fine, I'll continue using my workaround. At the very least you should document this in prepare_cookies' docstring. It's the only "prepare_[...]" method that can only be called once and it's not clear at all.

sigmavirus24 commented 9 years ago

I assume you're talking about this method in @Anorov's project. If you're doing something similar to that, I'm confused as to why you're re-using the request. You're making an entirely new request and since you're taking control at the adapter level, this is a concession you have to make.

That said, any updates to the documentation are welcome.

smiley commented 9 years ago

I'm making a separate new request to the CDN itself to authenticate, and then I'm using that cookie on all of my future requests. The problem here is that at no point is the cookie set by an HTTP request, but by local JS.

Anorov commented 9 years ago

I have no remarks about prepare_cookies specifically, however...

Generally speaking, I found the adapter API for sessions and cookie handling difficult to work with, as you can tell by some of the odd things I'm doing in my own code. Using the private-sounding _cookies attribute feels icky, and having to pass the cookie from the first response so I can artificially merge it into the headers of the second response (rather than a simple self.cookies.set(...) or the like) is irritating.

There seem to be some disconnects between the adapter API and the concept of a Requests session (or even just an HTTP session). It's very possible such a connection was never intended in the first place, but it is frustrating nonetheless. If there is a way of refactoring my code to make it more idiomatic I would greatly appreciate any suggestions and advice.

@smiley I might be interested in merging this functionality into cloudflare-scrape. I assume this is for a competing CDN like Incapsula. Feel free to email me.

sigmavirus24 commented 9 years ago

Yeah, I wasn't really going to bring up the fact that adapters aren't the right place for this logic. That said, I didn't take very much time to try to understand what you were doing with the code, so giving you a way to refactor it isn't quite possible for me at the moment. I would suspect that subclassing a session would be much cleaner and simpler for you, especially since I noticed that you're calling requests.get at one point inside an adapter method.

That aside, I'd be happy to discuss this on an issue on cloudflare-scrape or via email. This issue, I believe, can be closed.