Open jvanasco opened 6 years ago
Thanks for this @jvanasco
I had started hacking on a branch that would make the domain and path required with our cookie jar and probably provide a better interface to all of this in general. Sadly, I think that got lost in the suffle and I'd forgotten it. The API design change needs to be made for v3.0 to make this more tenable.
Making the domain and path required would be much easier than the blacklist idea I was stubbing out (nowhere near a PR, just some tests on the general logic working before I posted here).
If that is required, IMHO, it would make sense to deprecate the 'cookies' argument to requests.get()
(lose the docs, print a warning) and change the docs around cookiejars and Session()
to note the impending change.
Our unit tests just tossed a dict of cookies into requests.get()
, because that's what the docs did. It took an embarrassing amount of time to realize I had successfully fixed an edge case we added a test for, and the problem was in the test/requests. Then it took looking at the code and a history of tickets to realize that was a bad idea and requests didn't do all the wonderfully brilliant things I expected it to (because it does all those wonderfully brilliant things everywhere else).
@jvanasco I completely agree our behaviour is far from wonderfully brilliant like the rest of the project. That said, I think there's a difference between the following:
requests.request(METHOD, url, cookies=key_value_pair_cookies)
And
session.cookies.set(key, value)
The first is a convenience that I think we can reasonably keep around. Underneath the functional API is a Session. If we have the session treat that dictionary as calling .set(key, value, domain=parsed_domain, path=parsed_path)
we can probably then simplify our handling of cookies and the cookie-jar will respect deletions. I think that's a potentially far better solution than deprecating it altogether.
The second is just wrong. ;)
You are correct - parsing the cookie domain/path out of the submitted url would be a much better idea than trashing it, but you'll still end up with issues over the potential of a leading dot in the domain (and ugh, the path, but let's not go there yet). This can be infuriating to debug if a problem pops up from it.
For example, consider making a request to example.com
with a foo
cookie:
requests.get('http://example.com/path/to/url', cookies={'foo': 'bar'})
let's also assume the path is just set for the root /
(because it gets even more complicated otherwise)
The server might be configured to send the SetCookie header for foo
with any of three options for the domain:
The RFCs and browser behavior on all this were switching around for a few years too on what no domain should imply, or the behavior of the leading dot. While everything is settled right now, there are still a lot of legacy clients/servers out there, and the current requests codebase doesn't seem to standardize on anything.
So it could be represented by two ways of setting cookies in the jar.
jar.set('foo', 'bar', domain='example.com')
jar.set('foo', 'bar', domain='.example.com')
If the server sends an update/clearout for the cookie value, it's possible that requests will have guessed wrong. While requests
sends some data to the underlying cookielib
functionality from the standard library over the initial dot or domain being set, these are treated as unique cookies and must be deleted/updated independently.
We just updated/unified all our python and js cookie libraries. I had the "pleasure" of reconciling a bunch of issues around how differently libraries & browsers were handling the leading domain dot differently. (sidenote: My lazy solution on the server side was to migrate everything to specifying the domain with a leading dot, but also killing cookies with both leading dot and not. i hate cookies now. )
I found a few variations of this "implementation detail" in other reports, but I believe this particular use case is worth addressing.
If a URL is retrieved with either a dict or unqualified cookie jar, the deletions are ignored.
or
This can create endless redirect loops if the server's response to a resource/url combination is to set a "delete cookie", which requests can not handle. Aside from failing tests and puzzled developers, this can trigger abuse flags with 3rd party api services.
Expected Result
If the server sends a cookie deletion value, it should be respected and not sent to the server.
Actual Result
passing a 'cookies=" dict to
requests.get()
creates a wrapped session with the default cookiejar. the current implementation just deletes any matching cookie for the domain, which fails because the cookie was set as a default (not for that domain).Potential Way To Fix
A potential fix is to extend the RequestsCookieJar to use a blacklist when domain-less defaults are provided.
When a SetCookie deletion is encountered and the cookiejar contains domain-less cookies:
When a request is made, filter the defaults against active blacklist values.
When a SetCookie update is made, remove it from the blacklist (or update the blacklist's to note it as invalid until the cookie's expiry date)
Reproduction Steps
System Information
"version": "2.18.4"