pallets / werkzeug

The comprehensive WSGI web application library.
https://werkzeug.palletsprojects.com
BSD 3-Clause "New" or "Revised" License
6.65k stars 1.73k forks source link

test client adds quotes around cookie value #1060

Closed treadup closed 1 year ago

treadup commented 7 years ago

The Werkzeug test Client adds double quotest around a cookie value. This means that if you set a cookie using set_cookie and then get the value of the cookie using the value property of the cookie you will get back a different value from the one you set.

I would expect the same value that I set with set_cookie to be returned when using the cookie.value property. But perhaps I am missing something here? I have added a test case that demonstrates the issue.

import unittest
from unittest import TestCase

from werkzeug.test import Client
from werkzeug.testapp import test_app
from werkzeug.wrappers import BaseResponse

class WerkzeugCookieTest(TestCase):

    def get_cookie_from_client(self, cookie_name, client):
        for cookie in client.cookie_jar:
            if cookie.name == cookie_name:
                return cookie

        return None

    def test_cookie(self):
        c = Client(test_app, BaseResponse)

        expected_cookie_contents = "This is the contents of the foo cookie."
        c.set_cookie('localhost', 'foo', expected_cookie_contents)

        #    code.interact(local=locals())

        actual_cookie = self.get_cookie_from_client('foo', c)
        self.assertEqual(expected_cookie_contents, actual_cookie.value)

if __name__ == '__main__':
    unittest.main()

When running the above test I get the following failure.

F
======================================================================
FAIL: test_cookie (__main__.WerkzeugCookieTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_werkzeug_cookie.py", line 27, in test_cookie
    self.assertEqual(expected_cookie_contents, actual_cookie.value)
AssertionError: 'This is the contents of the foo cookie.' != '"This is the contents of the foo cookie."'

----------------------------------------------------------------------
Ran 1 test in 0.001s
tuukkamustonen commented 6 years ago

Seems like Client#set_cookie() calls dump_cookie() (in https://github.com/pallets/werkzeug/blob/master/werkzeug/test.py#L701-L702) which sets quotes on value (in https://github.com/pallets/werkzeug/blob/master/werkzeug/http.py#L1073).

It gets into CookieJar based on the quoted header in https://github.com/pallets/werkzeug/blob/master/werkzeug/test.py#L705

So, to read it, you need to call werkzeug.http.parse_cookie (in https://github.com/pallets/werkzeug/blob/master/werkzeug/http.py#L965), passing it the Set-Cookie header value as in:

    ...

    actual_cookie = self.get_cookie_from_client('foo', c)

    cookie_header = actual_cookie.name + '=' + actual_cookie.value
    cookie_dict = parse_cookie(cookie_header)

    self.assertEqual(expected_cookie_contents, cookie_dict['foo'])

Quotes are only set when needed (see https://github.com/pallets/werkzeug/blob/master/werkzeug/_internal.py#L29-L31).

tuukkamustonen commented 6 years ago

@davidism @ThiefMaster I would consider this answered (see my reply above). Close?

mahmoud commented 4 years ago

I know this thread is old, but still. This is more than a "question". Is the recommended pattern really... reassemble the cookie header and reparse it?

If so, it's insufficient. This issue is not limited to a poor TestClient/CookieJar API. Using the test client to generate a request with cookies and passing it through werkzeug itself results in the same double-quoting issue. And in my case it breaks the HMAC with secure-cookie.

Kind of surprising it's been around this long. Might have to open another issue.

mahmoud commented 4 years ago

(securecookie values end up quoted because securecookie uses a URL-ish style, with query parameters, and ? is not one of the _legal_cookie_chars, probably per the spec)

mahmoud commented 4 years ago

The environ ends up looking like:

{
'HTTP_COOKIE': 'cookie="\\"W7XGXxmUjl4kbkE0TWaFo4Oth50=?userid=NjAyNDQ3NA==&username=IlNsYXBvcnRlIg==\\""'
...
}
mahmoud commented 4 years ago

So test.Client uses cookielib/http.cookiejar in a really gnarly way. CookieJar's ad-hoc parsers interpret Werkzeug-style cookies as Netscape-style, so the values don't get quotes stripped off of them. Hence, the extra wrapping quotes.

Bit of trivia, if the cookie had a version attribute, the cookie with be seen as RFC2109, and had the quotes stripped. But that's neither here nor there, since no one would want to emit RFC2109 cookies these days.

davidism commented 4 years ago

Happy to consider a PR fixing this, but I don't think I understand the issue enough right now to fix it myself or mentor someone else at a sprint.

Maybe it would help to outline the expected behavior on both sides. How should client.set_cookie work? We're using cookie jar internally, and I don't think we can change that, so I'm not clear on how the custom get_cookie_from_client function should work if they're not expected to actually do the reverse of what set_cookie did. Is there actually an issue with the cookie value as seen by the WSGI application being tested, as opposed to just this custom function?

mahmoud commented 4 years ago

@davidism Thanks for reopening.

Is there actually an issue with the cookie value as seen by the WSGI application being tested, as opposed to just this custom function?

Yes, I'm seeing this behavior after calling set_cookie with the output of SecureCookie.serialize, then using the test client to issue a request to the WSGI app under test. The WSGI app fails to load the cookie because of the double quotes.

davidism commented 4 years ago

So is this a problem with secure cookie's implementation then? Is it not parsing valid cookie values? I'm just unclear what everything's supposed to do instead of quoting when quoting is needed.

treadup commented 4 years ago

I did not dig into this enough when it happened. And now I am fuzzy on the details. I think, but I am not sure, that I was trying to test a working flask application when I ran into this issue. In other words I had a flask application that used cookies and was working. But there was no test that used cookies. When I tried adding a test I ran into this problem.

If you take the original unit test above and change the cookie value to not contain spaces then the unit test will pass. I think I solved my issue by removing spaces from my cookie value. And then I stopped digging into the problem.

import unittest
from unittest import TestCase

from werkzeug.test import Client
from werkzeug.testapp import test_app
from werkzeug.wrappers import BaseResponse

class WerkzeugCookieTest(TestCase):

    def get_cookie_from_client(self, cookie_name, client):
        for cookie in client.cookie_jar:
            if cookie.name == cookie_name:
                return cookie

        return None

    def test_cookie(self):
        c = Client(test_app, BaseResponse)

        expected_cookie_contents = "No_spaces_in_cookie."
        c.set_cookie('localhost', 'foo', expected_cookie_contents)

        #    code.interact(local=locals())

        actual_cookie = self.get_cookie_from_client('foo', c)
        self.assertEqual(expected_cookie_contents, actual_cookie.value)

if __name__ == '__main__':
    unittest.main()
treadup commented 4 years ago

The Client.set_cookie method internally uses the werkzeug.http.dump_cookie function.

treadup commented 4 years ago

Playing around a bit with dump_cookie I can see that it is adding quotes around the cookie value if the cookie value contains spaces.

>>> from werkzeug.http import dump_cookie
>>> dump_cookie('foo',  "This is the contents of the foo cookie.")
'foo="This is the contents of the foo cookie."; Path=/'
>>> dump_cookie('bar',  "ThisCookieValueContainsNoSpaces.")
'bar=ThisCookieValueContainsNoSpaces.; Path=/'
davidism commented 4 years ago

dump_cookie is not the problem here, it's doing the correct thing by quoting values with unsafe characters. Any test code that wants to inspect the internals directly needs to reverse that and do parse_cookie.

I'm not clear why you were trying to inspect the internal results of set_cookie either, rather than testing a request that would read the cookie. In @mahmoud's case it seems to point to secure-cookie not doing the right thing. I'm trying to understand what is expected to change in Werkzeug.

treadup commented 4 years ago

The test client already provides a set_cookie method. What I was trying to do by poking around in the internals was to write a get_cookie method. That way you could use the test client to do a GET request against an endpoint and then check if cookies were being set correctly by just querying the test client.

The set_cookie method on the test client handles the case when you want to make a HTTP request with a given cookie set.

It would be nice if the test client also had a get_cookie method that could be used to get the value of a cookie after you had made a HTTP request.

In a browser you can inspect the cookies that are currently set. This is useful when you are manually testing things.

The get_cookie_from_client method above was my attempt at writing the get_cookie method.

mahmoud commented 4 years ago

@davidism not quite!

So dump_cookie has its opinion on what characters need to be quoted (without a citation), and it includes whitespace (per @treadup) and ?, per securecookie's usage. The most modern cookie RFC doesn't mention that ? needs to be quoted. To my knowledge, this is the recommended RFC for producing cookies, which brings me to the next point.

CookieJar is mostly useful for broad, backwards-compatible cookie handling. With a purely internal use like this, I think it might be overkill.

So to recap, my understanding:

TLDR CookieJar interprets cookies generated by dump_cookie to be a different version of cookie than what was probably intended. It works most of the time, unless you use one of a particular set of characters. This small bug comes into focus due to the serialization format used by securecookie, a close friend of werkzeug's.

davidism commented 4 years ago

I vaguely remember dealing with some other issues around the internal cookie jar instance, and I remember being frustrated and confused by the cookie jar structure and implementation, as well as Werkzeug's circuitous use of it. So I'm not necessarily against removing cookie jar and/or improving this API.

Assuming we replace it with set_cookie and get_cookie methods that use some other internal data structure, how should the data be stored instead? CookieJar does provide handling of domains, paths, etc. (not sure how well).

There have been questions about the behavior provided by CookieJar before. For example, #1680 is currently open and wants to match the cookie domain with the host of the test request. I'm on the fence about whether that actually makes sense for tests, since it's basically testing the browser's sandboxing of cookies. Regardless of that decision, it indicates that people are probably using the cookie jar, for better or worse.

treadup commented 4 years ago

Well I'm not sure if set_cookie and get_cookie cover all the use cases. For example using the CookieJar you can iterate over all the cookies that are currently set. That is something that would not be covered by just set_cookie and get_cookie.

davidism commented 4 years ago

That's what I'm asking: what should the internal data structure be if CookieJar isn't working but people still want what it provides?

Note that client.cookie_jar, or the fact that it is a CookieJar, is not documented anywhere. So while I agree that getting all the cookies may be nice for some specific types of tests, as well as filtering by domain, etc., this all appears to be an internal implementation detail right now.

mahmoud commented 4 years ago

Right. Note that in this case whatever is using it is actually the same class on the server side of the WSGI application during a test run. I don't know when I'll make a reproducing case, but repro steps:

  1. Manually serialize SecureCookie
  2. Add to client with client.set_cookie('local', 'session', serialized_value)
  3. Send a request to the WSGI app

And the WSGI app will fail to load the securecookie because the extra quotes causes the HMAC to fail.

treadup commented 4 years ago

I'm not really sure what a good data structure would be. But taking a look at https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie it almost feels like each cookie is a row in a relational database table.

We have a single table called COOKIE that has the columns COOKIE_NAME, COOKIE_VALUE, SECURE, HTTP_ONLY, MAX_AGE, etc. One column for each attribute.

I'm not saying that you should use say SQLite to store this but if you did it would allow users to perform any kind of query against the cookie table they wanted.

treadup commented 4 years ago

Perhaps a plain association list. The key could be the cookie_name or perhaps (domain, cookie_name) tuple. The value could be either a tuple or a dict containing all of the cookie attributes.

cAtaman commented 3 years ago

I'll be working on this

davidism commented 1 year ago

OK, I started working on this again, and while I think a refactor is needed, I don't think it's going to "solve" any of the issues here.

Our current use of CookieJar is really convoluted. CookieJar is designed for urllib.request, not for a WSGI environ. We jump through a bunch of hoops to adapt it, there's a lot of encoding back and forth. So refactoring that can remove complexity.

The issues here are more about understanding the difference between client and application, rather than an issue with the current implementation. The test client is a client, it's taking the place of a browser or script making requests. So the cookies it sees and works with are the encoded values, the format that the HTTP Cookie spec requires for the header value. That means special characters are escaped, quotes are added, etc. If you're looking at the items in client.cookie_jar, you need to test against what you expect the client to have, not the decoded value that was originally set in the app code. And you need to set values that you expect the client to set.

It's a confusing API right now, because there's public API for set_cookie and delete_cookie, which match how the app works with cookies rather than the browser, but there's no get_cookie (and cookie_jar isn't public) to do the same. And should get_cookie return the original value, or the value that the browser will see? And really, most cookies should be opaque to the browser anyway, given their lack of security otherwise, so the tests should really be about setting a cookie in one request then getting it in another, not looking at it from the client side.

aenglander commented 1 year ago

I would argue that the current API is appropriate for a test client. Actual clients should not be interacting with cookies, so getting a cookie would not represent normal behavior. Reading the value of a cookie and making assertions is a poor test. It just proves that the cookie has a value and not that the call that requires the cookie in a dilutive transaction will use it properly. I see checking the value of a cookie in a test the same as checking the value of a "private" object attribute. It's a poor way to test as it does not test that the code actually is working. I can definitely see the value in setting a cookie and deleting a cookie to ensure the Flask app properly handles an invalid or missing cookie respectively.

davidism commented 1 year ago

2637 removes use of http.cookiejar, adds a get_cookie method, and adds decoded_key and decoded_value attributes to the returned cookie objects.