python-hyper / hyper

HTTP/2 for Python.
http://hyper.rtfd.org/en/latest/
MIT License

Hyper doesn't print out any cookies. #409

Open BarryThrill opened 5 years ago

BarryThrill commented 5 years ago

Hello guys! I am using the latest development version of Hyper. I have been trying for hours to print out cookies when using the hyper `HTTP20Adapter`. My code looks like this:

from collections import OrderedDict

import requests
from hyper.contrib import HTTP20Adapter

s = requests.session()
# As soon as I comment out this line, the cookies come through; with the
# adapter mounted, no cookies are received.
s.mount('https://', HTTP20Adapter())
s.headers = OrderedDict()

url = 'https://github.com/'

headers = {
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    'accept-language': 'sv-SE,sv;q=0.9,en-US;q=0.8,en;q=0.7,de;q=0.6',
}

r = s.get(url, headers=headers, timeout=5, verify=False)

print(r.cookies)

This gives me an output of:

<RequestsCookieJar[]>

However, if I change `r = s.get(url, ...)` to `r = requests.get(url, ...)`, then I do get the output:

<RequestsCookieJar[<Cookie _octo=GH1.1.1584727450.1556359694 for .github.com/>, <Cookie logged_in=no for .github.com/>, …]>

So the question is: why does this happen?
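One way to narrow this down (a quick diagnostic sketch, assuming the session `s` and response `r` from the code above) is to check whether the Set-Cookie header arrives at all:

# Diagnostic sketch: if the raw header is present but the jar is empty,
# the cookies reach requests and only the jar-extraction step is failing.
print(r.headers.get('set-cookie'))
print(r.cookies)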

dhdavvie commented 5 years ago

So after some quick digging, here is what I could pull up. This line is where the cookies are extracted. The `extract_cookies_to_jar` function in requests expects two things:

1) That `response` has the attribute `_original_response`. The current implementation only sets `_original_response` on `response.raw`.
2) That `response._original_response.msg` contains a list of tuples in the format `(header, value)`.
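For context, `extract_cookies_to_jar` looks roughly like this (paraphrased from `requests.cookies`; details may vary between versions):

from requests.cookies import MockRequest, MockResponse

# Paraphrased from requests.cookies -- not hyper code. Extraction bails out
# silently when _original_response is missing, which is exactly what happens
# with the response hyper builds.
def extract_cookies_to_jar(jar, request, response):
    if not (hasattr(response, '_original_response') and
            response._original_response):
        return
    req = MockRequest(request)
    # MockResponse reads headers through msg.get_all(name, default)
    res = MockResponse(response._original_response.msg)
    jar.extract_cookies(res, req)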

The current implementation can be hacked in the following way to somewhat get the desired result:

from requests.cookies import extract_cookies_to_jar

# This goes inside hyper's HTTP20Adapter, where `response` (the requests
# Response being built), `resp` (hyper's raw response) and `request` are
# all in scope.

class FakeOriginalResponse(object):  # pragma: no cover
    def __init__(self, headers):
        self._headers = headers

    def get_all(self, name, default=None):
        values = []

        for n, v in self._headers:
            # compare case-insensitively (changed `n` to `n.lower()`)
            if n.lower() == name.lower():
                values.append(v)

        if not values:
            return default

        return values

    def getheaders(self, name):
        return self.get_all(name, [])

response.raw._original_response = orig = FakeOriginalResponse(None)
orig.version = 20
orig.status = resp.status
orig.reason = resp.reason
# reformat the headers into a list of (name, value) tuples
orig.msg = FakeOriginalResponse(list(response.headers.items()))

# give `response` itself an _original_response attribute too, since
# extract_cookies_to_jar checks for it on the object it is handed
response._original_response = orig
extract_cookies_to_jar(response.cookies, request, response)

With the above hack, the cookie jar gets one of the cookies: <RequestsCookieJar[<Cookie _gh_sess=dE5H..572446 for github.com/>]>

I imagine it is not getting the other cookies due to the differences between HTTP/1.1 and HTTP/2, so cookies should potentially be persisted per stream/connection to account for this. I will continue digging.
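In the meantime, a possible stop-gap on the caller's side (only a sketch, not part of hyper: it parses the folded Set-Cookie header with `http.cookies.SimpleCookie`, which can mis-split cookies whose `Expires` attribute itself contains a comma):

from http.cookies import SimpleCookie

# Stop-gap sketch: copy cookies out of the folded Set-Cookie header into the
# session jar by hand. `s` and `r` are the session and response from the
# original snippet.
raw = r.headers.get('set-cookie')
if raw:
    parsed = SimpleCookie()
    parsed.load(raw)
    for name, morsel in parsed.items():
        s.cookies.set(name, morsel.value,
                      domain=morsel['domain'] or None,
                      path=morsel['path'] or '/')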

For reference, I am comparing the behavior against that of `requests.adapters.HTTPAdapter`.
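The relevant part of `requests.adapters.HTTPAdapter.build_response` looks roughly like this (condensed from the requests source; `resp` there is the urllib3 response, which already carries a real `_original_response`):

from requests.cookies import extract_cookies_to_jar
from requests.models import Response
from requests.structures import CaseInsensitiveDict

# Condensed from requests.adapters -- not hyper code.
def build_response(self, req, resp):
    response = Response()
    response.status_code = getattr(resp, 'status', None)
    response.headers = CaseInsensitiveDict(getattr(resp, 'headers', {}))
    response.raw = resp  # the urllib3 response, wrapping http.client's
    # (encoding, reason and url handling elided)
    # resp has a real _original_response, so extraction succeeds here:
    extract_cookies_to_jar(response.cookies, req, resp)
    response.request = req
    response.connection = self
    return response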