psf / requests

A simple, yet elegant, HTTP library.
https://requests.readthedocs.io/en/latest/
Apache License 2.0
52.09k stars 9.31k forks source link

parse_header_links fails for momento link headers #2707

Closed jayvdb closed 8 years ago

jayvdb commented 9 years ago

The link header for http://ws-dl-05.cs.odu.edu/demo/index.php/Tyrion_Lannister is not being parsed correctly by parse_header_links, and that function doesnt look completely RFC 5988 compliant.

>>> r = requests.get('http://ws-dl-05.cs.odu.edu/demo/index.php/Tyrion_Lannister')
>>> r.headers['link']
'<http://ws-dl-05.cs.odu.edu/demo/index.php/Tyrion_Lannister>; rel="original latest-version",<http://ws-dl-05.cs.odu.edu/demo/index.php/Special:TimeGate/Tyrion_Lannister>; rel="timegate",<http://ws-dl-05.cs.odu.edu/demo/index.php/Special:TimeMap/Tyrion_Lannister>; rel="timemap"; type="application/link-format"; from="Mon, 23 Apr 2007 20:26:15 GMT"; until="Fri, 06 Sep 2013 17:19:06 GMT",<http://ws-dl-05.cs.odu.edu/demo/index.php?title=Tyrion_Lannister&oldid=1714>; rel="first memento"; datetime="Mon, 23 Apr 2007 20:26:15 GMT",<http://ws-dl-05.cs.odu.edu/demo/index.php?title=Tyrion_Lannister&oldid=107222>; rel="last memento"; datetime="Fri, 06 Sep 2013 17:19:06 GMT"'

>>> pp = pprint.PrettyPrinter(indent=4)
>>> pp.pprint(r.links)
{   '06 Sep 2013 17:19:06 GMT': {   'url': '06 Sep 2013 17:19:06 GMT'},
    '23 Apr 2007 20:26:15 GMT': {   'url': '23 Apr 2007 20:26:15 GMT'},
    'first memento': {   'datetime': 'Mon',
                         'rel': 'first memento',
                         'url': 'http://ws-dl-05.cs.odu.edu/demo/index.php?title=Tyrion_Lannister&oldid=1714'},
    'last memento': {   'datetime': 'Fri',
                        'rel': 'last memento',
                        'url': 'http://ws-dl-05.cs.odu.edu/demo/index.php?title=Tyrion_Lannister&oldid=107222'},
    'original latest-version': {   'rel': 'original latest-version',
                                   'url': 'http://ws-dl-05.cs.odu.edu/demo/index.php/Tyrion_Lannister'},
    'timegate': {   'rel': 'timegate',
                    'url': 'http://ws-dl-05.cs.odu.edu/demo/index.php/Special:TimeGate/Tyrion_Lannister'},
    'timemap': {   'from': 'Mon',
                   'rel': 'timemap',
                   'type': 'application/link-format',
                   'url': 'http://ws-dl-05.cs.odu.edu/demo/index.php/Special:TimeMap/Tyrion_Lannister'}}

Oddly this was the subject of #2250, as it appears the comma's are still causing the problem, so it may be that some corner cases still need to be handled better, and some unit tests written to confirm the problem is actually solved. Happy to help with either.

Lukasa commented 9 years ago

I'm not really surprised that the links parser isn't perfect. Contributions are welcome to improve it: otherwise, it'll go on our backlog until one of us has time to take a swing at doing it properly.

sigmavirus24 commented 9 years ago

And to be clear we explicitly aren't 5988 complaint because Kenneth decided he did not want to handle the case where one rel could have multiple links. There's an effort to add that to the requests-toolbelt though.

daftshady commented 8 years ago

This has already been fixed by https://github.com/kennethreitz/requests/pull/2271.

jayvdb commented 8 years ago

Indeed. I cant see any problems with the output using 2.11.1, with that website.

>>> import requests, pprint
>>> pp = pprint.PrettyPrinter(indent=4)
>>> requests.__version__
'2.11.1'
>>> r = requests.get('http://ws-dl-05.cs.odu.edu/demo/index.php/Tyrion_Lannister')
>>> pp.pprint(r.headers['link'])
('<http://ws-dl-05.cs.odu.edu/demo/index.php/Tyrion_Lannister>; rel="original '
 'latest-version",<http://ws-dl-05.cs.odu.edu/demo/index.php/Special:TimeGate/Tyrion_Lannister>; '
 'rel="timegate",<http://ws-dl-05.cs.odu.edu/demo/index.php/Special:TimeMap/Tyrion_Lannister>; '
 'rel="timemap"; type="application/link-format"; from="Mon, 23 Apr 2007 '
 '20:26:15 GMT"; until="Fri, 06 Sep 2013 17:19:06 '
 'GMT",<http://ws-dl-05.cs.odu.edu/demo/index.php?title=Tyrion_Lannister&oldid=1714>; '
 'rel="first memento"; datetime="Mon, 23 Apr 2007 20:26:15 '
 'GMT",<http://ws-dl-05.cs.odu.edu/demo/index.php?title=Tyrion_Lannister&oldid=107222>; '
 'rel="last memento"; datetime="Fri, 06 Sep 2013 17:19:06 GMT"')
>>> pp.pprint(r.links)
{   'first memento': {   'datetime': 'Mon, 23 Apr 2007 20:26:15 GMT',
                         'rel': 'first memento',
                         'url': 'http://ws-dl-05.cs.odu.edu/demo/index.php?title=Tyrion_Lannister&oldid=1714'},
    'last memento': {   'datetime': 'Fri, 06 Sep 2013 17:19:06 GMT',
                        'rel': 'last memento',
                        'url': 'http://ws-dl-05.cs.odu.edu/demo/index.php?title=Tyrion_Lannister&oldid=107222'},
    'original latest-version': {   'rel': 'original latest-version',
                                   'url': 'http://ws-dl-05.cs.odu.edu/demo/index.php/Tyrion_Lannister'},
    'timegate': {   'rel': 'timegate',
                    'url': 'http://ws-dl-05.cs.odu.edu/demo/index.php/Special:TimeGate/Tyrion_Lannister'},
    'timemap': {   'from': 'Mon, 23 Apr 2007 20:26:15 GMT',
                   'rel': 'timemap',
                   'type': 'application/link-format',
                   'until': 'Fri, 06 Sep 2013 17:19:06 GMT',
                   'url': 'http://ws-dl-05.cs.odu.edu/demo/index.php/Special:TimeMap/Tyrion_Lannister'}}

@shawnmjones, maybe there are other websites to test with for compliance?