scrapy / w3lib

Python library of web-related functions
BSD 3-Clause "New" or "Revised" License

safe_url_string URL-encodes already-encoded username and password, breaking idempotency #187

Closed andersk closed 2 years ago

andersk commented 2 years ago

The documentation claims that calling safe_url_string on an already “safe” URL will return the URL unmodified, but this does not hold when the username or password contains %: each call encodes the % again.

>>> import w3lib.url
>>> url = 'http://%25user:%25pass@host'
>>> url = w3lib.url.safe_url_string(url); url
'http://%2525user:%2525pass@host'
>>> url = w3lib.url.safe_url_string(url); url
'http://%252525user:%252525pass@host'
>>> url = w3lib.url.safe_url_string(url); url
'http://%25252525user:%25252525pass@host'
>>> url = w3lib.url.safe_url_string(url); url
'http://%2525252525user:%2525252525pass@host'
>>> url = w3lib.url.safe_url_string(url); url
'http://%252525252525user:%252525252525pass@host'
>>> url = w3lib.url.safe_url_string(url); url
'http://%25252525252525user:%25252525252525pass@host'
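
For illustration only, here is a minimal standard-library sketch of one way to percent-encode a username or password so that a second pass changes nothing. It is not w3lib's code and not necessarily how the issue was fixed; quote_userinfo_part and the safe-character set are hypothetical names and choices made up for this example.

```python
import re
from urllib.parse import quote

# Assumption: punctuation allowed unescaped in the userinfo component
# (RFC 3986 sub-delims plus unreserved punctuation; ":" is left out so it
# can keep separating the username from the password).
_USERINFO_SAFE = "!$&'()*+,;=-._~"

# A well-formed percent-encoded triplet such as "%25" or "%3a".
_PCT_ENCODED = re.compile(r"%[0-9A-Fa-f]{2}")


def quote_userinfo_part(value: str) -> str:
    """Hypothetical helper: encode value, keeping existing %XX triplets as-is."""
    pieces = []
    pos = 0
    for match in _PCT_ENCODED.finditer(value):
        # Encode the raw text before the triplet, then copy the triplet through.
        pieces.append(quote(value[pos:match.start()], safe=_USERINFO_SAFE))
        pieces.append(match.group(0))
        pos = match.end()
    # Encode whatever follows the last triplet (a stray "%" becomes "%25").
    pieces.append(quote(value[pos:], safe=_USERINFO_SAFE))
    return "".join(pieces)


once = quote_userinfo_part("%25user")
twice = quote_userinfo_part(once)
print(once, twice)
```

The trade-off of this approach is that any %XX sequence in the input is assumed to be already encoded, so a raw password that literally contains the text %25 has to be encoded by the caller first.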

wRAR commented 2 years ago

Bisecting points at #174