scrapy / w3lib

Python library of web-related functions
BSD 3-Clause "New" or "Revised" License

Space at end of query string is trimmed #223

Open Laerte opened 6 months ago

Laerte commented 6 months ago

Faced this issue while writing a spider. Basically, if we don't percent-encode the space and the query string has only one parameter, the space is trimmed (if we have more parameters but the one with the space is at the end, it's also trimmed).

The snippet below reproduces it:

from w3lib.url import safe_url_string
from urllib.parse import parse_qsl, urlparse

url = safe_url_string("https://httpbin.org/anything?keyword=A ")
assert (dict(parse_qsl(urlparse(url).query))["keyword"] == "A ") is False

url = safe_url_string("https://httpbin.org/anything?keyword=A%20")
assert (dict(parse_qsl(urlparse(url).query))["keyword"] == "A ") is True

url = safe_url_string("https://httpbin.org/anything?keyword=A &dummy=value")
assert (dict(parse_qsl(urlparse(url).query))["keyword"] == "A ") is True

url = safe_url_string("https://httpbin.org/anything?keyword=A%20&dummy=value")
assert (dict(parse_qsl(urlparse(url).query))["keyword"] == "A ") is True

url = safe_url_string("https://httpbin.org/anything?dummy=value&keyword=A ")
assert (dict(parse_qsl(urlparse(url).query))["keyword"] == "A ") is False

url = safe_url_string("https://httpbin.org/anything?dummy=value&keyword=A%20")
assert (dict(parse_qsl(urlparse(url).query))["keyword"] == "A ") is True

Gallaecio commented 6 months ago

I think the current behavior is OK for scenarios where you want a behavior consistent with that of a web browser, i.e. with what would happen if you pasted that URL in the address bar of a web browser.
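
For reference, the browser behavior mentioned here presumably follows the WHATWG URL Standard, whose parser removes leading and trailing C0 control characters and spaces from the input before parsing it. A minimal stdlib-only sketch of that single step follows; the helper name `strip_like_browser` is hypothetical, and this is an illustration rather than w3lib's actual implementation:

```python
from urllib.parse import parse_qsl, urlparse

# C0 control characters (U+0000 to U+001F) plus the space character (U+0020).
C0_CONTROL_OR_SPACE = "".join(chr(code) for code in range(0x21))


def strip_like_browser(url: str) -> str:
    """Hypothetical helper: drop leading/trailing C0 controls and spaces."""
    return url.strip(C0_CONTROL_OR_SPACE)


# The raw trailing space sits at the end of the whole URL, so it is removed.
stripped = strip_like_browser("https://httpbin.org/anything?keyword=A ")
keyword = dict(parse_qsl(urlparse(stripped).query))["keyword"]

# A space in the middle of the URL is not at either end, so this step keeps it.
middle = strip_like_browser("https://httpbin.org/anything?keyword=A &dummy=value")
```

This would explain why only a space in the last (or only) parameter is lost: it is the only one that ends up at the very end of the URL.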

Maybe we should have different functions for the different behaviors. Not sure what to call them, though.
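
One possible shape for a non-browser variant is a pre-processing step that percent-encodes trailing spaces so they survive any later stripping. A rough sketch using only the standard library follows; `preserve_trailing_spaces` is a made-up name, not an existing or proposed w3lib function, and it assumes only the plain U+0020 space needs handling:

```python
from urllib.parse import parse_qsl, urlparse


def preserve_trailing_spaces(url: str) -> str:
    """Hypothetical helper: rewrite each trailing space as %20."""
    stripped = url.rstrip(" ")
    trailing = len(url) - len(stripped)
    return stripped + "%20" * trailing


encoded = preserve_trailing_spaces("https://httpbin.org/anything?keyword=A ")

# After decoding, the query value still carries its trailing space.
value = dict(parse_qsl(urlparse(encoded).query))["keyword"]
```

The result could then be passed to `safe_url_string`, which, going by the reporter's snippet, keeps a trailing space that is already written as `%20`.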