[Question]: Are URL query strings containing keys without values invalid on purpose?

python-validators / validators

Python Data Validation for Humans™.

MIT License

958 stars 152 forks source link

[Question]: Are URL query strings containing keys without values invalid on purpose? #363

Closed MaxDall closed 4 months ago

MaxDall commented 5 months ago

Hey validators team :)

Thanks for the great work you do in maintaining this package 👍.

I've noticed a significant change regarding URL validation in the recent updates, particularly with 5f6437c, where the parsing of query strings underwent a complete overhaul. Specifically, the package now utilizes urllib's query parsing method, resulting in URLs containing query strings with keys but without values (e.g., https://www.lrt.lt/?rss) being flagged as invalid. Given that URLs of this nature are commonly encountered, and according to RFC-3986, the presence of values for keys is not required, I'm curious to know if this change was intentional or merely a side effect.

MaxDall commented 5 months ago

I'm sorry, I just realized that this is a copy of #346 thus I want to expand my question:

As #346 suggested to pass strict_query=False, basically disabling query parsing, are there plans to expand the existing implementation to avoid this and validate queries correctly again?

yozachar commented 5 months ago

As https://github.com/python-validators/validators/issues/346 suggested to pass strict_query=False, basically disabling query parsing, ...

How so?

MaxDall commented 5 months ago

The used urllib.parse.parse_qs function utilizes a boolean parameter called keep_blank_values, as outlined in the documentation, to determine whether to retain empty query keys. One could propagate said parameter to users, or enable it by default.

Following my initial example (https://www.lrt.lt/?rss)

urllib.parse.parse_qs("rss", keep_blank_values=True)

#output
{'rss': ['']}

If this suits you, I could open a PR.

Update: Additionally, I would suggest replacing strict_query with keep_blank_values, as the former suppresses part of the validation, which feels kinda contradictory to the library's purpose, but that's up to you :)

Update II: Just realized that setting keep_blank_values=True still treats it as a parsing error thus necessitating strict_parsing=False to work. I'm gonna reach out to urllib to see if I can do something there.

There is already an issue open in cpython referencing this problem.

yozachar commented 5 months ago

Just realized that setting keep_blank_values=True still treats it as a parsing error thus necessitating strict_parsing=False to work. I'm gonna reach out to urllib to see if I can do something there.

URL, already has 9 keyword and 1 positional parameter, I'm afraid adding more would clutter the function definition.

yozachar commented 4 months ago

Closing this for now. Will reopen, if https://github.com/python/cpython/issues/110843 is resolved.