Open 50fe2c9d-2e9c-4082-805f-214289ced5dd opened 10 years ago
Currently urlparse.parse_qs (http://hg.python.org/cpython/file/2.7/Lib/urlparse.py#l150) assumes and uses ';' as a query string separator, with no way to override that. Several web service APIs use ';' as a list separator (e.g. [URL]?fruits=lemon;lime&family=citrus). Although ';' seems like a sensible choice for a default, there should be a way to override it.
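To illustrate the problem, here is a minimal sketch of the splitting behaviour described above. It assumes the parser splits pairs on both '&' and ';'. The helper name `legacy_split_pairs` is hypothetical and is not part of urlparse.

```python
def legacy_split_pairs(qs):
    """Split a query string into raw pairs on both '&' and ';'.

    Hypothetical helper that only approximates the behaviour described
    in this report; it is not the actual urlparse code.
    """
    return [s2 for s1 in qs.split("&") for s2 in s1.split(";")]


pairs = legacy_split_pairs("fruits=lemon;lime&family=citrus")
print(pairs)  # ['fruits=lemon', 'lime', 'family=citrus']
```

The value 'lime' ends up detached from 'fruits', which is why a list encoded with ';' cannot survive this kind of splitting.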
As an enhancement, this could only go into 3.5.
So, are you suggesting I should change to a different type if desired for 2.7.x or leave for release to 3.5 and then submit a patch to backport it to 2.7.x? I apologize, not sure how the workflow works in these cases. Thanks.
I'm saying that this is a change that can be made only in 3.5. If you want to submit a patch here for 2.7 for other people to use, that's fine, but it won't get applied.
Ah, gotcha. I think I will leave as is then. Thanks for clarifying.
If you could point to the RFC that states the list of characters which can be used as valid query string separators, we can include that list. (Of course, in 3.5.)
Senthil,
The RFC can be found here: http://tools.ietf.org/html/rfc3986#section-2.2
If this bug is to be moved forward, we should consider this:
The RFC 3986 defines that a query can have any of these characters: /?:@-._~!$&'()*+,;= ALPHA DIGIT %HH (encoded octet)
But it does not define how the data should be interpreted, leaving that to the naming authority and the URI scheme (although http/https doesn't specify it either; see RFC 7230).
OTOH, parse_qs (both on 2.x and 3.x) is very specific that the query string is of type application/x-www-form-urlencoded; which defines that the name is separated from the value by '=' and name/value pairs are separated from each other by '&', although the use of ';' to separate the pairs is only suggested to be supported by HTTP server implementors.
Adding support for all the characters specified by RFC 3986 could be a challenge, since there is no fixed scheme and they can be freely used by the naming authority. So perhaps we could add a parameter to enable/disable ';' as a pair separator?
Luiz,
The original question was about introducing a parameter to override the query string separator ';'.
If we go with enable or disable, then we should provide another option for the query string separator.
The OP provided one example of a query string which had & as a separator along with ';'. I wonder how that should be parsed.
The pointer to the RFC makes me think that it is alright to provide an option to 'override' the default separator instead of providing an enable/disable.
I would like to hear opposite thoughts on this.
Based on the example provided by the OP, it appears that he would expect the output to be: {'family': ['citrus'], 'fruits': ['lemon;lime']}
Since the W3C recommendation for the application/x-www-form-urlencoded type specifies using '&' to separate the parameters in the query string (';' is not mentioned there), I recommended a parameter for disabling the use of ';' as a separator ('&' would still be used as the separator).
The only thing I see against using the RFC is that although it specifies which characters are valid in a query string, it does not define how they should be used; that is done by W3C's application/x-www-form-urlencoded, which is very specific about using '&' as a separator.
Hi all,
OP here. My intent was to optionally pass a separator parameter, _not_ enable/disable toggle.
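To make the two proposals concrete, here is a rough sketch of what an optional separator argument might look like. The function `parse_qs_sep` and its `separator` parameter are hypothetical names for illustration only. Unlike the real parse_qs, this sketch keeps blank values and does no error handling.

```python
from urllib.parse import unquote_plus


def parse_qs_sep(qs, separator="&"):
    """Parse a query string, splitting pairs on a single caller-chosen separator.

    Hypothetical illustration of the proposal; not the stdlib implementation.
    """
    result = {}
    for pair in qs.split(separator):
        if not pair:
            continue  # skip empty chunks such as a trailing separator
        name, _, value = pair.partition("=")
        result.setdefault(unquote_plus(name), []).append(unquote_plus(value))
    return result


# With '&' only, the ';' stays inside the value, as in the expected output above.
print(parse_qs_sep("fruits=lemon;lime&family=citrus"))
# A caller whose service uses ';' between pairs can opt in explicitly.
print(parse_qs_sep("fruits=lemon;family=citrus", separator=";"))
```

An explicit separator covers the enable/disable case as well: passing '&' is equivalent to disabling ';'.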
Hi all,
Please take the next case: The url - http://hostname.domain/mypage.asp?fields=id&query=%22((release%3D{id%3D1004});(sprint%3D{id%3D1040});(team%3D{id%3D1004});(severity%3D{id%3D%27list_node.severity.urgent%27});!phase%3D{id+IN+%27phase.defect.closed%27,%27phase.defect.duplicate%27,%27phase.defect.rejected%27})%22
The Query as string - fields=id&query=%22((release%3D{id%3D1004});(sprint%3D{id%3D1040});(team%3D{id%3D1004});(severity%3D{id%3D%27list_node.severity.urgent%27});!phase%3D{id+IN+%27phase.defect.closed%27,%27phase.defect.duplicate%27,%27phase.defect.rejected%27})%22
The expected pairs -
The actual output -
W3C allows both constructs, ampersand and semicolon. https://www.w3.org/TR/html401/appendix/notes.html#h-B.2.2
Especially servlet containers and servers running CGI programs often use semicolons as a separator.
I would say to parse either ampersands OR semicolons and keep a priority to ampersands.
For example the query strings:
?fields=id&query=%22((release%3D{id%3D1004});(sprint%3D{id%3D1040});(team%3D{id%3D1004});(severity%3D{id%3D%27list_node.severity.urgent%27});!phase%3D{id+IN+%27phase.defect.closed%27,%27phase.defect.duplicate%27,%27phase.defect.rejected%27})%22
?fruits=lemon;lime&family=citrus
should be parsed with & separators only.
The modified example without & character: ?fruits=lemon;family=citrus
can be parsed with semicolon as a separator because it contains both '=' and ';' but no '&' characters.
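The priority rule suggested above could be sketched roughly as follows. The names `pick_separator` and `parse_with_priority` are hypothetical. The sketch assumes a Python version whose `urllib.parse.parse_qs` accepts a `separator` keyword (3.10+, or a maintenance release carrying the bpo-42967 change).

```python
from urllib.parse import parse_qs


def pick_separator(qs):
    """Prefer '&'; fall back to ';' only when the string has ';' and '=' but no '&'."""
    if "&" in qs:
        return "&"
    if ";" in qs and "=" in qs:
        return ";"
    return "&"


def parse_with_priority(qs):
    # Hypothetical wrapper: choose one separator per query string, then parse.
    return parse_qs(qs, separator=pick_separator(qs))


print(parse_with_priority("fruits=lemon;lime&family=citrus"))
print(parse_with_priority("fruits=lemon;family=citrus"))
```

Note that this heuristic guesses from the data, so a single-parameter query whose value legitimately contains ';' would still be split.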
We are on the same page, and we should also consider marking this as a defect.
Thanks
On Sun, Feb 17, 2019 at 7:44 PM nr <report@bugs.python.org> wrote:
nr <aktiophi@googlemail.com> added the comment:
---------- nosy: +nr
Python tracker <report@bugs.python.org> <https://bugs.python.org/issue20116>
Greetings. I believe this is mooted by bpo-42967 as well as changes even prior to that.
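For anyone landing here, a quick check of that claim. This assumes a Python version that includes the bpo-42967 change, where `urllib.parse.parse_qs` takes a `separator` keyword that defaults to '&'.

```python
from urllib.parse import parse_qs

# Default separator is '&', so ';' is left inside the value.
default_result = parse_qs("fruits=lemon;lime&family=citrus")
print(default_result)

# A different single separator can be passed explicitly.
semicolon_result = parse_qs("fruits=lemon;family=citrus", separator=";")
print(semicolon_result)
```

On such a version the first call should give the output the OP expected, with 'lemon;lime' kept as one value.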
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['type-feature']
title = 'urlparse.parse_qs should take argument for query separator'
updated_at =
user = 'https://bugs.python.org/rubenorduz'
```
bugs.python.org fields:
```python
activity =
actor = 'jacobtylerwalls'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = []
creation =
creator = 'ruben.orduz'
dependencies = []
files = ['48146']
hgrepos = []
issue_num = 20116
keywords = []
message_count = 15.0
messages = ['207237', '207241', '207242', '207243', '207244', '207261', '207262', '263491', '263798', '263842', '263843', '335768', '335782', '335801', '397208']
nosy_count = 7.0
nosy_names = ['orsenthil', 'r.david.murray', 'ruben.orduz', 'luiz.poleto', 'kc', 'Kobi Gana', 'jacobtylerwalls']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue20116'
versions = ['Python 3.5']
```