Open 9b18034d-5465-40d9-84b4-e3dd75a7f990 opened 9 years ago
In Python 3.4 I would like to serialize a dictionary into a URL-encoded string.
Given a dictionary like this:
>>> thisDict = {'SomeVar1': [b'abc'], 'SomeVar2': [b'def'], 'SomeVar3': [b'ghi']}
I would like to be able to return this string:
SomeVar1=abc&SomeVar2=def&SomeVar3=ghi
I thought that urllib.parse.urlencode would work for me, but it does not:
>>> import urllib.parse
>>> print(urllib.parse.urlencode(thisDict))
SomeVar1=%5Bb%27abc%27%5D&SomeVar2=%5Bb%27def%27%5D&SomeVar3=%5Bb%27ghi%27%5D
In other words, urlencode is URL-encoding the string form of each value in the dictionary, so the output includes the (escaped) square brackets and the byte-literal "b" indicator that appear when the dictionary is printed:
{'SomeVar1': [b'abc'], 'SomeVar2': [b'def'], 'SomeVar3': [b'ghi']}
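To illustrate the output above, here is a minimal sketch of what appears to happen when doseq is left at its default: a value that is neither str nor bytes is passed through str(), and that text is then percent-encoded. The variable names are only for illustration.

```python
from urllib.parse import quote_plus, urlencode

this_dict = {'SomeVar1': [b'abc']}

# Reproduce the default (doseq=False) behaviour by hand:
# the list value is converted with str(), then percent-encoded.
as_text = str(this_dict['SomeVar1'])        # the text "[b'abc']"
manual = 'SomeVar1=' + quote_plus(as_text)

encoded = urlencode(this_dict)
print(encoded)
print(manual == encoded)
```

So it is each value, rather than the dictionary as a whole, that gets converted to a string.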
I can obtain the desired string with this:
>>> '&'.join("{!s}={!s}".format(key, urllib.parse.quote_plus(str(val[0], 'utf-8'))) for (key, val) in thisDict.items())
Is the behavior of urllib.parse.urlencode() on a dictionary intentional? When would the current behavior ever be useful?
Would it make sense to change the behavior of urllib.parse.urlencode such that it works as described above?
If you read the documentation for urllib.parse.urlencode, you'll see that it says:
The value element in itself can be a sequence and in that case, if
the optional parameter doseq is evaluates to True, individual
key=value pairs separated by '&' are generated for each element of
the value sequence for the key.
So you need to write:
>>> urllib.parse.urlencode(thisDict, doseq=True)
'SomeVar3=ghi&SomeVar1=abc&SomeVar2=def'
That behavior is complex enough that I think it would be worth adding an example of it to the examples section (and maybe linking directly from the doseq explanation to that specific example).
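As a rough idea of what such an example could look like (a sketch only, not proposed documentation wording; the 'tag' key and its values are made up for illustration):

```python
from urllib.parse import urlencode

# A sequence of 2-tuples is used so that the output order is predictable.
query = [('SomeVar1', [b'abc']), ('tag', ['x', 'y z'])]

# Default: each list is converted with str() and encoded as one value.
without_doseq = urlencode(query)

# doseq=True: one key=value pair is generated per element of each list.
with_doseq = urlencode(query, doseq=True)

print(without_doseq)
print(with_doseq)
```

With doseq=True the bytes element is quoted directly, and the multi-valued key is repeated once per element.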
Ah hah! Indeed, urlencode() does work on dictionaries as expected when doseq=True. Thank you for clarifying.
FWIW I had read the documentation and the referenced examples multiple times. I would like to make a few documentation suggestions for clarity.
1) Update https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlencode
Where documentation currently says: "When a sequence of two-element tuples is used as the query argument, the first element of each tuple is a key and the second is a value. The value element in itself can be a sequence and in that case, if the optional parameter doseq is evaluates to True, individual key=value pairs separated by '&' are generated for each element of the value sequence for the key. The order of parameters in the encoded string will match the order of parameter tuples in the sequence."
Perhaps instead the following would be clearer: "The query argument may be a sequence of two-element tuples where the first element of each tuple is a key and the second is a value. However, the optional parameter doseq must then be set to True in order to reliably generate individual key=value pairs separated by '&' for each element of the value sequence for the key, and to preserve the sequence of the elements in the query parameter."
2) Update https://docs.python.org/3/library/urllib.request.html#urllib-examples
The examples are referenced from the documentation: "Refer to urllib examples to find out how urlencode method can be used for generating query string for a URL or data for POST." However the example page provides contradictory information and examples for this specific use case.
Currently the examples page says: "The urllib.parse.urlencode() function takes a mapping or sequence of 2-tuples and returns a string in this format. It should be encoded to bytes before being used as the data parameter. The charset parameter in Content-Type header may be used to specify the encoding. If charset parameter is not sent with the Content-Type header, the server following the HTTP 1.1 recommendation may assume that the data is encoded in ISO-8859-1 encoding. It is advisable to use charset parameter with encoding used in Content-Type header with the Request."
Perhaps instead the following would be more clear: "The urllib.parse.urlencode() query parameter can accept a mapping or sequence of 2-tuples and return a string in this format if the optional parameter doseq is set to True. It should be encoded to bytes before being used as the query parameter. The charset parameter in Content-Type header may be used to specify the encoding. If charset parameter is not sent with the Content-Type header, the server following the HTTP 1.1 recommendation may assume that the data is encoded in ISO-8859-1 encoding. It is advisable to use charset parameter with encoding used in Content-Type header with the Request."
3) Also on the example page, there are examples of urlencode operating on dictionaries where doseq is not provided. This is confusing. It would be better to show doseq = True:
Here is an example session that uses the GET method to retrieve a URL containing parameters:
...
>>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
...
The following example uses the POST method instead. Note that params output from urlencode is encoded to bytes before it is sent to urlopen as data:
...
>>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
I suggest that these examples read:
>>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0}, doseq=True)
“When a sequence of two-element tuples is used as *query*, the first element of each tuple is a key and the second specifies one or more values. If *doseq* is true, each *query* (mapping or sequence) item can specify a sequence of values; if *doseq* is false (the default), each item specifies a single value. The order of parameters in the encoded string will match the order of items in *query* and the order of values in an item.”
urlopen(data=...) and Request(data=...): I don’t see the contradiction. It looks like David Rueter’s suggestion only changes the first sentence, to say doseq=True is required to get the urlencoded format, but this is not required. See also bpo-23360 about my own problems with this bit of the documentation.
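For reference, a minimal sketch of the data case being discussed, without doseq. The URL is a placeholder and the charset in the header is an assumption chosen for illustration; nothing is sent over the network here.

```python
import urllib.parse
import urllib.request

# urlencode() already returns the key=value&... format without doseq.
params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})

# The result is ASCII-only text, so it is encoded to bytes before use as data.
data = params.encode('ascii')

# Placeholder URL; building the Request does not open a connection.
request = urllib.request.Request(
    'http://www.example.com/cgi-bin/query',
    data=data,
    headers={'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8'},
)

print(data)
print(request.get_method())
```

Because data is supplied, the request method defaults to POST.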
Examples: Again, I do not see why doseq=True should be shown when it is simpler without. But an example of when it is useful would be good, as R David Murray suggested.
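A small sketch of why doseq should not matter for the existing examples: when every value is a scalar or a plain string, the output appears to be the same with or without it. The ('name', 'abc') pair is made up for illustration.

```python
from urllib.parse import urlencode

scalars = [('spam', 1), ('eggs', 2), ('bacon', 0)]

# Scalar values: doseq makes no difference to the result.
default_result = urlencode(scalars)
doseq_result = urlencode(scalars, doseq=True)

# A str value is treated as a single value under doseq=True,
# not iterated character by character.
text_result = urlencode([('name', 'abc')], doseq=True)

print(default_result)
print(doseq_result)
print(text_result)
```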
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['type-bug', 'library', 'docs']
title = 'urlencode() of dictionary not as expected'
updated_at =
user = 'https://bugs.python.org/drueterassystcom'
```
bugs.python.org fields:
```python
activity =
actor = 'martin.panter'
assignee = 'docs@python'
closed = False
closed_date = None
closer = None
components = ['Documentation', 'Library (Lib)']
creation =
creator = 'drueter@assyst.com'
dependencies = []
files = []
hgrepos = []
issue_num = 24460
keywords = []
message_count = 5.0
messages = ['245431', '245432', '245435', '245437', '245537']
nosy_count = 5.0
nosy_names = ['r.david.murray', 'docs@python', 'gdr@garethrees.org', 'martin.panter', 'drueter@assyst.com']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue24460'
versions = ['Python 3.4', 'Python 3.5', 'Python 3.6']
```