python / cpython

The Python programming language
https://www.python.org

urlencode() of dictionary not as expected #68648

Open 9b18034d-5465-40d9-84b4-e3dd75a7f990 opened 9 years ago

9b18034d-5465-40d9-84b4-e3dd75a7f990 commented 9 years ago
BPO 24460
Nosy @bitdancer, @gareth-rees, @vadmium

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

```python
assignee = None
closed_at = None
created_at =
labels = ['type-bug', 'library', 'docs']
title = 'urlencode() of dictionary not as expected'
updated_at =
user = 'https://bugs.python.org/drueterassystcom'
```

bugs.python.org fields:

```python
activity =
actor = 'martin.panter'
assignee = 'docs@python'
closed = False
closed_date = None
closer = None
components = ['Documentation', 'Library (Lib)']
creation =
creator = 'drueter@assyst.com'
dependencies = []
files = []
hgrepos = []
issue_num = 24460
keywords = []
message_count = 5.0
messages = ['245431', '245432', '245435', '245437', '245537']
nosy_count = 5.0
nosy_names = ['r.david.murray', 'docs@python', 'gdr@garethrees.org', 'martin.panter', 'drueter@assyst.com']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue24460'
versions = ['Python 3.4', 'Python 3.5', 'Python 3.6']
```

9b18034d-5465-40d9-84b4-e3dd75a7f990 commented 9 years ago

In Python 3.4 I would like to serialize a dictionary into a URL-encoded string.

Given a dictionary like this:

    >>> thisDict = {'SomeVar1': [b'abc'], 'SomeVar2': [b'def'], 'SomeVar3': [b'ghi']}

I would like to be able to return this string:

    SomeVar1=abc&SomeVar2=def&SomeVar3=ghi

I thought that urllib.parse.urlencode would work for me, but it does not:

    >>> print(urllib.parse.urlencode(thisDict))
    SomeVar1=%5Bb%27abc%27%5D&SomeVar2=%5Bb%27def%27%5D&SomeVar3=%5Bb%27ghi%27%5D

In other words, urlencode() is URL-encoding the string representation of each value (the list [b'abc'] becomes the string "[b'abc']"), so the output includes the escaped square brackets and the bytes-literal "b" prefix.
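
A minimal sketch that reproduces the output above and shows where the brackets and "b" prefix come from: with the default doseq=False, each value is passed through str() before quoting, so it is each list's repr (not the whole dictionary) that gets encoded. It assumes Python 3.7 or later, where dicts keep insertion order; on 3.4 the pairs may appear in a different order.

```python
from urllib.parse import quote_plus, urlencode

this_dict = {'SomeVar1': [b'abc'], 'SomeVar2': [b'def'], 'SomeVar3': [b'ghi']}

# With doseq=False (the default), a list value is converted with str() first...
print(str([b'abc']))              # [b'abc']
# ...and that text is then percent-encoded like any other string.
print(quote_plus(str([b'abc'])))  # %5Bb%27abc%27%5D

# Keys are encoded separately, so only the values show the list/bytes repr.
print(urlencode(this_dict))
```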

I can obtain the desired string with this:

    >>> '&'.join("{!s}={!s}".format(key,urllib.parse.quote_plus(str(val[0],'utf-8'))) for (key,val) in thisDict.items())
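
For comparison, a sketch that wraps the hand-written workaround in a helper (the name join_first_values is hypothetical, chosen for illustration) and checks it against urlencode(..., doseq=True). Assuming Python 3.7+ dict ordering, both give the same string for this dictionary, but the workaround only reads val[0], so any extra values per key are silently dropped.

```python
import urllib.parse

this_dict = {'SomeVar1': [b'abc'], 'SomeVar2': [b'def'], 'SomeVar3': [b'ghi']}


def join_first_values(d):
    # Hypothetical helper: the workaround from the post, using only val[0].
    return '&'.join("{!s}={!s}".format(key, urllib.parse.quote_plus(str(val[0], 'utf-8')))
                    for (key, val) in d.items())


workaround = join_first_values(this_dict)
builtin = urllib.parse.urlencode(this_dict, doseq=True)
print(workaround)  # SomeVar1=abc&SomeVar2=def&SomeVar3=ghi
print(builtin)     # SomeVar1=abc&SomeVar2=def&SomeVar3=ghi

# With more than one value for a key, only doseq=True keeps all of them.
multi = {'SomeVar1': [b'abc', b'xyz']}
print(join_first_values(multi))                   # SomeVar1=abc
print(urllib.parse.urlencode(multi, doseq=True))  # SomeVar1=abc&SomeVar1=xyz
```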

Is the behavior of urllib.parse.urlencode() on a dictionary intentional? When would the current behavior ever be useful?

Would it make sense to change the behavior of urllib.parse.urlencode such that it works as described above?

faa20c90-fcf0-43f4-810b-00286077d549 commented 9 years ago

If you read the documentation for urllib.parse.urlencode [1], you'll see that it says:

The value element in itself can be a sequence and in that case, if
the optional parameter doseq is evaluates to True, individual
key=value pairs separated by '&' are generated for each element of
the value sequence for the key.

So you need to write:

    >>> urllib.parse.urlencode(thisDict, doseq=True)
    'SomeVar3=ghi&SomeVar1=abc&SomeVar2=def'

bitdancer commented 9 years ago

That behavior is complex enough that I think it would be worth adding an example of it to the examples section (and maybe linking directly from the doseq explanation to that specific example).
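
One possible shape for such an example, sketched under the assumption of Python 3.7+ dict ordering (the keys and values are invented for illustration). With doseq=True a list is expanded into one key=value pair per element, a str is kept whole rather than split into characters, and a non-sequence such as an int is converted with str(); with doseq=False the list is encoded as its repr.

```python
from urllib.parse import urlencode

query = {
    'tag': ['red', 'blue'],  # list: expanded only when doseq=True
    'name': 'a b',           # str: always a single value (space becomes '+')
    'page': 2,               # int: not a sequence, so it is converted with str()
}

print(urlencode(query))              # tag=%5B%27red%27%2C+%27blue%27%5D&name=a+b&page=2
print(urlencode(query, doseq=True))  # tag=red&tag=blue&name=a+b&page=2
```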

9b18034d-5465-40d9-84b4-e3dd75a7f990 commented 9 years ago

Ah hah! Indeed, urlencode() does work on dictionaries as expected when doseq=True. Thank you for clarifying.

FWIW I had read the documentation and the referenced examples multiple times. I would like to make a few documentation suggestions for clarity.

1) Update https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlencode

Where documentation currently says: "When a sequence of two-element tuples is used as the query argument, the first element of each tuple is a key and the second is a value. The value element in itself can be a sequence and in that case, if the optional parameter doseq is evaluates to True, individual key=value pairs separated by '&' are generated for each element of the value sequence for the key. The order of parameters in the encoded string will match the order of parameter tuples in the sequence."

Perhaps instead the following would be more clear: "The query argument may be a sequence of two-element tuples where the first element of each tuple is a key and the second is a value. However the optional parameter doseq must then be set to True in order to reliably generate individual key=value pairs separated by '&' for each element of the value sequence for the key, and to preserve the sequence of the elements in the query parameter."

2) Update https://docs.python.org/3/library/urllib.request.html#urllib-examples

The examples are referenced from the documentation: "Refer to urllib examples to find out how urlencode method can be used for generating query string for a URL or data for POST." However the example page provides contradictory information and examples for this specific use case.

Currently the examples page says: "The urllib.parse.urlencode() function takes a mapping or sequence of 2-tuples and returns a string in this format. It should be encoded to bytes before being used as the data parameter. The charset parameter in Content-Type header may be used to specify the encoding. If charset parameter is not sent with the Content-Type header, the server following the HTTP 1.1 recommendation may assume that the data is encoded in ISO-8859-1 encoding. It is advisable to use charset parameter with encoding used in Content-Type header with the Request."

Perhaps instead the following would be more clear: "The urllib.parse.urlencode() query parameter can accept a mapping or sequence of 2-tuples and return a string in this format if the optional parameter doseq is set to True. It should be encoded to bytes before being used as the query parameter. The charset parameter in Content-Type header may be used to specify the encoding. If charset parameter is not sent with the Content-Type header, the server following the HTTP 1.1 recommendation may assume that the data is encoded in ISO-8859-1 encoding. It is advisable to use charset parameter with encoding used in Content-Type header with the Request."

3) Also on the example page, there are examples of urlencode operating on dictionaries where doseq is not provided. This is confusing. It would be better to show doseq = True:

Here is an example session that uses the GET method to retrieve a URL containing parameters:
...
>>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
...
The following example uses the POST method instead. Note that params output from urlencode is encoded to bytes before it is sent to urlopen as data:
...
>>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})

I suggest that these examples read:
>>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0}, doseq=True)
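
A quick check of these particular examples, as a sketch assuming Python 3.7+ dict ordering: every value is a scalar, so doseq has nothing to expand and both calls return the same string. The second half follows the POST note: urlencode() returns str, which is encoded to bytes before being passed as data. The URL is a hypothetical placeholder, and the request is built but not sent, to avoid network access.

```python
import urllib.parse
import urllib.request

params = {'spam': 1, 'eggs': 2, 'bacon': 0}

# Scalar values only: doseq=True produces the same query string.
plain = urllib.parse.urlencode(params)
with_doseq = urllib.parse.urlencode(params, doseq=True)
print(plain)                # spam=1&eggs=2&bacon=0
print(plain == with_doseq)  # True

# urlencode() returns str; Request(data=...) expects bytes.
# The percent-encoded output is pure ASCII, so this encoding is lossless.
body = plain.encode('ascii')
req = urllib.request.Request(
    'http://www.example.com/cgi-bin/query',  # hypothetical endpoint
    data=body,
    headers={'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8'},
)
print(req.get_method())  # POST
# urllib.request.urlopen(req) would send it; omitted here.
```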

vadmium commented 9 years ago
  1. urlencode(): I agree the documentation is unclear. But David Rueter’s suggestion does not help much. I think doseq=True is meant to also work for a mapping query (as in the original post), and is not required in the sequence-of-tuples mode if each tuple has a single parameter value. Perhaps something like this could work instead:

“When a sequence of two-element tuples is used as *query*, the first element of each tuple is a key and the second specifies one or more values. If *doseq* is true, each *query* (mapping or sequence) item can specify a sequence of values; if *doseq* is false (the default), each item specifies a single value. The order of parameters in the encoded string will match the order of items in *query* and the order of values in an item.”

  2. urlopen(data=...) and Request(data=...): I don’t see the contradiction. It looks like David Rueter’s suggestion only changes the first sentence, to say doseq=True is required to get the urlencoded format, but this is not required. See also bpo-23360 about my own problems with this bit of the documentation.

  3. Examples: Again, I do not see why doseq=True should be shown when it is simpler without. But an example of when it is useful would be good, as R David Murray suggested.
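
To illustrate the wording proposed for urlencode() in point 1, a small sketch (keys and values made up): in sequence-of-tuples mode with one value per tuple, doseq is unnecessary and the given order, including repeated keys, is kept; when the second element is itself a sequence, doseq=True expands it in order, while the default encodes the list's repr.

```python
from urllib.parse import urlencode

# One value per tuple: no doseq needed; order and repeated keys are preserved.
pairs = [('b', '2'), ('a', '1'), ('b', '3')]
print(urlencode(pairs))  # b=2&a=1&b=3

# A sequence of values per tuple: doseq=True emits one pair per value, in order.
grouped = [('b', ['2', '3']), ('a', ['1'])]
print(urlencode(grouped, doseq=True))  # b=2&b=3&a=1

# Without doseq, each list is converted with str() and then percent-encoded.
print(urlencode(grouped))  # b=%5B%272%27%2C+%273%27%5D&a=%5B%271%27%5D
```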