python / cpython

The Python programming language
https://www.python.org
Other
63.02k stars 30.17k forks source link

urlencode of a None value uses the string 'None' #63057

Closed e7785e6a-ff54-4a02-bc6b-6c3d5357ec3b closed 9 years ago

e7785e6a-ff54-4a02-bc6b-6c3d5357ec3b commented 11 years ago
BPO 18857
Nosy @orsenthil, @taleinat, @ezio-melotti, @bitdancer, @PCManticore, @jayaddison
PRs
  • python/cpython#19945
  • python/cpython#19949
  • Files
  • urllib.patch
  • urllib_null_value.patch
  • test.html
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['extension-modules', 'type-feature'] title = "urlencode of a None value uses the string 'None'" updated_at = user = 'https://bugs.python.org/JoshuaJohnston' ``` bugs.python.org fields: ```python activity = actor = 'jayaddison' assignee = 'none' closed = True closed_date = closer = 'r.david.murray' components = ['Extension Modules'] creation = creator = 'Joshua.Johnston' dependencies = [] files = ['31782', '31785', '31798'] hgrepos = [] issue_num = 18857 keywords = ['patch'] message_count = 31.0 messages = ['196314', '196318', '196319', '197831', '197835', '197839', '197851', '197900', '197904', '198560', '198633', '198634', '198701', '198705', '198706', '210488', '210492', '210498', '210499', '210531', '210537', '210542', '245811', '245827', '368193', '368301', '368727', '378826', '378838', '378839', '378841'] nosy_count = 8.0 nosy_names = ['orsenthil', 'taleinat', 'ezio.melotti', 'r.david.murray', 'Claudiu.Popa', 'piotr.dobrogost', 'Joshua.Johnston', 'jayaddison'] pr_nums = ['19945', '19949'] priority = 'normal' resolution = 'wont fix' stage = 'resolved' status = 'closed' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue18857' versions = ['Python 3.5'] ```

    e7785e6a-ff54-4a02-bc6b-6c3d5357ec3b commented 11 years ago

    This is strange behavior. When you encode nulls in other languages you don't get the string 'null' you usually get an empy string. Shouldn't str(None) == ''?

    If not str(None) == 'None' and the string representation of a None value should not match a known string

    >>> from urllib import urlencode
    >>> urlencode({'josh': None})
    'josh=None'
    bitdancer commented 11 years ago

    In Python the str of a None value is indeed 'None', just as the str of a True value is 'True'. Unless the protocol to which you are encoding supports null values, you shouldn't be using None values in the input to the serialization. If you want an empty string in the output, use an empty string in the input.

    Now, that said, it seems to me that while it is not (apparently) RFC compliant, query strings do have a natural way to support null values: a name without a value. It would therefore be convenient if urlencode supported null values by turning something like {'josh': None, 'fred': 'abc'} into:

    josh&fred=abc

    That is what I would expect it to do, so I'd be in favor of that enhancement.

    I note that parse_qs and parse_qsl don't handle this case either, which surprises me since I think such value-less query string parameters are fairly common.

    e7785e6a-ff54-4a02-bc6b-6c3d5357ec3b commented 11 years ago

    Hi David,

    That is what I would expect it to do as well. I'm relatively new to Python but this is causing all kinds of problems with oauth signing using ims_lti_py as well as my own code using urlencode.

    5df057ac-c83d-447e-8c45-910556b17608 commented 11 years ago

    Hello. I attached a simple patch for the urlencode case. I'll try to make one for parse_qs(l) as well.

    5df057ac-c83d-447e-8c45-910556b17608 commented 11 years ago

    Added patch for parse_qsl as well.

    orsenthil commented 11 years ago

    The patch looks good, but I have doubt in the claim that, this new behavior is actually a right thing to do.

    RDM wrote:

    Now, that said, it seems to me that while it is not (apparently) RFC compliant, query strings do have a natural way to support null values: a name without a value. It would therefore be convenient if urlencode supported null values by turning something like {'josh': None, 'fred': 'abc'} into:

    josh&fred=abc

    It is correctly recognized that it is not RFC Compliant. A reference to this behavior exhibited by the software may be helpful. The application/x-www-form-urlencoded type always looks for key=value kind of query string only. And changes submitted by patch will break for folks who think "a=None" is actually what they expected when the sent {a:None}.

    I think, some present instances where this behavior is exhibited will be good to note.

    bitdancer commented 11 years ago

    Thank you for working on the patches, Claudiu, but...

    The backward compatibility concern is valid.

    Furthermore, I did a bunch of googling looking for examples. I did not turn up any examples of APIs that were documented to use parameters without '='...all the cases I looked at that mentioned "parameter without a value" specified the "xxx=" form.

    Given that, and thinking about it further, it appears to me that a parameter with no '=' should be treated the same as one that has an '=' but no value...that is, the same as using an empty string. It would be *logical* for it to be a "nul" value, but in fact the http RFCs have no concept of a "nul" value, so it is in fact out of place.

    Sorry to have sent people down the wrong path here.

    So I'm back to saying that one should not use None in an application to represent no value for a query parameter, but rather the empty string. The url parsing code should turn a parameter without an '=' into an empty string, I think, but again there are backward compatibility concerns there. There may be code that unintentionally depends on them being discarded.

    So, I *think* we could, and should, make them not ignored, but turned into empty strings, in 3.4, following the first part in Postel's law ("be generous in what you accept"). But following the second half of Postel's law ("be strict in what you generate") we really shouldn't generate parameters without even an '=', given that they are not RFC compliant.

    5df057ac-c83d-447e-8c45-910556b17608 commented 11 years ago

    No problem, David, working on these patches is just an occasion for me to learn more and be useful at the same time.

    e7785e6a-ff54-4a02-bc6b-6c3d5357ec3b commented 11 years ago

    I know that languages like php will treat ?josh= the same as ?josh

    Using the attached test form in Google Chrome, you will see the data passed as josh= when empty in both GET and POST requests. This is probably the way to go, key=str(value) if value is not None else ''

    orsenthil commented 11 years ago

    Hi Joshua, I did not setup a php server to test the html form that you uploaded. We cannot test this in browsers because HTML does not have None as values, it is simple empty strings and that is correct. I think that the cost of making this change is higher than the value that it will give and I am inclined to close this bug report as wont fix. If you strongly disagree, please feel free to reopen it (and can possibly discuss at python-dev). Thank you!

    e7785e6a-ff54-4a02-bc6b-6c3d5357ec3b commented 11 years ago

    Hi Senthil, You can open the html file with a browser and inspect the data posting to itself without a web server running. That is how I tested.

    e7785e6a-ff54-4a02-bc6b-6c3d5357ec3b commented 11 years ago

    I still believe that since None represents the absence of a value it should not be urlencoded as the string 'None'. I am not sure what they best way to url encode it is, but I know that 'None' is not it.

    bitdancer commented 11 years ago

    No, Senthil is correct.

    My original liking for this idea came from my mistaken impression that a value without an '=' was different from a value with an '='. But clearly the practice in the industry (the de facto standard) is that they are the same, and indicate that the value of the parameter is the empty string.

    So, urls do *not* have any way of representing a null value (as differentiated from an empty string). All url values are strings. This means that the correct value to use for a python value is always its string representation, which in the case of None is "None", just as the correct representation of a Python boolean value is "True" or "False" (since http also does not have any specifically boolean type, only string values).

    e7785e6a-ff54-4a02-bc6b-6c3d5357ec3b commented 11 years ago

    I agree with True == 'True' and False == 'False' but None should be empty since it represents the absence of a value akin to null, Nil, etc in other languages. Ultimately it is not my decision make so I can only agree to disagree.

    Thanks for having this discussion with me. So far the python community has been a pleasant change from what I am used to!

    bitdancer commented 11 years ago

    If we were making this decision de novo, we might decide it that way. However, the current behavior has been in place for a long time, so backward compatibility concerns raise the bar high enough that the costs of the change outweigh any benefits, even if we got general agreement that there was a benefit :)

    e7785e6a-ff54-4a02-bc6b-6c3d5357ec3b commented 10 years ago

    I'm sorry to reopen this but after it biting me quite a few times more I still cannot think of a valid use-case for this behavior that someone would be depending on 'None' being passed.

    I think your backwards compatibility concerns are artificial. Can anyone describe a use-case that depended on arg=None being passed in a query string?

    I am sure that anyone who is encountering this behavior is treating the string 'None' as None when encountered in a request query string.

    Consider this example usage. A website presents a user with a form to search their twitter followers using the twitter api https://api.twitter.com/1.1/friends/ids.json

    Form fields optional screen_name: [____] (assume more fields)

    Handler gets the form post and builds the dict for the search query string.

    # User entered nothing so params = {'screen_name': None, ..more fields}
    params = {k: self.request.get(k, None) for k in self.request.GET}
    
    url = "https://api.twitter.com/1.1/friends/ids.json?" + urllib.urlencode(params)

    print url "https://api.twitter.com/1.1/friends/ids.json?screen_name=None"

    This would cause the twitter search api to look for your friends with None in their screen name. Not exactly what you'd expect right?

    bitdancer commented 10 years ago

    What is returning a field dictionary containing None instead of an empty string?

    e7785e6a-ff54-4a02-bc6b-6c3d5357ec3b commented 10 years ago

    In this exact example it would be an empty string. It was a fake setup to illustrate a real problem.

    This is the important part:

    params = dict(screen_name=None,count=300)
    url = "https://api.twitter.com/1.1/friends/ids.json?" + urllib.urlencode(params)
    print url  # "https://api.twitter.com/1.1/friends/ids.json?screen_name=None&count=300"
    
    screen_name=None is not the behavior you would want.

    Another example is in webapp2's uri_for function which uses urlencode internally. ref: http://stackoverflow.com/questions/7081250/webapp2-jinja2-how-can-i-get-uri-for-working-in-jinja2-views

    If you try to use uri_for in your jinja2 template you must jump through hoops like:

    \<script> {% if screen_name %} var url = '{{ uri_for('something', screen_name=screen_name) }}'; {% else %} var url = '{{ uri_for('something') }}'; {% endif %} \</script>

    bitdancer commented 10 years ago

    None of those problems exist if you correctly use the empty string to indicate an empty string value instead of trying to use None.

    e7785e6a-ff54-4a02-bc6b-6c3d5357ec3b commented 10 years ago

    If this was a function to encode a dict into something then I would see your point and agree. urlencode is specifically designed to work within the domain or URIs. In this domain, it is acceptable to have an empty value for a key in a query string. None is a representation of nothing, empty, null, the absence of a value. Therefore you would expect a function in the domain of URIs to construct a valid URI component when you specifically tell it to use None. Valid is up to you, either ignore the key-value pair completely, or use key[=&] to represent the empty value.

    Take Requests as an example that gets it right:

        >>> import requests
        >>> requests.get('http://www.google.com/', params={'key': None}).url
        u'http://www.google.com/'
        >>> requests.get('http://www.google.com/', params={'key': ''}).url
        u'http://www.google.com/?key='
    bitdancer commented 10 years ago

    No, the domain of URIs does not have *any* concept of a null value. It only has the concept of a string being empty or not empty (or the key not existing at all...ie: it doesn't exist in your params dict).

    You are trying to map a Python concept (the singleton object None, which is used for a variety of purposes) into a domain that does not have an equivalent concept.

    That said, following the requests model of omitting the key if the value is None is something we could consider as a convenience enhancement. But we would again run into Python's strict backward compatibility policy. I could imagine some programmer building an internal web service that turns the string 'None' back into a Python None value. The fact that it would have to be an internal thing would mean we'd never hear about it...until we broke it :)

    Whether or not adding this feature would require a new keyword argument to urlencode is a judgment call. It might be an acceptable change in a feature release.

    e7785e6a-ff54-4a02-bc6b-6c3d5357ec3b commented 10 years ago

    While the RFC makes no mention of empty values either way, it has become standard practice to either omit the key-value completely or pass a key (optional = sign) by itself in these situations so I would consider that as standard behavior.

    While I stand by my position that the function is broken in regards to None, I do not have the clout to make this change without support from the community so I will leave it at that.

    >>Whether or not adding this feature would require a new keyword argument to urlencode is a judgment call. >>It might be an acceptable change in a feature release. I do believe that it should be changed in a future release and a new keyword argument would suffice except for when the abstraction level is high enough to shield you from being able to specify this new argument.

    >I could imagine some programmer building an internal web service that >turns the string 'None' back into a Python None value. The fact that it >would have to be an internal thing would mean we'd never hear about >it...until we broke it :) I was talking to someone about the same thing during lunch today. This whole bug report came about because I was finding 'None' string values in our app engine datastore because of optional parameters with None values being encoded as 'None'.

    c363d87d-6318-48d8-932c-605c563ff049 commented 9 years ago

    This problem came out in the bug "Cannot make URL query string with a parameter without a value (https://github.com/kennethreitz/requests/issues/2651) raised in urllib3 project (which itself is being used by Requests library mentioned in this bug).

    bitdancer commented 9 years ago

    I see no reason for this issue to be open. As suggested on the linked bug, the value should be passed as an empty string (which will produce, eg "&foo=&a='1'", which is the "correct" format for such a parameter. Unless someone can point to a "real" web server that does something different with "&foo" than with "&foo=", there is no reason to make a change to Python.

    112d306f-9a66-4670-b417-56ed76ffaca9 commented 4 years ago

    Chiming in here to add that I'd appreciate the ability to render 'standalone' (i.e. no '=') query-string keys in order to distinguish between absence-of-value and empty-string situations.

    The backwards-compatibility concerns in here are genuine, so perhaps this could be introduced as an argument to urlencode with a disabled default value, allowing developers to opt-in.

    > Unless someone can point to a "real" web server that does something different with "&foo" than with "&foo=", there is no reason to make a change to Python.

    There's a popular nodejs library that makes this serialization distinction explicit: https://github.com/sindresorhus/query-string#falsy-values

    I've developed a Python 3.7-based set of commits[1] to address this issue. I haven't yet opened this as a pull request since I see that Python 3.7 is in maintenance/bugfix mode[2].

    In case a new urlencode flag would fall under the category of feature, I'll aim to develop a subsequent set of commits against the master development branch soon.

    [1] - https://github.com/jayaddison/cpython/compare/3.7..9555467

    [2] - https://devguide.python.org/#status-of-python-branches

    112d306f-9a66-4670-b417-56ed76ffaca9 commented 4 years ago

    The pair of pull requests below implement None-preserving urlencode and parse_qs* via a default-disabled flag 'standalone_keys'.

    (they're also already linked with this issue, thanks to the neat GitHub/BPO integration)

    A benefit of the proposed serialization changes is that developers can opt-in to a scheme in which "{'a': None}" and "{'a': ''}" do not collide to the same encoded representation.

    Would it be possible to re-open this issue for discussion?

    112d306f-9a66-4670-b417-56ed76ffaca9 commented 4 years ago

    NB: There appears to be some relevant discussion in https://github.com/whatwg/url/issues/469

    taleinat commented 3 years ago

    Thanks for the PRs, James!

    I've closed the PRs for now, to avoid having people spend time reviewing them while this issue is closed as "wontfix". That said, if further discussion changes that decision, the PRs could be reopened.

    112d306f-9a66-4670-b417-56ed76ffaca9 commented 3 years ago

    No problem, and thanks for the heads-up Tal! I'll raise this as a topic on python-dev if it still seems worth pursuing, after collecting some more thoughts about it.

    orsenthil commented 3 years ago

    Hi James, I will give another look at it tonight or latest by Sunday PST , since I was involved in this and PRs. If we can make a decision within this context, great, otherwise we can open it up to python-dev

    On Sat, Oct 17, 2020 at 3:30 PM James Addison \report@bugs.python.org\ wrote:

    James Addison \jay+bpo@jp-hosting.net\ added the comment:

    No problem, and thanks for the heads-up Tal! I'll raise this as a topic on python-dev if it still seems worth pursuing, after collecting some more thoughts about it.

    ----------


    Python tracker \report@bugs.python.org\ \https://bugs.python.org/issue18857\


    112d306f-9a66-4670-b417-56ed76ffaca9 commented 3 years ago

    Thanks Senthil; please take your time. This isn't urgent, and would likely benefit from further standardization of the URL query string and/or form-encoded data formats (outside the Python project) to achieve consensus.

    A fully-considered answer at a later date would probably sit more comfortably with me than one that has any sense of time pressure.