googleapis / python-ndb

Apache License 2.0

Key.urlsafe() in python 3 being unsafe #259

Closed neurino closed 4 years ago

neurino commented 4 years ago

Hello,

While converting a Flask app from Python 2 to 3, I'm having a very hard time coping with Key.urlsafe() returning bytes instead of str.

Even if this is formally correct, the returned value is certainly not "safe" to use as-is: every time, you are forced to decode the bytes (as utf-8, in my case) to make them usable in a template; see the example.

I started adding .decode('utf-8') and .encode('utf-8') here and there, and it is making my code uglier and uglier.

I wonder: if it is meant to be used in web pages, and web pages are made of unicode strings, shouldn't Key.urlsafe() return a unicode string?

Environment details

OS type and version:

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"

Python version and virtual environment information:

$ python --version
Python 3.7.5

google-cloud-ndb version:

$ pip show google-cloud-ndb
Name: google-cloud-ndb
Version: 0.2.0
Summary: NDB library for Google Cloud Datastore
Home-page: https://github.com/googleapis/python-ndb
...

Code example

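from flask import Flask, render_template
from google.cloud import ndb

app = Flask(__name__)


class User(ndb.Model):
    name = ndb.StringProperty(required=True)


# Assumes an ndb client context is already active for each request
# (set up elsewhere, e.g. in middleware).
@app.route('/', methods=['GET', 'POST'])
@app.route('/<urlsafe>', methods=['GET', 'POST'])
def index(urlsafe=None):
    """Render the home page."""
    if urlsafe:
        user = ndb.Key(urlsafe=urlsafe).get()
    else:
        user = None
    return render_template('index.html', user=user)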

<!-- index.html -->
<form>
    <input type="text" id="name" value="{{ user.name }}" />
    <input type="hidden" id="key" value="{{ user.key.urlsafe() }}" />
    <input type="submit" value="save" />
</form>

<!-- What is rendered -->
<form>
    <input type="text" id="name" value="Jack" />
    <input type="hidden" id="key" value="b'agti...JEKDA'" />
    <input type="submit" value="save" />
</form>

andrewsg commented 4 years ago

In Python 2, "str" referred to a bytestring and "unicode" referred to a unicode string. In Python 3, "str" refers to unicode and "bytes" refers to a bytestring. In both cases, key.urlsafe() returns a bytestring. As urlsafe() is base64, this behavior is identical to the standard library's base64 behavior, which also returns a bytestring in both versions. I agree this is inconvenient in practice; however, it's working as intended for Python 3 strings.
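
The standard-library behavior mentioned above can be checked with a short sketch. The payload below is a hypothetical stand-in, not a real serialized key; the point is only that the base64 functions return bytes in Python 3 and that the result is pure ASCII, so it can be decoded without loss.

```python
import base64

# Hypothetical payload standing in for a serialized key (an assumption for
# illustration; not a real Datastore key).
raw = b"example-key-payload"

# The standard library returns bytes from its base64 encoders.
encoded = base64.urlsafe_b64encode(raw)
print(type(encoded))

# Base64 output contains only ASCII characters, so decoding to str is lossless.
text = encoded.decode("ascii")
print(type(text))

# The decoder accepts either bytes or an ASCII str and returns the same payload.
assert base64.urlsafe_b64decode(encoded) == raw
assert base64.urlsafe_b64decode(text) == raw
```

Decoding with "ascii" rather than "utf-8" is a design choice in this sketch: both work for base64 text, and "ascii" fails loudly if the value is ever something else.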

neurino commented 4 years ago

And the reason why with the very same code in python 2 I never had to encode and decode is... ?

andrewsg commented 4 years ago

My best guess is that the behavior of Flask or another library you are using changed from 2 to 3 -- in 2 it accepted bytestrings (str) and in 3 it only accepts unicode (str). To find out exactly where this change occurred, look at the part of your code that breaks if you don't decode/encode -- that will be where the change in behavior occurred.
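
One place where such a difference can surface is the conversion of a value to text. In Python 2, str and bytes are the same type, so str() on a bytestring returns it unchanged; in Python 3, str() on bytes returns its repr, including the b'...' wrapper seen in the rendered HTML above. The following is a minimal sketch of that behavior, with a hypothetical token value and a hypothetical helper named as_text that is not part of python-ndb or Flask.

```python
# Hypothetical stand-in for the value returned by Key.urlsafe().
token = b"agtiZXhhbXBsZQ"

# Python 3: str() on bytes gives the repr, not the decoded text.
rendered = str(token)
print(rendered)  # b'agtiZXhhbXBsZQ'


def as_text(value):
    """Return value as str, decoding bytes as ASCII (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode("ascii")
    return value


# Decoding in one place avoids scattering .decode() calls through the code.
print(as_text(token))  # agtiZXhhbXBsZQ
```

Assuming the app uses Flask's Jinja environment, a helper like this could be registered once as a template filter (for example through app.jinja_env.filters) so that templates do not need to call .decode() themselves.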