Closed jimfulton closed 3 years ago
This makes sense to me. (Both continue to accept strings, and also support bytes)
I'm having trouble finding in the REST docs how we expect BYTES data type to be encoded, but presumably we have other support for this outside of the DB-API which we can reference.
At a higher level:
(unit-3-9) jim@ds9:~/p/g/python-bigquery$ python
Python 3.9.4 (default, Apr 8 2021, 08:01:43)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import google.cloud.bigquery.dbapi
/home/jim/p/g/python-bigquery/.nox/unit-3-9/lib/python3.9/site-packages/pandas/compat/__init__.py:97: UserWarning: Could not import the lzma module. Your installed Python is incomplete. Attempting to use lzma compression will result in a RuntimeError.
warnings.warn(msg)
>>> conn = google.cloud.bigquery.dbapi.connect()
>>> cursor = conn.cursor()
>>> cursor.execute("select %(:BYTES)s", [b'xxx'])
>>> list(cursor)
[Row((b'xxx',), {'f0_': 0})]
>>> cursor.execute("select %(:BYTES)s", ['yyy'])
>>> list(cursor)
[Row((b'\xcb,',), {'f0_': 0})]
Note that bytes input is handled correctly, but string input isn't.
The Binary
function doesn't appear to be used by the Python client itself. When running unit tests, it's only called by a test that tests that it encodes strings.
I'm having trouble finding in the REST docs how we expect BYTES data type to be encoded, but presumably we have other support for this outside of the DB-API which we can reference.
From: https://cloud.google.com/bigquery/docs/reference/rest/v2/StandardSqlDataType
"Encoded as a base64 string per RFC 4648, section 4."
It appears that we encode this properly only for bytes data: https://github.com/googleapis/python-bigquery/blob/a3224337dac217ec07df83bf0ad570b7aa6d2ec9/google/cloud/bigquery/_helpers.py#L295-L299
Bytes are the most common way to represent binary data. Accepting strings, as it does now seems at best to be a convenience and at worst a bug magnet.
In SQLAlchemy, if you defined a model that has a binary attribute, you'd store bytes data in it, but that would break for bigquery, di to this issue.
Sqlite's
Binary
function requires bytes data.I propose to change the function to accept bytes. For the sake of backward compatibility, I propose to continue to accept strings.