**Closed** — mdesmet closed this pull request 2 years ago
Isn't this basically adding support for implicit casting on the client side? What's the motivation?
While implicit casts are convenient, they change the behaviour of the system in user-visible ways, so any change has to be considered carefully, since we won't get multiple passes at defining the behaviour.
cc: @findepi since this concerns CASTs and coercions.
My mental model for thinking about this, based on the JDBC experience, is that every client type (Java type, Python type) maps to some well-defined type in the SQL type system. For example, a Java `String` maps to `varchar`, and a Python `str` maps to `varchar`. The client should also be able to bind values of SQL types not directly expressible in the client type system. For example, in JDBC this is achieved with `PreparedStatement.setObject(int index, Object value, int sqlTypeCode)`.
So far, this mental model has served me well. `col = ?` seems trivial, but sometimes parameters show up in a slightly altered context (e.g. `lower()`, a `cast`, or `St_Distance(col, ?)`), and then type derivation is no longer trivial — in some cases not feasible at all.

Now, this is all a quite generic comment about what clients should be doing, based primarily on my JDBC experience. There might be something special about SQLAlchemy. Maybe it lacks some interfaces for typeful binds for SQL types? Or makes it hard for the calling application to provide data in the right types?
> Or makes it hard for the calling application to provide data in the right types?
This is exactly the case we are facing in the pandas `to_sql` function. Changing the behaviour of implicit conversion shouldn't be done by default; I would opt for adding a flag that enables implicit conversion in the SQLAlchemy dialect. Some functionality of the Trino Great Expectations integration is impacted by it.
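As an illustration of the explicit-conversion route, here is a minimal sketch using `DataFrame.to_sql` with its `dtype` parameter. The table name and in-memory SQLite connection are stand-ins for the real target; the point is only that the caller can pin the column type and convert values up front instead of relying on implicit casts in the dialect:

```python
import sqlite3

import pandas as pd

# Illustrative sketch: convert explicitly in pandas and pin the SQL column
# type, instead of relying on implicit casts in the dialect. SQLite stands
# in for the real target database here.
df = pd.DataFrame({'col1': [2.0, 1, 'test']})
df['col1'] = df['col1'].astype(str)  # explicit conversion, as suggested above

conn = sqlite3.connect(':memory:')
# to_sql's dtype parameter fixes the SQL column type for the created table.
df.to_sql('t', conn, index=False, dtype={'col1': 'TEXT'})

rows = [r[0] for r in conn.execute('SELECT col1 FROM t')]
print(rows)  # all values arrive as strings
```

With the `astype(str)` conversion done by the caller, no implicit cast is needed anywhere downstream.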
They could simply fix it like this (with an explicit conversion in pandas):
```python
>>> import pandas as pd
>>> data = {'col1': [2.0, 1, 'test']}
>>> df = pd.DataFrame(data)
>>> df
   col1
0   2.0
1     1
2  test
>>> pd.api.types.infer_dtype(df['col1'])
'mixed-integer'
>>> df['col1'] = df['col1'].astype(str)
>>> df
   col1
0   2.0
1     1
2  test
>>> pd.api.types.infer_dtype(df['col1'])
'string'
```
However, I tested the same with PostgreSQL. PostgreSQL inserts and updates automatically convert from numeric to string without an explicit cast. Note that this implicit cast does not work in the WHERE clause.
```sql
CREATE TABLE T (a VARCHAR);
INSERT INTO T (a) VALUES (1);
UPDATE T SET a = 2 WHERE a = '1';
SELECT * FROM T;
```
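For comparison, the same statements can be tried from Python against SQLite. This is a sketch only — SQLite's type-affinity rules differ from both Trino and PostgreSQL, which is exactly the point that this coercion behaviour is engine-specific:

```python
import sqlite3

# Sketch: SQLite, like PostgreSQL, silently coerces the integer into the
# VARCHAR column (via its type-affinity rules), whereas Trino rejects it.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE T (a VARCHAR)')
conn.execute('INSERT INTO T (a) VALUES (1)')    # integer into a VARCHAR column
conn.execute("UPDATE T SET a = 2 WHERE a = '1'")
result = conn.execute('SELECT a, typeof(a) FROM T').fetchall()
print(result)  # the stored value is text, not an integer
```

Because a `VARCHAR` column has TEXT affinity in SQLite, both the inserted `1` and the updated `2` are stored as text, so the `WHERE a = '1'` comparison matches.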
@mdesmet the described behavior difference between Trino and PostgreSQL is a difference between the engines:
Trino:

```
trino:default> CREATE TABLE t(c varchar(20));
CREATE TABLE
trino:default> INSERT INTO t VALUES (42);
Query 20220809_142246_00011_sftar failed: Insert query has mismatched column types: Table: [varchar(20)], Query: [integer]
```
PostgreSQL:

```
test=# CREATE TABLE t(c varchar(20));
CREATE TABLE
test=# INSERT INTO t VALUES (42);
INSERT 0 1
```
So in this case PostgreSQL is actually deviating from the SQL spec, as a number is not assignable to a string.
I will close this PR.
This PR autoconverts numeric values to strings when inserting or updating records through SQLAlchemy targeting a String column.
SQLAlchemy allows inserts using the following syntax:
The Python types may not match the target type of the table. Some databases will autocast these values, while Trino doesn't, as in the following code:
This will generate the following exception in Trino:
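The original snippets from the PR description are not preserved above, but a minimal sketch of the kind of SQLAlchemy Core insert being described might look as follows. The table and column names are illustrative, and an in-memory SQLite engine is used here because it, unlike Trino, silently coerces the mismatched type:

```python
import sqlalchemy as sa

# Illustrative sketch only: the real code targets the Trino dialect, which
# raises a mismatched-column-types error for this insert (as shown earlier
# in the thread); SQLite instead coerces the integer to text.
engine = sa.create_engine('sqlite://')
metadata = sa.MetaData()
t = sa.Table('t', metadata, sa.Column('a', sa.String(20)))
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(sa.insert(t).values(a=42))  # Python int bound to a String column

with engine.connect() as conn:
    rows = conn.execute(sa.select(t.c.a)).scalars().all()
print(rows)
```

Against Trino this same insert fails with an "Insert query has mismatched column types" error unless the value is cast (or autoconverted, as this PR proposed) to a string first.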