Open abhiaagarwal opened 1 year ago
I did a bit of investigation by myself, and discovered a few things:
1) The standard follows PEP-249, the DBAPI standard, so it's technically correct to have type_code
returned rather than the type itself.
2) While type_code should be an integer, it is actually a string. For example, a BOOLEAN trino value returns boolean
. I'll attach a table below of the various mappings I've found.
I think the best solution is to change type_code
to actually an integer, and then introduce a mapping function/table from these integers to their string representation, both trino and python-based types. This can also help reduce the boilerplate in functions like RowMapperFactory
. The constructs relevant to PEP249 type_code
matching are already available.
I'm willing to open a PR for this, but it might be a somewhat wide-reaching change.
ASIDE: some types, and their type_code representation |
Trino Type | type_code |
---|---|---|
BOOLEAN | boolean | |
TINYINT | tinyint | |
SMALLINT | smallint | |
INTEGER | integer | |
BIGINT | bigint | |
REAL | real | |
DOUBLE | double | |
DECIMAL | decimal(10, 7) | |
VARCHAR | varchar | |
CHAR | char(1) | |
VARBINARY | varbinary | |
JSON | json | |
DATE | date | |
TIMESTAMP | timestamp(0) | |
TIMESTAMP_TIMEZONE | timestamp(3) with time zone | |
INTERVAL_MONTH | INTERVAL YEAR TO MONTH | |
INTERVAL_DAY | INTERVAL DAY TO SECOND | |
ARRAY | array(integer) | |
IPADDRESS | ipaddress | |
UUID | uuid |
Nice find @abhiagarwal01, this makes sense to fix. It'll be a breaking change but it'll make us follow the PEP correctly and allow third party tools to work better which make use of this information.
We'd appreciate if you want to submit a PR for this. Just beware that it probably won't be in the upcoming release since we like to include breaking changes in separate releases.
@hashhar Thank you! I'll likely be refactoring any type-specific logic to a separate types.py
file (and this can be used in the sqlalchemy module as well).
Is it possible to do this in two steps - separate PRs or different commits in same PR? Fix the type codes being used and any REQUIRED changes. Refactor sqlalchemy and dbapi.py to reduce boilerplate type handling?
I think the idea of a type_code
requires some level of refactoring, given that this library currently has no logical way to group multiple different data types. What I'll do is first a minimalistic PR that just fixes the type_code
by defining constants in constants.py
, and then add a separate PR that does any major refactoring. Does that sound good?
That sounds ideal. Thanks.
Describe the feature
I'm working with adding some plumbing between our backend system and Trino, and we're trying to derive more type information to speed up Python. Trino gives access to column information with the
cursor.description
method, but it gives atype_code
integer. I've been struggling to figure out whattype_code
maps to, and my google-fu has failed me. I'd appreciate any sort of guidance.Describe alternatives you've considered
I've been running queries to try and manually figure out
type_code
, but it hasn't been exhaustive.Are you willing to submit PR?