laughingman7743 / PyAthena

PyAthena is a Python DB API 2.0 (PEP 249) client for Amazon Athena.
MIT License
457 stars 102 forks source link

Integer variant types incorrectly rendered in DDL #515

Closed logorrheic closed 3 months ago

logorrheic commented 4 months ago

Thank you for PyAthena!

The get_column_specification function uses isinstance to determine whether to render the context-specific INT keyword in DDL. Due to the SQLAlchemy type hierarchy this also captures integer variants like BigInteger, causing them to be rendered incorrectly.

laughingman7743 commented 4 months ago

Please describe the issue in detail, and I need the SQLAlchemy version as well. Code or stack trace that reproduces the issue is even better.

logorrheic commented 4 months ago

Certainly.

Code:

from sqlalchemy import Column, MetaData
from sqlalchemy.types import BigInteger, Integer, SmallInteger
from sqlalchemy.sql.schema import Table
from sqlalchemy import create_mock_engine
from sqlalchemy.schema import CreateTable

table = Table(
    "t1",
    MetaData(),
    Column("id", Integer, primary_key=True),
    Column("small", SmallInteger),
    Column("medium", Integer),
    Column("large", BigInteger),
    awsathena_location="s3://bucket",
)

def executor(_1, _2, _3):
    pass

engine = create_mock_engine("awsathena+rest:///", executor)
ddl = CreateTable(table).compile(bind=engine)

print(ddl)

Which produces:

CREATE EXTERNAL TABLE t1 (
        id INT,
        small INT,
        medium INT,
        large INT
)
LOCATION 's3://bucket/'

But should produce:

CREATE EXTERNAL TABLE t1 (
        id INT,
        small SMALLINT,
        medium INT,
        large BIGINT
)
LOCATION 's3://bucket/'

Origin of the problem seems to be here.

SQLAlchemy 2.0.27.

laughingman7743 commented 4 months ago

All are INT types. I will check this weekend.

logorrheic commented 4 months ago

Sorry, my terminology may be unhelpful, I was meaning smallint and bigint are variants of integer.

laughingman7743 commented 4 months ago

No problem. I think it is an issue to handle everything with INT type.