laughingman7743 / PyAthena

PyAthena is a Python DB API 2.0 (PEP 249) client for Amazon Athena.
MIT License
464 stars 105 forks source link

Add default converter for timestamp with timezone #554

Closed gontzalm closed 3 months ago

gontzalm commented 4 months ago

Description

Add the

key value pair to the _DEFAULT_CONVERTERS dictionary.

Context

When using the feature #360 from the dbt-athena connector with partitioned columns of timestamp with time zone type, I am getting the error

TYPE_MISMATCH: line 41:31: Cannot apply operator: timestamp(6) with time zone = varchar(30).

This is because pyathena does not convert the timestamp with time zone type (it leaves it as str) and dbt-athena compiles the SQL where clause as

where date_trunc('day',  ts) = '2008-03-01 00:00:00.000000 UTC'
    or date_trunc('day',  ts) = '2008-03-02 00:00:00.000000 UTC'
    or date_trunc('day',  ts) = '2008-03-03 00:00:00.000000 UTC'
    or ...

With this PR, the dbt-athena connector works as expected:

[0m14:52:22.461301 [info ] [MainThread]: Running with dbt=1.8.3
[0m14:52:22.929296 [info ] [MainThread]: Registered adapter: athena=1.8.2
[0m14:52:27.645255 [info ] [MainThread]: Concurrency: 8 threads (target='dev')
[0m14:52:27.659016 [info ] [Thread-1 (]: 1 of 1 START sql incremental model dbt_gontzal_finance.fct_sales ............... [RUN]
[0m14:52:47.157724 [debug] [Thread-1 (]: dbt.adapters.athena.constants adapter: CAUGHT EXCEPTION: ICEBERG_TOO_MANY_OPEN_PARTITIONS: Exceeded limit of 100 open writers for partitions. You may need to manually clean the data at location 's3://tendam-temporary/dbt-dev/tables/a3f72242-44af-46fb-bc03-427ddf1207b4' before retrying. Athena will not delete data in your account.
[0m14:52:47.158524 [debug] [Thread-1 (]: COMPILED CODE RESULT: TOO_MANY_OPEN_PARTITIONS
[0m14:52:51.380641 [debug] [Thread-1 (]: CREATE NON-PARTIONED STAGING TABLE: "awsdatacatalog"."dbt_gontzal_finance"."fct_sales__tmp_not_partitioned"
[0m14:53:20.604253 [debug] [Thread-1 (]: SQL status: OK 66952030 in 29.0 seconds
[0m14:53:20.631700 [debug] [Thread-1 (]: PARTITIONED KEYS: date_trunc('day', ts)
[0m14:53:24.235316 [debug] [Thread-1 (]: SQL status: OK -1 in 4.0 seconds
[0m14:53:26.113469 [debug] [Thread-1 (]: TOTAL PARTITIONS TO PROCESS: 5997
[0m14:53:26.117918 [debug] [Thread-1 (]: BATCHES TO PROCESS: 60
[0m14:53:26.846422 [debug] [Thread-1 (]: BATCH PROCESSING: 1 OF 60
[0m14:53:27.198886 [debug] [Thread-1 (]: On model.tendam.fct_sales: -- /* {"app": "dbt", "dbt_version": "1.8.3", "profile_name": "tendam", "target_name": "dev", "node_id": "model.tendam.fct_sales"} */
        ...
              where date_trunc('day', ts)=TIMESTAMP'2008-03-01 00:00:00+00:00' or date_trunc('day', ts)=TIMESTAMP'2008-03-02 00:00:00+00:00'
        ...
[0m14:53:37.097085 [debug] [Thread-1 (]: SQL status: OK 627918 in 10.0 seconds
[0m14:53:37.099683 [debug] [Thread-1 (]: BATCH PROCESSING: 2 OF 60
laughingman7743 commented 4 months ago

https://github.com/laughingman7743/PyAthena/blob/master/tests/pyathena/test_cursor.py#L410-L506 We will need to add tests for SELECTing timestamp with time zone types.

laughingman7743 commented 4 months ago

Add python-dateutil as a dependency.

[project]
name = "PyAthena"
description = "Python DB API 2.0 (PEP 249) client for Amazon Athena"
authors = [
    {name = "laughingman7743", email = "laughingman7743@gmail.com"},
]
dependencies = [
    "boto3>=1.26.4",
    "botocore>=1.29.4",
    "tenacity>=4.1.0",
    "fsspec",
+    "python-dateutil"
]

Add types-python-date to the development environment dependencies.

[tool.hatch.envs.default]
python = "3.11"
type = "pip-compile"
lock-filename = "requirements/requirements.txt"
pip-compile-verbose = true
pip-compile-hashes = true
pip-compile-install-args = [
    "--no-deps"
]
dependencies = [
    "wheel",
    "twine",
    "sqlalchemy>=1.0.0",
    "pandas>=1.3.0",
    "numpy>=1.26.0;python_version>=\"3.9\"",
    "numpy>=1.24.0,<1.26.0;python_version<\"3.9\"",
    "pyarrow>=7.0.0",
    "fastparquet>=0.4.0",
    "Jinja2>=3.1.0",
    "mypy>=0.900",
    "pytest>=3.5",
    "pytest-cov",
    "pytest-xdist",
    "pytest-dependency",
    "ruff>=0.1.13",
    "hatch-pip-compile",
    "sphinx",
+    "types-python-dateutil"
]

The following command will update the files under the requirements directory.

$ hatch prune
$ hatch create
$ hatch create test

Commit the changes.

laughingman7743 commented 4 months ago

I think the change looks good. Please update the code format and dependencies.

$ make test                                                                                                                                                                                                                                                                                                                  2.7m  Sun Aug  4 17:26:52 2024
h
hatch run chk
cmd [1] | ruff check .
cmd [2] | ruff format --check .
73 files already formatted
cmd [3] | mypy .
Success: no issues found in 41 source files
hatch run test
================================================================================================================================================================================================ test session starts =================================================================================================================================================================================================
platform darwin -- Python 3.11.9, pytest-8.0.0, pluggy-1.4.0
rootdir: /Users/foobar/github/PyAthena
configfile: pyproject.toml
plugins: cov-4.1.0, dependency-0.6.0, xdist-3.5.0, anyio-4.2.0
8 workers [574 items]   
.............................................................................................................................................................................................................................................................................................................................................................................................................. [ 69%]
................................................................................................................................................s...............................                                                                                                                                                                                                                               [100%]
================================================================================================================================================================================================== warnings summary ==================================================================================================================================================================================================
tests/pyathena/pandas/test_cursor.py::TestPandasCursor::test_complex[pandas_cursor0-None]
tests/pyathena/pandas/test_cursor.py::TestPandasCursor::test_complex[pandas_cursor1-1000]
tests/pyathena/pandas/test_cursor.py::TestPandasCursor::test_complex[pandas_cursor2-1000000]
tests/pyathena/pandas/test_cursor.py::TestPandasCursor::test_complex_as_pandas[pandas_cursor0-None]
tests/pyathena/pandas/test_cursor.py::TestPandasCursor::test_complex_as_pandas[pandas_cursor1-1000]
tests/pyathena/pandas/test_cursor.py::TestPandasCursor::test_complex_as_pandas[pandas_cursor2-1000000]
  /Users/foobar/github/PyAthena/pyathena/pandas/result_set.py:214: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '[datetime.time(0, 0)]' has dtype incompatible with datetime64[ns], please explicitly cast to a compatible dtype first.
    df.loc[:, times] = df.loc[:, times].apply(lambda r: r.dt.time)

tests/pyathena/sqlalchemy/test_base.py: 65 warnings
  /Users/foobar/github/PyAthena/tests/pyathena/conftest.py:96: SADeprecationWarning: The dbapi() classmethod on dialect classes has been renamed to import_dbapi().  Implement an import_dbapi() classmethod directly on class <class 'pyathena.sqlalchemy.rest.AthenaRestDialect'> to remove this warning; the old .dbapi() classmethod may be maintained for backwards compatibility.
    return sqlalchemy.engine.create_engine(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform darwin, python 3.11.9-final-0 ----------
Name                                  Stmts   Miss  Cover
---------------------------------------------------------
pyathena/__init__.py                     43      9    79%
pyathena/arrow/__init__.py                0      0   100%
pyathena/arrow/async_cursor.py           42      0   100%
pyathena/arrow/converter.py              32      2    94%
pyathena/arrow/cursor.py                 90      1    99%
pyathena/arrow/result_set.py            148     14    91%
pyathena/arrow/util.py                   45      3    93%
pyathena/async_cursor.py                 54      1    98%
pyathena/common.py                      291     47    84%
pyathena/connection.py                  127     32    75%
pyathena/converter.py                    87     10    89%
pyathena/cursor.py                       72      1    99%
pyathena/error.py                        21      0   100%
pyathena/fastparquet/__init__.py          0      0   100%
pyathena/fastparquet/util.py             44      3    93%
pyathena/filesystem/__init__.py           0      0   100%
pyathena/filesystem/s3.py               581     84    86%
pyathena/filesystem/s3_object.py        315     10    97%
pyathena/formatter.py                    99      5    95%
pyathena/model.py                       490     13    97%
pyathena/pandas/__init__.py               3      0   100%
pyathena/pandas/async_cursor.py          44      0   100%
pyathena/pandas/converter.py             23      0   100%
pyathena/pandas/cursor.py                97      1    99%
pyathena/pandas/result_set.py           230     26    89%
pyathena/pandas/util.py                 160      6    96%
pyathena/result_set.py                  523     97    81%
pyathena/spark/__init__.py                0      0   100%
pyathena/spark/async_cursor.py           34      5    85%
pyathena/spark/common.py                188     48    74%
pyathena/spark/cursor.py                 32      2    94%
pyathena/sqlalchemy/__init__.py           0      0   100%
pyathena/sqlalchemy/arrow.py             15     15     0%
pyathena/sqlalchemy/base.py             526     71    87%
pyathena/sqlalchemy/pandas.py            19     19     0%
pyathena/sqlalchemy/requirements.py      98     98     0%
pyathena/sqlalchemy/rest.py               4      0   100%
pyathena/sqlalchemy/types.py             31      9    71%
pyathena/sqlalchemy/util.py               3      1    67%
pyathena/util.py                         31      1    97%
---------------------------------------------------------
TOTAL                                  4642    634    86%
Coverage HTML written to dir htmlcov

============================================================================================================================================================================== 573 passed, 1 skipped, 71 warnings in 3771.20s (1:02:51) ==============================================================================================================================================================================