laughingman7743 / PyAthena

PyAthena is a Python DB API 2.0 (PEP 249) client for Amazon Athena.
MIT License
464 stars 105 forks source link

Support for VACUUM - Iceberg #382

Closed nicor88 closed 2 years ago

nicor88 commented 2 years ago

What

With the introduction of Iceberg a new vacuum operation was added, but when trying to run with pyathena I run in this issue:

pyathena.error.DatabaseError: An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 1:1: mismatched input 'VACUUM'. Expecting: 'ALTER', 'ANALYZE', 'CALL', 'COMMIT', 'CREATE', 'DEALLOCATE', 'DELETE', 'DESC', 'DESCRIBE', 'DROP', 'EXECUTE', 'EXPLAIN', 'GRANT', 'INSERT', 'PREPARE', 'RESET', 'REVOKE', 'ROLLBACK', 'SET', 'SHOW', 'START', 'UNLOAD', 'UPDATE', 'USE', <query>
nicor88 commented 2 years ago

The issue seems related to the underlying boto3 libraries

nicor88 commented 2 years ago
import boto3

client = boto3.client('athena')

response = client.start_query_execution(
    QueryString='VACCUM silver.iceberg_increment',
    WorkGroup='athena_v3',
)

This snippet of code lead to the same issue above. @laughingman7743 we might need to approach the maintainers of boto3.

nicor88 commented 2 years ago

Nevermind, this is not an issue, the keywoard that I used is wrong, should be VACUUM

nicor88 commented 2 years ago

I need to re-open this, as even if using VACUUM I still get the same error.

  An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 2:5: mismatched input 'VACUUM'. Expecting: 'ALTER', 'ANALYZE', 'CALL', 'COMMENT', 'COMMIT', 'CREATE', 'DEALLOCATE', 'DELETE', 'DENY', 'DESC', 'DESCRIBE', 'DROP', 'EXECUTE', 'EXPLAIN', 'GRANT', 'INSERT', 'MERGE', 'PREPARE', 'REFRESH', 'RESET', 'REVOKE', 'ROLLBACK', 'SET', 'SHOW', 'START', 'TRUNCATE', 'UNLOAD', 'UPDATE', 'USE', 'USING', <query>
Traceback (most recent call last):
  File "/Users/nicor88/github-nicor88/lakehouse-writers/venv/lib/python3.9/site-packages/pyathena/common.py", line 494, in _execute
    query_id = retry_api_call(
  File "/Users/nicor88/github-nicor88/lakehouse-writers/venv/lib/python3.9/site-packages/pyathena/util.py", line 68, in retry_api_call
    return retry(func, *args, **kwargs)
  File "/Users/nicor88/github-nicor88/lakehouse-writers/venv/lib/python3.9/site-packages/tenacity/__init__.py", line 406, in __call__
    do = self.iter(retry_state=retry_state)
  File "/Users/nicor88/github-nicor88/lakehouse-writers/venv/lib/python3.9/site-packages/tenacity/__init__.py", line 351, in iter
    return fut.result()
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/Users/nicor88/github-nicor88/lakehouse-writers/venv/lib/python3.9/site-packages/tenacity/__init__.py", line 409, in __call__
    result = fn(*args, **kwargs)
  File "/Users/nicor88/github-nicor88/lakehouse-writers/venv/lib/python3.9/site-packages/botocore/client.py", line 515, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/Users/nicor88/github-nicor88/lakehouse-writers/venv/lib/python3.9/site-packages/botocore/client.py", line 934, in _make_api_call
    raise error_class(parsed_response, operation_name
laughingman7743 commented 2 years ago

https://docs.aws.amazon.com/athena/latest/ug/vacuum-statement.html

VACUUM is transactional and is supported only for Apache Iceberg tables in Athena engine version 3.

The VACUUM statement seems to be supported since Athena engine version 3. Please check if Athena engine version 3 is set in your workgroup configuration. You may also need to make sure that your Boto3 version is up-to-date. In that case, you can try pip install -U boto3 to update the version of boto3.

laughingman7743 commented 2 years ago

This library only calls the boto3 API. If an error occurs, there is nothing this library can do about it; you should be able to register the issue with AWS support or boto3.

nicor88 commented 2 years ago

@laughingman7743 I'm using an explicit workgroup with v3, and then I'm using the latest boto3 apis. Somehow If I run this

import boto3

client = boto3.client('athena')

response = client.start_query_execution(
    QueryString='VACCUM silver.iceberg_increment',
    WorkGroup='athena_v3',
)

all works, same if I run this

cursor = connect(
    work_group='athena_v3',
    s3_staging_dir="s3://tmp-datalake-dev-901621332207-eu-central-1/py_athena/",).cursor()
data = cursor.execute("vacuum silver.iceberg_increment")
print(cursor)

Most likely might be on dbt-athena, but still didn't find the issue...as it uses cursor.execute as specified above and works.

laughingman7743 commented 2 years ago

I see. I'll look into it this weekend.

nicor88 commented 2 years ago

@laughingman7743 it's super hood, really.

nicor88 commented 2 years ago

@laughingman7743 for full disclosure, I tried to upgrade to latest pyathena (2.15) version and latest boto3 version (1.26.16),and botocore (1.29.16) hoping that would have fix the issue, but somehow still I'm getting the issue on the StartQueryExecution, that is super weird. This is the call what we do in dbt-athena adapter.

nicor88 commented 2 years ago

Closing this one, as fixed by this and not realated to pyathena at all.