laughingman7743 / PyAthena

PyAthena is a Python DB API 2.0 (PEP 249) client for Amazon Athena.
MIT License

GENERIC_INTERNAL_ERROR: io.trino.hdfs.s3.TrinoS3FileSystem$UnrecoverableS3OperationException #534

Closed by laughingman7743 2 months ago

laughingman7743 commented 2 months ago

https://github.com/laughingman7743/PyAthena/actions/runs/8677531515/job/23793302369#step:6:401

Syncing environment plugin requirements
Creating environment: test.py3.12
Installing project in development mode
Checking dependencies
Syncing dependencies
============================= test session starts ==============================
platform linux -- Python 3.12.2, pytest-8.0.0, pluggy-1.4.0
rootdir: /home/runner/work/PyAthena/PyAthena
configfile: pyproject.toml
plugins: anyio-4.2.0, cov-4.1.0, dependency-0.6.0, xdist-3.5.0
created: 8/8 workers
8 workers [502 items]

........................................................................ [ 14%]
........................................................................ [ 28%]
........................................................................ [ 43%]
............................................................F........... [ 57%]
.......s................................................................ [ 71%]
........................................................................ [ 86%]
.........................................................F.........F..   [100%]
=================================== FAILURES ===================================
______________________ TestArrowCursor.test_iceberg_table ______________________
[gw4] linux -- Python 3.12.2 /home/runner/.local/share/hatch/env/pip-compile/pyathena/lr5GePrj/test.py3.12/bin/python

self = <tests.pyathena.arrow.test_cursor.TestArrowCursor object at 0x7f83c3e06510>
arrow_cursor = <pyathena.arrow.cursor.ArrowCursor object at 0x7f83a0c7f0b0>

    def test_iceberg_table(self, arrow_cursor):
        iceberg_table = "test_iceberg_table_arrow_cursor"
        arrow_cursor.execute(
            f"""
            CREATE TABLE {ENV.schema}.{iceberg_table} (
              id INT,
              col1 STRING
            )
            LOCATION '{ENV.s3_staging_dir}{ENV.schema}/{iceberg_table}/'
            tblproperties('table_type'='ICEBERG')
            """
        )
>       arrow_cursor.execute(
            f"""
            INSERT INTO {ENV.schema}.{iceberg_table} (id, col1)
            VALUES (1, 'test1'), (2, 'test2')
            """
        )

tests/pyathena/arrow/test_cursor.py:552: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pyathena.arrow.cursor.ArrowCursor object at 0x7f83a0c7f0b0>
operation = "\n            INSERT INTO pyathena_test_s2n7hrqcpm.test_iceberg_table_arrow_cursor (id, col1)\n            VALUES (1, 'test1'), (2, 'test2')\n            "
parameters = None, work_group = None, s3_staging_dir = None, cache_size = 0
cache_expiration_time = 0, result_reuse_enable = None
result_reuse_minutes = None, kwargs = {}, unload_location = None
query_execution = <pyathena.model.AthenaQueryExecution object at 0x7f83af997110>

    def execute(
        self,
        operation: str,
        parameters: Optional[Dict[str, Any]] = None,
        work_group: Optional[str] = None,
        s3_staging_dir: Optional[str] = None,
        cache_size: Optional[int] = 0,
        cache_expiration_time: Optional[int] = 0,
        result_reuse_enable: Optional[bool] = None,
        result_reuse_minutes: Optional[int] = None,
        **kwargs,
    ) -> ArrowCursor:
        self._reset_state()
        if self._unload:
            s3_staging_dir = s3_staging_dir if s3_staging_dir else self._s3_staging_dir
            assert s3_staging_dir, "If the unload option is used, s3_staging_dir is required."
            operation, unload_location = self._formatter.wrap_unload(
                operation,
                s3_staging_dir=s3_staging_dir,
                format_=AthenaFileFormat.FILE_FORMAT_PARQUET,
                compression=AthenaCompression.COMPRESSION_SNAPPY,
            )
        else:
            unload_location = None
---------------------------------------------------------
pyathena/__init__.py                     43      9    79%
pyathena/arrow/__init__.py                0      0   100%
pyathena/arrow/async_cursor.py           42      0   100%
pyathena/arrow/converter.py              32      2    94%
pyathena/arrow/cursor.py                 90      1    99%
pyathena/arrow/result_set.py            148     15    90%
pyathena/arrow/util.py                   45      3    93%
pyathena/async_cursor.py                 54      1    98%
pyathena/common.py                      291     46    84%
pyathena/connection.py                  127     32    75%
pyathena/converter.py                    81      9    89%
pyathena/cursor.py                       72      1    99%
pyathena/error.py                        21      0   100%
pyathena/fastparquet/__init__.py          0      0   100%
pyathena/fastparquet/util.py             44      3    93%
pyathena/filesystem/__init__.py           0      0   100%
pyathena/filesystem/s3.py               279     74    73%
pyathena/filesystem/s3_object.py         34      0   100%
pyathena/formatter.py                    99      5    95%
pyathena/model.py                       490     13    97%
pyathena/pandas/__init__.py               3      0   100%
pyathena/pandas/async_cursor.py          44      0   100%
pyathena/pandas/converter.py             23      0   100%
pyathena/pandas/cursor.py                97      1    99%
pyathena/pandas/result_set.py           230     27    88%
pyathena/pandas/util.py                 160      6    96%
pyathena/result_set.py                  523     97    81%
pyathena/spark/__init__.py                0      0   100%
pyathena/spark/async_cursor.py           34      5    85%
pyathena/spark/common.py                188     48    74%
pyathena/spark/cursor.py                 32      2    94%
pyathena/sqlalchemy/__init__.py           0      0   100%
pyathena/sqlalchemy/arrow.py             15     15     0%
pyathena/sqlalchemy/base.py             526     71    87%
pyathena/sqlalchemy/pandas.py            19     19     0%
pyathena/sqlalchemy/requirements.py      98     98     0%
pyathena/sqlalchemy/rest.py               4      0   100%
pyathena/sqlalchemy/types.py             32     10    69%
pyathena/sqlalchemy/util.py               3      1    67%
pyathena/util.py                         31      1    97%
---------------------------------------------------------
TOTAL                                  4054    615    85%
Coverage HTML written to dir htmlcov

=========================== short test summary info ============================
FAILED tests/pyathena/arrow/test_cursor.py::TestArrowCursor::test_iceberg_table - pyathena.error.OperationalError: GENERIC_INTERNAL_ERROR: io.trino.hdfs.s3.TrinoS3FileSystem$UnrecoverableS3OperationException: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: ZGGSKJRM25MGNGFE; S3 Extended Request ID: sDD4r1ZhVyF7BCzMq2W9IlJdyx8wbJ7X14p/OM5b3JSYZ2hm+JbACgmZdl/MTSVpffkFS1qBDvw=; Proxy: null), S3 Extended Request ID: sDD4r1ZhVyF7BCzMq2W9IlJdyx8wbJ7X14p/OM5b3JSYZ2hm+JbACgmZdl/MTSVpffkFS1qBDvw= (Bucket: laughingman7743-pyathena, Key: github/pyathena_test_s2n7hrqcpm/test_iceberg_table_arrow_cursor/metadata/00000-2a9c2158-ce88-4397-b05d-640059bff4bf.metadata.json). If a data manifest file was generated at 's3://laughingman7743-pyathena/github/8e7ff2d2-c380-4498-8446-19048747927d-manifest.csv', you may need to manually clean the data from locations specified in the manifest. Athena will not delete data in your account.
FAILED tests/pyathena/pandas/test_cursor.py::TestPandasCursor::test_iceberg_table - pyathena.error.OperationalError: GENERIC_INTERNAL_ERROR: io.trino.hdfs.s3.TrinoS3FileSystem$UnrecoverableS3OperationException: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: B97VDJ48EWB7EK2X; S3 Extended Request ID: 5AcjuEY4mJ6KtedJRFwO6Kn4CwzDbEVCGyamT5WN1+J8DcY8s/J2QMQWEYe/eObqFlxDRxbMK8E=; Proxy: null), S3 Extended Request ID: 5AcjuEY4mJ6KtedJRFwO6Kn4CwzDbEVCGyamT5WN1+J8DcY8s/J2QMQWEYe/eObqFlxDRxbMK8E= (Bucket: laughingman7743-pyathena, Key: github/pyathena_test_yix8slkkfz/test_iceberg_table_pandas_cursor/metadata/00000-7f53210d-df91-46e9-a986-b4bac518c2d7.metadata.json). If a data manifest file was generated at 's3://laughingman7743-pyathena/github/8cb7be60-2d0a-4edf-af37-f9db4d5e99e9-manifest.csv', you may need to manually clean the data from locations specified in the manifest. Athena will not delete data in your account.
FAILED tests/pyathena/test_cursor.py::TestCursor::test_iceberg_table - pyathena.error.OperationalError: GENERIC_INTERNAL_ERROR: io.trino.hdfs.s3.TrinoS3FileSystem$UnrecoverableS3OperationException: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: Z5KA3H25XQA5CA0P; S3 Extended Request ID: grpWALFSwuBlfS5YnD6dYRPKIsoFESkB1mZMKWrIz3WWijM236fdFSABS43OkbTeLYbErrglnIiMf5hLkGq+Xw==; Proxy: null), S3 Extended Request ID: grpWALFSwuBlfS5YnD6dYRPKIsoFESkB1mZMKWrIz3WWijM236fdFSABS43OkbTeLYbErrglnIiMf5hLkGq+Xw== (Bucket: laughingman7743-pyathena, Key: github/pyathena_test_u0ffar7v4c/test_iceberg_table_cursor/metadata/00000-3cb864eb-fc2b-4b5f-b3e2-545bd27edcb3.metadata.json). If a data manifest file was generated at 's3://laughingman7743-pyathena/github/71bee07e-db46-4451-9bcc-02dda14013d0-manifest.csv', you may need to manually clean the data from locations specified in the manifest. Athena will not delete data in your account.
===== 3 failed, 498 passed, 1 skipped, 4482 warnings in 358.03s (0:05:58) ======
Error: Process completed with exit code 1.

laughingman7743 commented 2 months ago

https://github.com/laughingman7743/PyAthena/actions/runs/8793141948 For some reason, this error stopped occurring as of today. 🤔
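
Since the failures went away without any change recorded in this thread, they may have been transient on the service side, though nothing here confirms that. For anyone hitting the same `NoSuchKey` error on the Iceberg `INSERT`, a retry wrapper is one possible workaround. The following is a minimal sketch, not a fix applied in PyAthena; the helper name `retry_on_message` and its parameters are hypothetical and were made up for illustration.

```python
import time


def retry_on_message(func, exc_type, needle, attempts=3, delay=0.0):
    """Call func() and return its result.

    Retries only when exc_type is raised AND its message contains needle.
    Any other error, or the final failed attempt, is re-raised unchanged.
    This is a hypothetical helper, not part of PyAthena.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exc_type as exc:
            # Give up on unrelated errors or when out of attempts.
            if needle not in str(exc) or attempt == attempts:
                raise
            time.sleep(delay)
```

With PyAthena this might be used as `retry_on_message(lambda: cursor.execute(sql), OperationalError, "NoSuchKey", delay=5)`, where `OperationalError` is `pyathena.error.OperationalError`, the exception type shown in the test summary above. Matching on the message text keeps unrelated `OperationalError`s from being retried.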