Closed by roykoand 1 year ago
Apologies, but this error is specific to Athena, so AWS support would be best able to assist you. My only suggestion, based on the error message, is to consider deleting the manifest.csv file mentioned in the error message (after taking appropriate care to understand the consequences and making a backup copy).
Alternatively, you might try running Trino yourself, or using a different hosted version such as Starburst Galaxy (which has a free tier and its own support).
Hello Team,
Since 9/16 I have been seeing very strange behaviour in AWS Athena when reading ORC/Parquet files from S3. Every day since then I have received errors like these:
Source code for the ORC error: https://github.com/trinodb/trino/blob/071c8365faf83aaedfcda889cd2e8a28aab165fe/plugin/trino-hive/src/main/java/io/trino/plugin/hive/orc/HdfsOrcDataSource.java#L81-L87
Source code for the Parquet error: https://github.com/trinodb/trino/blob/f8e774a949773399df5fa823186e4a68d79b931d/plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetPageSource.java#L192-L201
The error rate is very low (<0.01%), and queries mostly succeed on rerun, but it is annoying to have. For several reasons I'm not able to reach out to AWS Support, so what can I do to properly debug, investigate, or fix the issue? In my honest opinion, this is an AWS problem and has nothing to do with the quality of the data. Two days after the issues started popping up (on 9/18), AWS had an 8-hour networking outage in the us-west-2 region, where Athena and S3 are located. I've run a variety of tests and wasn't able to reproduce the issue. The Athena engine version is v3. Additional note: the errors usually appear when several processes are running and reading the same S3 bucket (not necessarily the same S3 prefix).
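Since the queries mostly succeed on rerun, one stopgap would be to retry failed queries while logging each failure, so the timestamps can later be compared against the other jobs reading the bucket. Below is a minimal sketch of that idea, not a fix for the underlying problem. The names `run_with_retry` and `run_query` are hypothetical: `run_query` stands for whatever function submits the Athena query and raises on failure. Catching every exception is a simplification; real code would probably retry only on the specific read errors.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("athena_retry")


def run_with_retry(
    run_query: Callable[[], T],  # hypothetical: submits the query, raises on failure
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call run_query, retrying on any exception with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except Exception as exc:  # assumption: every failure is treated as transient
            # Log each failure so it can be correlated with concurrent readers later.
            log.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable only so the backoff can be exercised without actually waiting.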
What are your thoughts on this? What can I do to at least properly diagnose the problem?
Thank you!