rehanvdm / serverless-website-analytics

A CDK construct that consists of a serverless backend, frontend and client-side code to track website analytics.
GNU General Public License v2.0

Experiment Glue Table Statistics #67

Closed: rehanvdm closed this 9 months ago

rehanvdm commented 9 months ago

Column-level statistics can speed up the query planning phase of Athena queries, but I could not get them to work. See the branch https://github.com/rehanvdm/serverless-website-analytics/tree/feature/experiment-column-level-statistics

I manually created a Glue table from a v2 of the Glue external table for experimentation, but the statistics job would not run:

I am trying to use the new Glue Table Statistics (https://aws.amazon.com/blogs/big-data/enhance-query-performance-using-aws-glue-data-catalog-column-level-statistics/), but I get the error "Exception in User Class: java.lang.AssertionError : assertion failed: Conflicting partition column names detected: .", which does not help me at all :sweat_smile: The table details are in the screenshots. I wonder if it is because I am using partition projection, but the documented limitations do not mention anything about it :thinking_face: Can anyone help me out? I don't want to (and can't) open a support ticket for this, since it's for an open-source project: serverless-website-analytics.

Even selecting all columns individually, except the partition columns, did not work; I still got the same error.
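A minimal TypeScript sketch of the column selection described above, assuming the table definition (its data columns and partition keys) has already been fetched, for example with Glue's GetTable API. The `GlueColumn` and `GlueTableShape` types and both helper functions are hypothetical and dependency-free; the names they return are what would go into the `ColumnNameList` parameter of Glue's `StartColumnStatisticsTaskRun` API. Names are compared case-insensitively, which is an assumption based on Glue storing column names in lowercase. The overlap check only tests one conceivable source of "conflicting partition column names"; it is not a confirmed cause of the error above.

```typescript
// Hypothetical local shapes mirroring the Name/Type fields of a Glue table's
// StorageDescriptor.Columns and PartitionKeys (not imported from the AWS SDK).
interface GlueColumn {
  Name: string;
  Type: string;
}

interface GlueTableShape {
  columns: GlueColumn[];
  partitionKeys: GlueColumn[];
}

// Returns every data column name that is not also a partition key,
// i.e. a candidate ColumnNameList for a column statistics run.
function statisticsColumnNames(table: GlueTableShape): string[] {
  const partitionNames = new Set(
    table.partitionKeys.map((c) => c.Name.toLowerCase()),
  );
  return table.columns
    .map((c) => c.Name)
    .filter((name) => !partitionNames.has(name.toLowerCase()));
}

// Hypothetical sanity check: partition keys whose names also appear as data columns.
function overlappingPartitionNames(table: GlueTableShape): string[] {
  const dataNames = new Set(table.columns.map((c) => c.Name.toLowerCase()));
  return table.partitionKeys
    .map((c) => c.Name)
    .filter((name) => dataNames.has(name.toLowerCase()));
}

// Usage with hypothetical column names (not the project's real schema).
const table: GlueTableShape = {
  columns: [
    { Name: "id", Type: "string" },
    { Name: "url", Type: "string" },
    { Name: "time_on_page", Type: "int" },
  ],
  partitionKeys: [
    { Name: "site", Type: "string" },
    { Name: "date", Type: "string" },
  ],
};

console.log(statisticsColumnNames(table)); // [ 'id', 'url', 'time_on_page' ]
console.log(overlappingPartitionNames(table)); // []
```

An empty overlap list, as in this example, would rule out a simple name clash between data columns and partition keys.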

rehanvdm commented 9 months ago

Closing for now; I will revisit once it's working. I think this is a limitation of the service.