If I take the table DDL and create a second table directly from the AWS console, I can retrieve the data successfully, which leads me to conclude this is a bug.

This screenshot shows the output from the table created by Terraform:

This screenshot shows the output from the table created directly in the AWS console, using the "Generate DDL" statement generated from the Terraform-managed table:
Actual Behavior

Querying the table from the AWS Athena console returns incorrect data:
This configuration block creates an Athena table that pulls data from an existing S3 bucket.
```hcl
resource "aws_glue_catalog_table" "aws_glue_catalog_table" {
  count = terraform.workspace == "test" ? 1 : 0

  name          = "sms_events"
  database_name = aws_glue_catalog_database.aws_glue_catalog_database[count.index].name
  owner         = "hadoop"
  table_type    = "EXTERNAL_TABLE"

  storage_descriptor {
    location          = "s3://${module.kinesis_delivery_stream_s3_bucket[count.index].s3_bucket_id}/"
    input_format      = "org.apache.hadoop.mapred.TextInputFormat"
    output_format     = "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
    compressed        = false
    number_of_buckets = -1

    ser_de_info {
      serialization_library = "org.apache.hive.hcatalog.data.JsonSerDe"

      parameters = {
        "serialization.format" = 1
      }
    }

    skewed_info {
      skewed_column_names               = []
      skewed_column_values              = []
      skewed_column_value_location_maps = {}
    }

    columns {
      name    = "event_type"
      type    = "string"
      comment = "from deserializer"
    }

    columns {
      name    = "event_timestamp"
      type    = "string"
      comment = "from deserializer"
    }

    columns {
      name    = "arrival_timestamp"
      type    = "string"
      comment = "from deserializer"
    }

    columns {
      name    = "event_version"
      type    = "string"
      comment = "from deserializer"
    }

    columns {
      name    = "application"
      type    = "struct<app_id:string,sdk:string>"
      comment = "from deserializer"
    }

    columns {
      name    = "client"
      type    = "struct<client_id:string>"
      comment = "from deserializer"
    }

    columns {
      name    = "device"
      type    = "struct<platform:string>"
      comment = "from deserializer"
    }

    columns {
      name    = "session"
      type    = "string"
      comment = "from deserializer"
    }

    columns {
      name    = "attributes"
      type    = "struct<sender_request_id:string,campaign_activity_id:string,origination_phone_number:string,destination_phone_number:string,record_status:string,iso_country_code:string,treatment_id:bigint,number_of_message_parts:bigint,message_id:string,message_type:string,campaign_id:string>"
      comment = "from deserializer"
    }

    columns {
      name    = "metrics"
      type    = "struct<price_in_millicents_usd:string>"
      comment = "from deserializer"
    }

    columns {
      name    = "awsaccountid"
      type    = "bigint"
      comment = "from deserializer"
    }
  }

  partition_keys {
    name = "datehour"
    type = "string"
  }

  parameters = {
    EXTERNAL                            = "TRUE"
    "projection.datehour.type"          = "date"
    "projection.datehour.range"         = "2022/02/01/00,NOW"
    "projection.datehour.format"        = "yyyy/MM/dd/HH"
    "projection.datehour.interval"      = "1"
    "projection.datehour.interval.unit" = "HOURS"
    "projection.enabled"                = "true"
    "storage.location.template"         = "s3://${module.kinesis_delivery_stream_s3_bucket[count.index].s3_bucket_id}/$${datehour}/"
  }
}
```
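Since partition projection is enabled, Athena derives the datehour partitions from the projection settings rather than from the Glue partition list. An illustrative query that prunes on the projected key (column names taken from the schema above; the datehour value is hypothetical):

```sql
SELECT event_type, attributes.message_id
FROM sms_events
WHERE datehour = '2023/06/14/00'
LIMIT 10;
```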
Steps to Reproduce

1. Create a new S3 bucket containing an object under the following key (to simulate Kinesis data ingestion for SMS events): s3://s3bucketname/2023/06/14/00/pinpoint-firehose-http-delivery-stream-test-6-2023-06-14-00-01-35-ab06043f-162b-4942-a470-524da1097bb5.gz
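This layout can be simulated by hand. A sketch, assuming a local sample.json file with one event per line and that GZIP compression is enabled on the delivery stream (the bucket name and object key are taken from the step above):

```
# Compress the sample file the way Kinesis Firehose would deliver it
gzip -c sample.json > sample.json.gz

# Upload it under the datehour-style key expected by the table
aws s3 cp sample.json.gz "s3://s3bucketname/2023/06/14/00/pinpoint-firehose-http-delivery-stream-test-6-2023-06-14-00-01-35-ab06043f-162b-4942-a470-524da1097bb5.gz"
```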
2. Create a new aws_glue_catalog_database:
```hcl
resource "aws_glue_catalog_database" "example" {
  name = "example"

  create_table_default_permission {
    permissions = ["SELECT"]

    principal {
      data_lake_principal_identifier = "IAM_ALLOWED_PRINCIPALS"
    }
  }
}
```
3. Create a new aws_glue_catalog_table:
```hcl
resource "aws_glue_catalog_table" "example" {
  name          = "sms_events"
  database_name = aws_glue_catalog_database.example.name
  owner         = "hadoop"
  table_type    = "EXTERNAL_TABLE"

  storage_descriptor {
    location          = "s3://s3bucketname/"
    input_format      = "org.apache.hadoop.mapred.TextInputFormat"
    output_format     = "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
    compressed        = false
    number_of_buckets = -1

    ser_de_info {
      serialization_library = "org.apache.hive.hcatalog.data.JsonSerDe"

      parameters = {
        "serialization.format" = 1
      }
    }

    skewed_info {
      skewed_column_names               = []
      skewed_column_values              = []
      skewed_column_value_location_maps = {}
    }

    columns {
      name    = "event_type"
      type    = "string"
      comment = "from deserializer"
    }

    columns {
      name    = "event_timestamp"
      type    = "string"
      comment = "from deserializer"
    }

    columns {
      name    = "arrival_timestamp"
      type    = "string"
      comment = "from deserializer"
    }

    columns {
      name    = "event_version"
      type    = "string"
      comment = "from deserializer"
    }

    columns {
      name    = "application"
      type    = "struct<app_id:string,sdk:string>"
      comment = "from deserializer"
    }

    columns {
      name    = "client"
      type    = "struct<client_id:string>"
      comment = "from deserializer"
    }

    columns {
      name    = "device"
      type    = "struct<platform:string>"
      comment = "from deserializer"
    }

    columns {
      name    = "session"
      type    = "string"
      comment = "from deserializer"
    }

    columns {
      name    = "attributes"
      type    = "struct<sender_request_id:string,campaign_activity_id:string,origination_phone_number:string,destination_phone_number:string,record_status:string,iso_country_code:string,treatment_id:bigint,number_of_message_parts:bigint,message_id:string,message_type:string,campaign_id:string>"
      comment = "from deserializer"
    }

    columns {
      name    = "metrics"
      type    = "struct<price_in_millicents_usd:string>"
      comment = "from deserializer"
    }

    columns {
      name    = "awsaccountid"
      type    = "bigint"
      comment = "from deserializer"
    }
  }

  partition_keys {
    name = "datehour"
    type = "string"
  }

  parameters = {
    EXTERNAL                            = "TRUE"
    "projection.datehour.type"          = "date"
    "projection.datehour.range"         = "2022/02/01/00,NOW"
    "projection.datehour.format"        = "yyyy/MM/dd/HH"
    "projection.datehour.interval"      = "1"
    "projection.datehour.interval.unit" = "HOURS"
    "projection.enabled"                = "true"
    "storage.location.template"         = "s3://s3bucketname/$${datehour}/"
  }
}
```
4. Go to the AWS Athena console and query the new table: select * from sms_events limit 10;

The pinpoint-firehose-http-delivery-stream-test-6-2023-06-14-00-01-35-ab06043f-162b-4942-a470-524da1097bb5.gz (JSON) file should look like this:
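The actual sample is not reproduced here. Purely as an illustration, a newline-delimited JSON record consistent with the columns declared above might look like this (all values hypothetical):

```json
{"event_type": "_SMS.SUCCESS", "event_timestamp": "1686700800000", "arrival_timestamp": "1686700801000", "event_version": "3.1", "application": {"app_id": "exampleappid", "sdk": ""}, "client": {"client_id": "exampleclientid"}, "device": {"platform": ""}, "session": "", "attributes": {"sender_request_id": "example-request-id", "campaign_activity_id": "example-activity-id", "origination_phone_number": "+15555550100", "destination_phone_number": "+15555550101", "record_status": "DELIVERED", "iso_country_code": "US", "treatment_id": 0, "number_of_message_parts": 1, "message_id": "example-message-id", "message_type": "TRANSACTIONAL", "campaign_id": "example-campaign-id"}, "metrics": {"price_in_millicents_usd": "645"}, "awsaccountid": 123456789012}
```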
Note that if I manually create the table from the AWS Athena console and then import it into Terraform, everything works as expected.

Once the table has been imported into Terraform, the terraform plan output looks like this:
```
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # aws_glue_catalog_table.aws_glue_catalog_table[0] will be updated in-place
  ~ resource "aws_glue_catalog_table" "aws_glue_catalog_table" {
        id         = "666:example:sms_events"
        name       = "sms_events"
      ~ parameters = {
          - "transient_lastDdlTime" = "1686715211" -> null
            # (8 unchanged elements hidden)
        }
        # (6 unchanged attributes hidden)
        # (2 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.
```
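The import that produced this plan can be sketched as follows, using the catalog_id:database_name:table_name ID format visible in the plan output above (the resource address and ID are taken from that output; adjust for your own account and workspace):

```
terraform import 'aws_glue_catalog_table.aws_glue_catalog_table[0]' 666:example:sms_events
```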
Terraform Core Version
1.5.0
AWS Provider Version
v5.3.0
Affected Resource(s)
aws_glue_catalog_table
Expected Behavior

When querying the table from the AWS Athena console, it should return data.

Actual Behavior

It returns incorrect data instead, as shown in the screenshots above.
Relevant Error/Panic Output Snippet
No response
Terraform Configuration Files

See the configuration block above.
Steps to Reproduce

See the steps above.

Debug Output
No response
Panic Output
No response
Important Factoids
No response
References
No response
Would you like to implement a fix?
No