pulumi / pulumi-aws

An Amazon Web Services (AWS) Pulumi resource package, providing multi-language access to AWS
Apache License 2.0

AWS Glue Iceberg table schema can't be updated with Pulumi #4618

Open stonefishy opened 1 month ago

stonefishy commented 1 month ago

I am using Pulumi to create an AWS Glue table in Iceberg format. The Iceberg metadata is created, and I can insert data into the table and query the inserted data through AWS Athena. Below is the Python code that creates the Glue Iceberg table.

Create the Glue Iceberg table

import pulumi
import pulumi_aws as aws

pulumi_external_table_test = aws.glue.CatalogTable(
    "pulumi_external_table_test",
    database_name="pulumi_database_test",
    name="pulumi_external_table_test",
    storage_descriptor=aws.glue.CatalogTableStorageDescriptorArgs(
        additional_locations=["s3://xxxx/pulumi_external_table_test/data"],
        columns=[
            aws.glue.CatalogTableStorageDescriptorColumnArgs(
                name="test1",
                type="string",
            ),
            aws.glue.CatalogTableStorageDescriptorColumnArgs(
                name="test2",
                type="string",
            ),
            aws.glue.CatalogTableStorageDescriptorColumnArgs(
                name="test3",
                type="boolean",
            ),
            aws.glue.CatalogTableStorageDescriptorColumnArgs(
                name="test4",
                type="string",
            )
        ],
        location="s3://xxx/pulumi_external_table_test",
    ),
    table_type="EXTERNAL_TABLE",
    open_table_format_input=aws.glue.CatalogTableOpenTableFormatInputArgs(
        iceberg_input=aws.glue.CatalogTableOpenTableFormatInputIcebergInputArgs(
            metadata_operation="CREATE"
        )
    ),
    opts=pulumi.ResourceOptions(protect=False)
)

I then added a new column 'test5' in the code, and after pulumi up the 'test5' column appears in the table schema. However, inserting data into the table from Athena fails with an error.

Add a new column 'test5' to the Glue Iceberg table

import pulumi
import pulumi_aws as aws

pulumi_external_table_test = aws.glue.CatalogTable(
    "pulumi_external_table_test",
    database_name="pulumi_database_test",
    name="pulumi_external_table_test",
    storage_descriptor=aws.glue.CatalogTableStorageDescriptorArgs(
        additional_locations=["s3://xxx/pulumi_external_table_test/data"],
        columns=[
            aws.glue.CatalogTableStorageDescriptorColumnArgs(
                name="test1",
                type="string",
            ),
            aws.glue.CatalogTableStorageDescriptorColumnArgs(
                name="test2",
                type="string",
            ),
            aws.glue.CatalogTableStorageDescriptorColumnArgs(
                name="test3",
                type="boolean",
            ),
            aws.glue.CatalogTableStorageDescriptorColumnArgs(
                name="test4",
                type="string",
            ),
            aws.glue.CatalogTableStorageDescriptorColumnArgs(
                name="test5",
                type="string",
            )
        ],
        location="s3://xxx/pulumi_external_table_test",
    ),
    table_type="EXTERNAL_TABLE",
    open_table_format_input=aws.glue.CatalogTableOpenTableFormatInputArgs(
        iceberg_input=aws.glue.CatalogTableOpenTableFormatInputIcebergInputArgs(
            metadata_operation="CREATE"
        )
    ),
    opts=pulumi.ResourceOptions(protect=False)
)

The Glue table schema is updated with the new column.


Athena SQL:

insert into pulumi_database_test.pulumi_external_table_test (test1, test2, test3, test4, test5) values ('1b', '2b', true, '4b', '5b')

Executing the SQL above in Athena fails with an error.


Checking the Iceberg metadata JSON file in the S3 bucket shows that the metadata was not updated with the new column test5.
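The mismatch can also be checked programmatically. You can read the current schema out of the metadata file and compare it with the Glue column list. This is a minimal Python sketch. It assumes the metadata file has been downloaded from S3 to a local path. It also assumes the file follows the Iceberg table spec layout: "current-schema-id" plus a "schemas" list in format v2, or a single "schema" object in older v1 files. The helper names and the sample metadata below are hypothetical and only for illustration.

```python
import json
from typing import Any, Dict, List


def load_metadata(path: str) -> Dict[str, Any]:
    """Load a local copy of an Iceberg *.metadata.json file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def current_schema_columns(metadata: Dict[str, Any]) -> List[str]:
    """Return the top-level column names of the table's current Iceberg schema."""
    schemas = metadata.get("schemas")
    if schemas is not None and "current-schema-id" in metadata:
        # Format v2 (and newer v1 writers) keep every schema version in "schemas";
        # the active one is identified by "current-schema-id".
        current_id = metadata["current-schema-id"]
        matches = [s for s in schemas if s.get("schema-id") == current_id]
        if not matches:
            raise ValueError(f"no schema with id {current_id} in metadata")
        schema = matches[0]
    else:
        # Older v1 metadata stores a single schema under "schema".
        schema = metadata["schema"]
    return [field["name"] for field in schema["fields"]]


def missing_from_iceberg(glue_columns: List[str], metadata: Dict[str, Any]) -> List[str]:
    """Columns present in the Glue catalog definition but absent from the Iceberg schema."""
    iceberg_columns = set(current_schema_columns(metadata))
    return [name for name in glue_columns if name not in iceberg_columns]


if __name__ == "__main__":
    # Hypothetical, trimmed-down metadata that still has only the four original columns.
    sample = {
        "format-version": 2,
        "current-schema-id": 0,
        "schemas": [
            {
                "type": "struct",
                "schema-id": 0,
                "fields": [
                    {"id": i, "name": f"test{i}", "required": False, "type": t}
                    for i, t in [(1, "string"), (2, "string"), (3, "boolean"), (4, "string")]
                ],
            }
        ],
    }
    glue_columns = ["test1", "test2", "test3", "test4", "test5"]
    print(missing_from_iceberg(glue_columns, sample))  # ['test5']
```

In practice you would pass load_metadata("<local copy of the latest metadata file>") instead of the inline sample. A non-empty result means the catalog and the Iceberg metadata have diverged, as in the situation described above.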


According to the API docs for iceberg_input, metadata_operation only accepts the value CREATE. It seems that only creating the Iceberg metadata file at the time the Glue table is created is supported; the Iceberg metadata is not updated when the Glue table schema is later changed through Pulumi.
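One possible manual stopgap is to evolve the schema through Athena instead. This is an assumption on my part: nobody in this thread suggested it, and it has not been verified against this issue. Athena's Iceberg DDL supports ALTER TABLE ... ADD COLUMNS, which should have Athena write the new Iceberg metadata itself. The sketch below only builds that statement from the column list declared in the Pulumi program, and the helper names are hypothetical. Two things are left open: how you run it (the Athena console, or boto3's start_query_execution), and how Pulumi treats the resulting catalog state on the next pulumi up.

```python
from typing import Iterable, List, Tuple


def columns_to_add(desired: Iterable[Tuple[str, str]], existing: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (name, type) pairs declared in the program but missing from the Iceberg schema."""
    existing_names = set(existing)
    return [(name, col_type) for name, col_type in desired if name not in existing_names]


def add_columns_ddl(database: str, table: str, new_columns: List[Tuple[str, str]]) -> str:
    """Build an Athena 'ALTER TABLE ... ADD COLUMNS' statement for an Iceberg table."""
    if not new_columns:
        raise ValueError("no new columns to add")
    column_list = ", ".join(f"{name} {col_type}" for name, col_type in new_columns)
    return f"ALTER TABLE {database}.{table} ADD COLUMNS ({column_list})"


if __name__ == "__main__":
    # Columns as declared in the Pulumi program after adding test5.
    desired = [("test1", "string"), ("test2", "string"), ("test3", "boolean"),
               ("test4", "string"), ("test5", "string")]
    # Columns still present in the Iceberg metadata (only the original four).
    existing = ["test1", "test2", "test3", "test4"]
    new = columns_to_add(desired, existing)
    print(add_columns_ddl("pulumi_database_test", "pulumi_external_table_test", new))
    # ALTER TABLE pulumi_database_test.pulumi_external_table_test ADD COLUMNS (test5 string)
```

Keeping the diff and the DDL generation separate means the same column list can drive both the Pulumi resource and the check. It also avoids issuing an ALTER when nothing is missing.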


The same issue has also been reported for the Terraform AWS provider: https://github.com/hashicorp/terraform-provider-aws/issues/36641

justinvp commented 1 month ago

Transferring to https://github.com/pulumi/pulumi-aws

corymhall commented 1 month ago

@stonefishy thanks for doing the research and finding the upstream issue! The upstream issue has a lot of upvotes, so hopefully it will get addressed. Once upstream fixes it, we will automatically pick up the fix in the next release.