Status: open. arssatavares opened this issue 4 months ago.
Good day. Just wondering whether there has been any progress on this issue: we are facing the same problem using Terraform 1.6.6 with AWS provider 5.32.1. Also, if I try AWS provider version 5.16.0, I get a different issue.
We have the same issue. The bug was introduced by a fix for Iceberg tables from @ewbankkit, which we also need.
We were able to work around the issue by explicitly setting the catalog_id argument in the resource. However, this still feels like a regression and should be addressed.
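Why setting catalog_id helps can be sketched with a small Go model (Go being the language the provider is written in). It assumes, without having checked the provider source, that since 5.16.1 the partition-index lookup takes its catalog ID from the resource's catalog_id argument and falls back to the caller's account ID when that argument is unset, and that the workaround sets catalog_id to the ID of the catalog that owns the table. The names partitionIndexesInput, resolveCatalogID and buildPartitionIndexesInput are hypothetical stand-ins, not the provider's or the AWS SDK's.

```go
package main

// partitionIndexesInput is a hypothetical, simplified stand-in for the three
// GetPartitionIndexes request fields that appear in the debug log.
type partitionIndexesInput struct {
	CatalogID    string
	DatabaseName string
	TableName    string
}

// resolveCatalogID models the assumed fallback: the resource's catalog_id
// argument when it is set, otherwise the caller's own account ID.
func resolveCatalogID(configuredCatalogID, accountID string) string {
	if configuredCatalogID != "" {
		return configuredCatalogID
	}
	return accountID
}

// buildPartitionIndexesInput models the behavior reported since 5.16.1: the
// catalog ID comes from the resource configuration (or the account ID), not
// from the table that GetTable actually returned.
func buildPartitionIndexesInput(configuredCatalogID, accountID, database, table string) partitionIndexesInput {
	return partitionIndexesInput{
		CatalogID:    resolveCatalogID(configuredCatalogID, accountID),
		DatabaseName: database,
		TableName:    table,
	}
}
```

With catalog_id unset, the modeled request goes to the caller's own catalog, where the shared table does not exist; with catalog_id set to the owning catalog, it goes to the catalog where the table actually lives.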
Terraform Core Version
1.6.6
AWS Provider Version
5.38.0, 5.16.1, 5.52.0
Affected Resource(s)
resource "aws_glue_catalog_table"
Expected Behavior
resourceCatalogTableRead should be able to read a table's partition indexes without errors, even when the table does not belong to the account's Data Catalog.
Actual Behavior
This issue has been blocking us since version 5.16.1.
resourceCatalogTableRead fails to read a table's partition indexes when the table does not belong to the account's Data Catalog.
As a result, every operation that involves reading the table is currently blocked.
We found that the issue affects tables that reside in a metadata database that does not belong to our account. Through Lake Formation we share data lakes across accounts, and each of our databases is a resource link to a shared database in another account (with a different catalog ID). The tables are created successfully and we can see them in the correct dataset; we are able to run queries and jobs on them.
The error is raised by the GetPartitionIndexesWithContext call: it looks up the table in the catalog whose ID equals the current account ID, but the table resides in the external account's catalog.
This was introduced by a fix for "removal of metadata_location and tabletype when updating Iceberg tables" (https://github.com/hashicorp/terraform-provider-aws/issues/33374).
We strongly believe the issue is caused by the following changes:
Previously, GetPartitionIndexesWithContext took the catalogID, dbName, and name from the output of FindTableByName, that is, from the table that was actually found. With the new code, the current account ID is used as the catalog ID instead, so the table cannot be found.
GetPartitionIndexes retrieves the partition indexes associated with a table. Its input (GetPartitionIndexesInput) requires a database name and a table name and optionally accepts a catalog ID. The provider currently passes the current account ID as the catalog ID, which is equivalent to:
aws glue get-partition-indexes --database-name <MyCatalogDatabase> --table-name <MyCatalogTable> --catalog-id <OWN_ACCOUNT_ID>
However, if we pass the catalog ID of the original account, the call succeeds:
aws glue get-partition-indexes --database-name <MyCatalogDatabase> --table-name <MyCatalogTable> --catalog-id <ORIGINAL_CATALOG_ID>
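This can also be observed in the debug output (the complete logs are in the Debug Output section).
GetTable request:
{"CatalogId":"663370797934","DatabaseName":"<DATABASE_NAME>","Name":"14465_<TABLE_NAME>"}
GetTable response (note that Table.CatalogId is 964290010106, not the caller's account 663370797934):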
{"Table":{"CatalogId":"964290010106","CreateTime":1.708699485E9,"CreatedBy":"arn:aws:sts::663370797934:assumed-role/<ROLE_NAME>/terraform","DatabaseId":"bfc321ab80664a4bae12946acdc9e655","DatabaseName":"<DATABASE_NAME>","IsMultiDialectView":false,"IsRegisteredWithLakeFormation":false,"IsRowFilteringEnabled":false,"Name":"14465_<TABLE_NAME>","PartitionKeys":[],"Retention":0,"StorageDescriptor":{"BucketColumns":[],"Columns":[],"Compressed":false,"InputFormat":"","Location":"s3://scdl-dev-source/14465_<TABLE_NAME>/","NumberOfBuckets":0,"OutputFormat":"","Parameters":{},"SortColumns":[],"StoredAsSubDirectories":false},"UpdateTime":1.708699485E9,"VersionId":"0"}}
GetPartitionIndexes request, which still targets the caller's catalog 663370797934:
{"CatalogId":"663370797934","DatabaseName":"<DATABASE_NAME>","TableName":"14465_<TABLE_NAME>"}
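The mismatch in these log lines can be checked mechanically. The following is a hypothetical Go helper, not part of the provider, that decodes only the fields it needs from the GetTable response and the GetPartitionIndexes request (every other JSON field is ignored) and returns an error when the two catalog IDs differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// getTableResponse keeps only the GetTable response fields this check needs;
// json.Unmarshal ignores every other field in the logged JSON.
type getTableResponse struct {
	Table struct {
		CatalogID    string `json:"CatalogId"`
		DatabaseName string `json:"DatabaseName"`
		Name         string `json:"Name"`
	} `json:"Table"`
}

// partitionIndexesRequest keeps the logged GetPartitionIndexes request fields.
type partitionIndexesRequest struct {
	CatalogID    string `json:"CatalogId"`
	DatabaseName string `json:"DatabaseName"`
	TableName    string `json:"TableName"`
}

// checkCatalogMismatch returns an error when the GetPartitionIndexes request
// targets a different catalog from the one that owns the table GetTable returned.
func checkCatalogMismatch(getTableJSON, partitionReqJSON []byte) error {
	var resp getTableResponse
	if err := json.Unmarshal(getTableJSON, &resp); err != nil {
		return fmt.Errorf("parsing GetTable response: %w", err)
	}
	var req partitionIndexesRequest
	if err := json.Unmarshal(partitionReqJSON, &req); err != nil {
		return fmt.Errorf("parsing GetPartitionIndexes request: %w", err)
	}
	if req.CatalogID != resp.Table.CatalogID {
		return fmt.Errorf("GetPartitionIndexes uses catalog %s, but table %s.%s lives in catalog %s",
			req.CatalogID, resp.Table.DatabaseName, resp.Table.Name, resp.Table.CatalogID)
	}
	return nil
}
```

Fed the GetTable response and GetPartitionIndexes request logged above, it returns the error "GetPartitionIndexes uses catalog 663370797934, but table <DATABASE_NAME>.14465_<TABLE_NAME> lives in catalog 964290010106".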
Relevant Error/Panic Output Snippet
Terraform Configuration Files
resource "aws_glue_catalog_table" "semantic_table" {
  name          = var.asset_name
  database_name = var.dataset_config.dataset_id
  description   = local.schema["description"]
  owner         = var.global_config.project_name
  table_type    = "EXTERNAL_TABLE"
  parameters    = local.table_properties["table_parameters"]

  dynamic "partition_keys" {
    for_each = can(local.schema["partition_keys"]) ? local.schema["partition_keys"] : []
    # The content block was missing from the report; these mappings are assumed.
    content {
      name = partition_keys.value["name"]
      type = partition_keys.value["type"]
    }
  }

  storage_descriptor {
    input_format  = local.table_properties["input_format"]
    location      = local.table_properties["location"]
    output_format = local.table_properties["output_format"]
  }
}
Steps to Reproduce
To reproduce the issue, follow these steps:
aws glue get-partition-indexes --database-name <MyCatalogDatabase> --table-name <MyCatalogTable> --catalog-id <OWN_ACCOUNT_ID>
This fails with:
An error occurred (EntityNotFoundException) when calling the GetPartitionIndexes operation: Table image_metadata not found.
Then run:
aws glue get-partition-indexes --database-name <MyCatalogDatabase> --table-name <MyCatalogTable> --catalog-id <ORIGINAL_CATALOG_ID>
This succeeds and outputs:
{ "PartitionIndexDescriptorList": [] }
Debug Output
Panic Output
No response
Important Factoids
No response
References
#34132 is a similar issue.
Would you like to implement a fix?
As mentioned above, I believe the changes in the "Fix removal of Parameters when updating Iceberg table" pull request caused this issue.
The values passed in GetPartitionIndexesInput changed, as shown in this screenshot: https://github.com/hashicorp/terraform-provider-aws/assets/80547036/110b8e6b-4ee6-4a2c-b7c4-e2cce03715ad
Our suggestion is to revert this change and take the CatalogId from the output of the preceding GetTable call.
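A minimal Go sketch of that suggestion follows. It uses simplified local types in place of the AWS SDK's glue.TableData and glue.GetPartitionIndexesInput (which use *string fields); the names tableData, partitionIndexesInput and partitionIndexesInputFromTable are hypothetical, not the provider's. The only point it illustrates is that all three identifiers, including the catalog ID, are copied from the table that GetTable returned rather than from the caller's account.

```go
package main

// tableData is a trimmed, hypothetical stand-in for the table returned by
// GetTable (via FindTableByName); only the fields used here are kept.
type tableData struct {
	CatalogID    string
	DatabaseName string
	Name         string
}

// partitionIndexesInput mirrors the three GetPartitionIndexes request fields
// seen in the debug log.
type partitionIndexesInput struct {
	CatalogID    string
	DatabaseName string
	TableName    string
}

// partitionIndexesInputFromTable builds the request from the table GetTable
// actually returned, so a table reached through a Lake Formation resource
// link is looked up in the catalog that owns it.
func partitionIndexesInputFromTable(t tableData) partitionIndexesInput {
	return partitionIndexesInput{
		CatalogID:    t.CatalogID,
		DatabaseName: t.DatabaseName,
		TableName:    t.Name,
	}
}
```

As the debug log shows, GetTable already resolves the resource link and returns the owning catalog's ID (964290010106), so with this approach no explicit catalog_id would be needed.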