hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

[Bug]: Terraform expects schema_version_id and schema_version_number, but AWS expects only schema_version_id = Failing Pipeline #34008

Open PF1o1 opened 11 months ago

PF1o1 commented 11 months ago

Terraform Core Version

1.5.7

AWS Provider Version

hashicorp/aws 5.21.0, hashicorp/archive 2.4.0

Affected Resource(s)

aws_glue_catalog_table

Defining only schema_version_id in the schema_reference block of an aws_glue_catalog_table configuration, as the HashiCorp documentation describes, caused an error in our pipeline.

According to the documentation, it should be possible to set only schema_version_id in the schema_reference block, instead of schema_id and schema_version_number.

Due to this bug the pipeline cannot run successfully.
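
For illustration, a minimal sketch of the schema_reference form described above, with only schema_version_id set. The resource name, table name, and database name are placeholders and not part of the original configuration; the version ID is the one used in the configuration further down.

```hcl
# Sketch only: "example", "example_table" and "example_database" are placeholders.
resource "aws_glue_catalog_table" "example" {
  name          = "example_table"
  database_name = "example_database"

  storage_descriptor {
    schema_reference {
      # Only the version ID is set, which is the form the documentation
      # is understood to allow. With provider 5.21.0 this fails validation
      # with 'The argument "schema_version_number" is required'.
      schema_version_id = "fa15ea6f-751c-4462-982b-ee236acdb20a"
    }
  }
}
```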

Expected Behavior

The correct schema and schema version should be linked to the specified Glue table, and the pipeline should complete successfully.

Actual Behavior

The pipeline fails.

Relevant Error/Panic Output Snippet

Terraform reports the following error:

│ Error: Missing required argument
│ 
│   on ../../modules/glue/table/main.tf line 101, in resource "aws_glue_catalog_table" "test_table_linking_issue":
│  101:     schema_reference {
│ 
│ The argument "schema_version_number" is required, but no definition was
│ found.
╵

___
___

When both schema_version_id and schema_version_number are provided, AWS returns this error:

Error: updating Glue Catalog Table (771887822597:testing_data_lake:test_table_linking_issue): InvalidInputException: No other input parameters can be specified when fetching by SchemaVersionId.
│ {
│   RespMetadata: {
│     StatusCode: 400,
│     RequestID: "064a8971-b702-4d59-a005-cf765486f5f0"
│   },
│   Message_: "No other input parameters can be specified when fetching by SchemaVersionId."
│ }
│ 
│   with module.main.module.glue.module.table.aws_glue_catalog_table.test_table_linking_issue,
│   on ../../modules/glue/table/main.tf line 85, in resource "aws_glue_catalog_table" "test_table_linking_issue":
│   85: resource "aws_glue_catalog_table" "test_table_linking_issue" {

Terraform Configuration Files

resource "aws_glue_catalog_table" "test_table_linking_issue" {

  name          = "test_table_linking_issue"
  database_name = "${var.environment}_data_lake"
  parameters = {
    classification = "parquet"
    # compressionType = "none"
    # "partition_filtering.enabled" = true
  }

  storage_descriptor {
    compressed    = true
    location      = "s3://myleodsc-${var.environment}-data-lake/delivery/"
    input_format  = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
    output_format = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"

    schema_reference {
      schema_version_id     = "fa15ea6f-751c-4462-982b-ee236acdb20a"
      schema_version_number = var.test_schema.latest_schema_version
    }

    ser_de_info {
      name                  = "parquet"
      serialization_library = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"

      parameters = {
        "serialization.format" = 1
      }
    }
  }
}
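
One alternative that might sidestep both errors is to reference the schema by schema_id together with schema_version_number and leave schema_version_id out, since the AWS error only objects to extra parameters when fetching by SchemaVersionId. This is an untested sketch, not something confirmed in this report: the resource names and the aws_glue_schema.example reference are hypothetical, and only the schema_reference block differs from the configuration above.

```hcl
# Untested sketch: aws_glue_schema.example is a hypothetical schema resource,
# and the table and database names are placeholders.
resource "aws_glue_catalog_table" "example_by_schema_id" {
  name          = "example_table"
  database_name = "example_database"

  storage_descriptor {
    schema_reference {
      # Identify the schema by ARN instead of by version ID.
      schema_id {
        schema_arn = aws_glue_schema.example.arn
      }

      # The provider marks this argument as required.
      schema_version_number = aws_glue_schema.example.latest_schema_version
    }
  }
}
```

Whether this links the table to the schema the same way the console does is not established here.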

Steps to Reproduce

  1. Create an AWS Glue schema registry.
  2. Create an AWS Glue schema.
  3. Create an AWS Glue Data Catalog database.
  4. Create an AWS Glue catalog table that references the created schema, as in the Terraform configuration above.
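
Steps 1 to 3 could be sketched roughly as follows; step 4 is the aws_glue_catalog_table resource shown under Terraform Configuration Files. All names, the Avro data format, and the schema definition are placeholder assumptions, since the report does not include these resources.

```hcl
# Step 1: schema registry (placeholder name).
resource "aws_glue_registry" "example" {
  registry_name = "example-registry"
}

# Step 2: schema in that registry. The data format and the definition
# are assumptions chosen only to make the sketch complete.
resource "aws_glue_schema" "example" {
  schema_name   = "example-schema"
  registry_arn  = aws_glue_registry.example.arn
  data_format   = "AVRO"
  compatibility = "NONE"

  schema_definition = jsonencode({
    type   = "record"
    name   = "example"
    fields = [{ name = "id", type = "string" }]
  })
}

# Step 3: Data Catalog database (placeholder name).
resource "aws_glue_catalog_database" "example" {
  name = "example_data_lake"
}
```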

Debug Output

No response

Panic Output

No response

Important Factoids

No response

References

This issue may be related to the same general Terraform problem: https://github.com/hashicorp/terraform-provider-aws/issues/25774

In general, AWS Glue schemas and AWS Glue tables are not linked reliably when they are managed with Terraform. Example problems:

The following screenshots compare the connection between schema and table depending on whether they were created with Terraform or in the AWS console:

Created in Terraform:

image

Created in aws glue console:

image

I highly recommend taking a closer look at this issue, as I believe the missing functionality mentioned above is caused by a weak or missing link between schemas and tables.

Would you like to implement a fix?

None
