hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

Feature Request: Manage Record Format Conversion In AWS Kinesis Firehose Stream #4510

Closed. vladholubiev closed this issue 6 years ago.

vladholubiev commented 6 years ago

Description

AWS released a feature today: converting JSON records in a Kinesis Firehose stream to Apache Parquet or Apache ORC before saving them to S3.

Previously, you needed to write and pay for AWS Glue ETL jobs to do that.

Documentation: https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html

The relevant Stack Overflow question has ~2,500 views, which suggests this was a long-awaited feature.

New or Affected Resource(s)

aws_kinesis_firehose_delivery_stream

Potential Terraform Configuration

Suggested syntax, modeled on the API:

resource "aws_kinesis_firehose_delivery_stream" "test_stream" {
  name        = "terraform-kinesis-firehose-test-stream"
  destination = "s3"

  data_format_conversion {
    enabled = "true"

    input_format_configuration {
      deserializer = "Apache Hive JSON" # or OpenX JSON
    }

    output_format_configuration {
      serializer = "ORC" # or Parquet
    }

    schema_configuration {
      catalog_id    = "${aws_glue_catalog_database.main.catalog_id}"
      database_name = "${aws_glue_catalog_database.main.name}"
      table_name    = "${aws_glue_catalog_table.main.name}"
      role_arn      = "..."
      version_id    = "3" # or LATEST by default
    }
  }
}
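
The proposal above references aws_glue_catalog_database.main and aws_glue_catalog_table.main without defining them. A minimal sketch of what those might look like follows; the database name, table name, and columns are placeholders invented for illustration, not part of the original request.

```hcl
# Hypothetical Glue resources backing the references in the proposal above.
resource "aws_glue_catalog_database" "main" {
  name = "firehose_events" # placeholder name
}

resource "aws_glue_catalog_table" "main" {
  name          = "events" # placeholder name
  database_name = "${aws_glue_catalog_database.main.name}"

  # Firehose reads the column schema from this table when converting records.
  storage_descriptor {
    columns {
      name = "event_id" # placeholder column
      type = "string"
    }

    columns {
      name = "created_at" # placeholder column
      type = "timestamp"
    }
  }
}
```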

bflad commented 6 years ago

Prerequisite: AWS Go SDK v1.13.47 (#4512)

gregburek commented 6 years ago

I'm +1 on the proposed configuration syntax here as an MVP. However, the input_format_configuration and output_format_configuration sections expose many more knobs:

input: https://docs.aws.amazon.com/firehose/latest/APIReference/API_HiveJsonSerDe.html https://docs.aws.amazon.com/firehose/latest/APIReference/API_OpenXJsonSerDe.html

output: https://docs.aws.amazon.com/firehose/latest/APIReference/API_ParquetSerDe.html https://docs.aws.amazon.com/firehose/latest/APIReference/API_OrcSerDe.html

It appears that most of these are optional, but they will still be returned by DescribeDeliveryStream: https://docs.aws.amazon.com/firehose/latest/APIReference/API_DescribeDeliveryStream.html
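
To illustrate, here is a rough sketch of how the proposed syntax could grow to expose those knobs, assuming nested blocks named after the API's SerDe types. The block and attribute names are snake_case renderings of the API fields (OpenXJsonSerDe, ParquetSerDe) and are assumptions for discussion, not a confirmed provider schema.

```hcl
resource "aws_kinesis_firehose_delivery_stream" "test_stream" {
  # ... name, destination, and S3 settings as in the proposal ...

  data_format_conversion {
    enabled = "true"

    input_format_configuration {
      deserializer {
        # Assumed block name, mirroring the API's OpenXJsonSerDe type.
        open_x_json_ser_de {
          case_insensitive                         = "true"
          convert_dots_in_json_keys_to_underscores = "false"
        }
      }
    }

    output_format_configuration {
      serializer {
        # Assumed block name, mirroring the API's ParquetSerDe type.
        parquet_ser_de {
          compression                   = "SNAPPY"
          enable_dictionary_compression = "true"
        }
      }
    }

    # ... schema_configuration as in the proposal ...
  }
}
```

Since DescribeDeliveryStream returns the optional values too, each of these attributes would probably need to be readable back into state, even when left unset in configuration.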

bflad commented 6 years ago

I will try to get a pull request submitted for this tomorrow or Wednesday.

bflad commented 6 years ago

Ack -- I only got about halfway through implementing the 36(!) new attributes required in the full schemas for serializers/deserializers before running out of time ahead of a short vacation. I'll be able to pick this back up on Tuesday unless someone wants to get something in sooner.

bflad commented 6 years ago

Pull request submitted with all underlying options: #4842

bflad commented 6 years ago

Support has been merged into master and will be released with version 1.24.0 of the AWS provider, likely in the middle of this week. 🎉

bflad commented 6 years ago

This has been released in version 1.24.0 of the AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.
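
For anyone upgrading, a minimal sketch of pinning the provider to a release that includes this feature, using a version constraint in the provider block. The region is a placeholder, and the exact constraint is an assumption to adjust for your setup.

```hcl
provider "aws" {
  # Allow 1.24.0 and later 1.x releases, which include record format conversion.
  version = "~> 1.24"

  region = "us-east-1" # placeholder region
}
```

After changing the constraint, running terraform init -upgrade should fetch a matching provider release.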
