opensearch-project / data-prepper

Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.
https://opensearch.org/docs/latest/clients/data-prepper/index/
Apache License 2.0

Export to S3 in RDS source #4664

Closed oeyh closed 4 hours ago

oeyh commented 5 days ago

Description

Adds the basic ability to export tables from RDS to S3. This follows a pattern similar to the DynamoDB and DocumentDB sources: the leader node creates an ExportPartition in the coordination store, and the node that picks up the ExportPartition calls the RDS APIs to take a snapshot and export it to S3.
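For readers unfamiliar with the RDS export flow, the snapshot-then-export step can be sketched roughly as below. This is an illustrative Python sketch using boto3-style RDS client calls, not the Java code in this PR. The function name, its parameters, and the separate `iam_role_arn` for the export task are assumptions made for the example; `rds` is expected to be a `boto3.client("rds")` or any object with the same methods.

```python
def export_instance_to_s3(rds, db_identifier, snapshot_id, export_task_id,
                          bucket, prefix, iam_role_arn, kms_key_id, table_names):
    """Snapshot an RDS instance, wait for it, then start an S3 export task.

    Hypothetical helper for illustration; returns the export task identifier.
    """
    # CreateDBSnapshot applies to DB instances only.
    snapshot = rds.create_db_snapshot(
        DBSnapshotIdentifier=snapshot_id,
        DBInstanceIdentifier=db_identifier,
    )["DBSnapshot"]

    # Block until the snapshot is available before exporting it.
    rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=snapshot_id)

    # StartExportTask writes the snapshot data to the S3 bucket and prefix.
    # ExportOnly limits the export to the listed tables, e.g. "my_db.cars".
    task = rds.start_export_task(
        ExportTaskIdentifier=export_task_id,
        SourceArn=snapshot["DBSnapshotArn"],
        S3BucketName=bucket,
        S3Prefix=prefix,
        IamRoleArn=iam_role_arn,
        KmsKeyId=kms_key_id,
        ExportOnly=list(table_names),
    )
    return task["ExportTaskIdentifier"]
```

Passing the client in as an argument keeps the sketch independent of credentials and makes it easy to exercise with a stub client.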

A follow-up PR will process the S3 files in the rds source. This PR only targets RDS instances; support for RDS/Aurora clusters will be added in separate PRs, because the RDS APIs for instances and clusters are different.
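To illustrate the instance/cluster difference: in the RDS API, instance snapshots go through CreateDBSnapshot while cluster snapshots go through CreateDBClusterSnapshot, with different parameter and response names. A rough Python sketch with boto3-style calls follows. The `create_snapshot` helper and its `is_cluster` flag are hypothetical and only show the branching; they do not reflect how this PR or the follow-up PRs are structured.

```python
def create_snapshot(rds, db_identifier, snapshot_id, is_cluster=False):
    """Create a snapshot and return its ARN (hypothetical helper).

    `rds` is a boto3 RDS client or any object with the same methods.
    """
    if is_cluster:
        # Cluster path (RDS/Aurora clusters), not covered by this PR.
        response = rds.create_db_cluster_snapshot(
            DBClusterSnapshotIdentifier=snapshot_id,
            DBClusterIdentifier=db_identifier,
        )
        return response["DBClusterSnapshot"]["DBClusterSnapshotArn"]

    # Instance path, the case this PR targets.
    response = rds.create_db_snapshot(
        DBSnapshotIdentifier=snapshot_id,
        DBInstanceIdentifier=db_identifier,
    )
    return response["DBSnapshot"]["DBSnapshotArn"]
```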

Testing

Created an RDS MySQL instance as the source and a DDB table as the coordination store, tested with the following pipeline config, and verified that the data is exported to S3.

rds-mysql-pipeline:
  source:
    rds:
      db_identifier: "mysql-instance"
      table_names:
        - "my_db.cars"
        - "my_db.houses"
      s3_bucket: "rds-data-oeyh"
      s3_region: "us-east-1"
      s3_prefix: "rds-source-test"
      export:
        kms_key_id: xxxxxx
      aws:
        sts_role_arn: "arn:aws:iam::xxxxxxxxxxxxx:role/rdsSourcePipelineRole"
        region: "us-east-1"
  sink:
    - stdout:

Issues Resolved

Contributes to #4561

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

oeyh commented 2 days ago

Is Aurora going to be supported in the same source or a separate source?

It will be supported in this same rds source. Will add that support in a later PR.