opensearch-project / data-prepper

Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.
https://opensearch.org/docs/latest/clients/data-prepper/index/
Apache License 2.0
256 stars 188 forks

SNS as Sink #2938

Open ashoktelukuntla opened 1 year ago

ashoktelukuntla commented 1 year ago

Is your feature request related to a problem? Please describe.

Pipeline users want to send messages to an Amazon Simple Notification Service (SNS) topic.

Describe the solution you'd like

Create a new sink in Data Prepper that outputs data to an SNS topic, using a codec to encode the events.

sink:
    - sns:
        topic_name: "mytopic"
        id: <<String>>
        aws:
            region: us-east-1
            sts_role_arn: "arn:aws:iam::1234567:role/hello"
        codec:
            ndjson:
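
As a rough illustration of what the ndjson codec in this proposal would produce, here is a minimal Python sketch. The function name `encode_ndjson` is hypothetical and this is not Data Prepper's codec implementation; it only shows the newline-delimited JSON format, with one JSON object per line.

```python
import json


def encode_ndjson(events):
    """Serialize events as newline-delimited JSON: one object per line."""
    return "".join(
        json.dumps(event, separators=(",", ":")) + "\n" for event in events
    )


# Two sample events become two lines of JSON.
payload = encode_ndjson([{"id": 1, "type": "a"}, {"id": 2, "type": "b"}])
print(payload, end="")
```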

Additional context

dlvenable commented 1 year ago

Supporting FIFO topics would be good. That would require letting users configure both the deduplication Id and the message group Id. Each should be configurable either as a string literal or as a reference to a value within an event. The deduplication Id can default to a random string, but users must be able to override it.

Here is an example. Say my events have a couple of properties: a "type" and an "id". I want any messages of the same type to be in order, so I'd like the type value of each event to define the message group Id. My deduplication Id is then set from the "id" key.

sink:
    - sns:
        topic_name: "mytopic"
        message_deduplication_id: "${/id}"
        message_group_id: "${/type}"

The ${/id} syntax tells Data Prepper to get the value of the key named "id".
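
A minimal sketch of how such a value might be resolved, written in Python for illustration only. The helper names `resolve_value` and `deduplication_id` are hypothetical, and this is not Data Prepper's actual expression handling: it assumes the configured value is either a plain string literal or a single `${/key}` reference, with nested keys separated by slashes, and it falls back to a random UUID when no deduplication Id can be resolved.

```python
import re
import uuid

# Matches a value that consists entirely of one "${/path/to/key}" reference.
_KEY_REF = re.compile(r"^\$\{(/[^}]+)\}$")


def resolve_value(configured, event):
    """Return the literal string, or the event value the reference points to.

    Hypothetical helper: returns None when the referenced key is missing.
    """
    match = _KEY_REF.match(configured)
    if match is None:
        return configured  # plain string literal
    current = event
    for part in match.group(1).strip("/").split("/"):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return str(current)


def deduplication_id(configured, event):
    """Use the configured value if it resolves, else a random Id (assumed default)."""
    if configured is not None:
        resolved = resolve_value(configured, event)
        if resolved is not None:
            return resolved
    return str(uuid.uuid4())


event = {"type": "order", "id": "abc-123"}
print(resolve_value("${/type}", event))     # order
print(resolve_value("fixed-group", event))  # fixed-group
print(deduplication_id("${/id}", event))    # abc-123
```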

dlvenable commented 1 year ago

It also would be good to consider using the PublishBatch API to reduce the number of API calls. This API can accept up to 10 messages in a single call.

Perhaps add a new parameter, batch_size, which takes an integer value. It must be restricted to between 1 and 10, inclusive.

sink:
  - sns:
       topic_name: my_topic
       batch_size: 10
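
A rough sketch of the batching this would imply, in Python for illustration. The helper names `validate_batch_size` and `build_batches` are hypothetical. The entry dictionaries use the `Id` and `Message` fields of a PublishBatch request entry, and no AWS call is made here.

```python
import json

MAX_BATCH_SIZE = 10  # PublishBatch accepts up to 10 messages per call


def validate_batch_size(batch_size):
    """Reject values outside the range 1 to 10, inclusive."""
    if not isinstance(batch_size, int) or not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError("batch_size must be an integer between 1 and 10")
    return batch_size


def build_batches(events, batch_size):
    """Group events into lists of PublishBatch-style entries."""
    validate_batch_size(batch_size)
    batches = []
    for start in range(0, len(events), batch_size):
        chunk = events[start:start + batch_size]
        batches.append([
            # Id only has to be unique within a single batch.
            {"Id": str(index), "Message": json.dumps(event)}
            for index, event in enumerate(chunk)
        ])
    return batches


batches = build_batches([{"n": i} for i in range(23)], batch_size=10)
print([len(batch) for batch in batches])  # [10, 10, 3]
```

With boto3, each resulting list could then be passed as `PublishBatchRequestEntries` to the SNS client's `publish_batch` call, so 23 events would need three API calls instead of 23. For a FIFO topic, each entry would also carry `MessageGroupId` and `MessageDeduplicationId`.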