S3 buffer using pipeline transformations

Is your feature request related to a problem? Please describe.

For smaller workloads that need durability, using S3 as a buffer can be a good solution.

Describe the solution you'd like

Data Prepper already has a few features that we can combine to create an S3 buffer:

An S3 source
An S3 sink
Pipeline transformations

I propose a new buffer, pipeline_s3, which is implemented only as a pipeline transformation. For example:

my-pipeline:
  source:
    http:
  buffer:
    pipeline_s3:
      bucket: mybucket
  sink:
    - opensearch:

This would transform into:

my-pipeline-source:
  source:
    http:
  buffer:
    bounded_blocking:
  sink:
    - s3:
        bucket: mybucket

my-pipeline-sink:
  source:
    s3:
      scan:
        buckets:
          - bucket:
              name: mybucket
  buffer:
    bounded_blocking:
  sink:
    - opensearch:
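
To make the split concrete, here is a minimal Python sketch of the same rewrite applied to a parsed pipeline configuration held as plain dictionaries. It is an illustration only: the function name `split_pipeline_s3` is hypothetical, and it does not use Data Prepper's actual pipeline transformation mechanism. It simply mirrors the before and after configurations shown above.

```python
import copy


def split_pipeline_s3(pipelines: dict) -> dict:
    """Expand each pipeline whose buffer is `pipeline_s3` into two pipelines.

    Hypothetical helper for illustration; not part of Data Prepper.
    """
    result = {}
    for name, pipeline in pipelines.items():
        buffer = pipeline.get("buffer") or {}
        if "pipeline_s3" not in buffer:
            # Pipelines without the proposed buffer pass through unchanged.
            result[name] = copy.deepcopy(pipeline)
            continue

        bucket = buffer["pipeline_s3"]["bucket"]

        # First half: the original source writes to S3 through an in-memory buffer.
        result[f"{name}-source"] = {
            "source": copy.deepcopy(pipeline["source"]),
            "buffer": {"bounded_blocking": None},
            "sink": [{"s3": {"bucket": bucket}}],
        }

        # Second half: an S3 scan source reads the bucket and feeds the original sinks.
        result[f"{name}-sink"] = {
            "source": {"s3": {"scan": {"buckets": [{"bucket": {"name": bucket}}]}}},
            "buffer": {"bounded_blocking": None},
            "sink": copy.deepcopy(pipeline["sink"]),
        }
    return result


# The example pipeline from this issue, as it would look after YAML parsing.
original = {
    "my-pipeline": {
        "source": {"http": None},
        "buffer": {"pipeline_s3": {"bucket": "mybucket"}},
        "sink": [{"opensearch": None}],
    }
}
transformed = split_pipeline_s3(original)
```

The sketch carries over only the bucket name, since that is the only option the proposal shows. Any other S3 settings would need to be mapped onto both the generated sink and the generated source.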

Describe alternatives you've considered (Optional)

We could implement an S3 buffer similar to the Kafka buffer that does not require splitting the pipeline. However, the pipeline transformation approach would be quite a bit faster to create.

Also, I think we should leave room for a possible S3 buffer that is implemented as an actual buffer. My proposal is to give this buffer a different name, both to make it distinct from such an S3 buffer and to avoid confusion with other buffers such as Kafka. Thus, I called it pipeline_s3.

One alternative to changing the name is to use a flag instead, such as split_pipeline: true or asynchronous_buffer: true.
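
As a rough sketch of how the flag alternative might be detected, the Python snippet below checks a parsed buffer configuration for either flag. It assumes the buffer would be named `s3` in that case, which this issue does not specify, and the helper name `needs_pipeline_split` is hypothetical.

```python
def needs_pipeline_split(buffer: dict) -> bool:
    """Return True when an `s3` buffer opts into the split via either proposed flag.

    Assumes the flag-based buffer is configured under the key `s3`.
    """
    options = buffer.get("s3") or {}
    return bool(options.get("split_pipeline") or options.get("asynchronous_buffer"))


# An `s3` buffer that requests the pipeline split, and one that does not.
flagged = {"s3": {"bucket": "mybucket", "split_pipeline": True}}
plain = {"s3": {"bucket": "mybucket"}}
```

With this shape, the same buffer name could cover both behaviors, at the cost of a less obvious distinction than a separate pipeline_s3 name.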

Additional context

N/A

opensearch-project / data-prepper

S3 buffer using pipeline transformations #4809