opensearch-project / opensearch-migrations

Migrate, upgrade, compare, and replicate OpenSearch clusters with ease.
https://aws.amazon.com/solutions/implementations/migration-assistant-for-amazon-opensearch-service/
Apache License 2.0
37 stars 28 forks

Consume S3 persisted objects as an input stream #512

Open ParvelAWS opened 8 months ago

ParvelAWS commented 8 months ago

Is your feature request related to a problem?

It would be nice if the replayer had a built-in submodule that could read S3 objects (in a protobuf format similar to the messages in the Kafka topic) and stream them into the replay.

What solution would you like?

A submodule, such as an S3InputStream analogous to the Kafka stream reader, that feeds protobuf messages into the main replayer executor.

What alternatives have you considered?

An external Python or shell script that feeds the S3 objects into the replayer's stdin through a pipe.
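For reference, a minimal sketch of what such an external script could look like. It assumes that the captured objects sit under one prefix, that their lexicographic key order matches capture order, and that the replayer accepts the concatenated object bytes on stdin. None of that is confirmed here, and the helper names (`iter_s3_keys`, `pipe_objects`, `main`) are made up for illustration.

```python
import sys
from typing import BinaryIO, Callable, Iterable, Iterator


def iter_s3_keys(client, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every object key under the prefix, following list pagination."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matching objects has no "Contents" entry.
        for entry in page.get("Contents", []):
            yield entry["Key"]


def pipe_objects(keys: Iterable[str], fetch: Callable[[str], bytes], out: BinaryIO) -> int:
    """Write each object's bytes to `out` in the given order; return the object count."""
    count = 0
    for key in keys:
        out.write(fetch(key))
        count += 1
    out.flush()
    return count


def main(bucket: str, prefix: str = "") -> None:
    """Stream all objects under s3://bucket/prefix to stdout (assumed layout)."""
    import boto3  # imported lazily so the helpers above have no hard dependency

    client = boto3.client("s3")

    def fetch(key: str) -> bytes:
        return client.get_object(Bucket=bucket, Key=key)["Body"].read()

    pipe_objects(iter_s3_keys(client, bucket, prefix), fetch, sys.stdout.buffer)
```

Calling `main("my-bucket", "captures/")` from a small wrapper script and piping its stdout into the replayer would be the whole workaround. The bucket and prefix names are placeholders. Each object is read fully into memory, which is fine for a sketch but would need streaming for large objects.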

ParvelAWS commented 8 months ago

Understood that, initially, this may compromise the ability to pause, interrupt, and resume the replay at a given offset, as is possible with a Kafka topic stream. There may be workarounds, such as a journal tag file in the S3 bucket.
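To make the journal idea concrete, here is a rough sketch of resuming from a journal that stores the last fully replayed object key. It assumes keys sort in replay order and that progress is tracked per object rather than per message. `replay_with_journal`, `load_journal`, and `save_journal` are hypothetical names, not existing replayer APIs.

```python
from typing import Callable, Iterable, Optional


def replay_with_journal(
    keys: Iterable[str],
    handle: Callable[[str], None],
    load_journal: Callable[[], Optional[str]],
    save_journal: Callable[[str], None],
) -> int:
    """Replay keys in sorted order, skipping those at or before the journaled key.

    Returns the number of objects handled in this run.
    """
    last_done = load_journal()  # None means nothing has been replayed yet
    processed = 0
    for key in sorted(keys):
        if last_done is not None and key <= last_done:
            continue  # already replayed in an earlier run
        handle(key)
        save_journal(key)  # record progress only after the object was handled
        processed += 1
    return processed
```

In practice `load_journal` and `save_journal` could read and write a small object in the same bucket. Because the journal is written after each object is handled, an interruption between the two steps would replay that one object again on resume, so this gives at-least-once behavior at object granularity, which is coarser than a Kafka offset.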

sumobrian commented 2 weeks ago

This is specifically a request for capture and replay to support S3 as an input source, as an alternative to Kafka.