snowplow / snowbridge

For replicating streams across clouds, accounts and regions

Add start timestamp for Kinesis source #72

Closed colmsnowplow closed 3 years ago

colmsnowplow commented 3 years ago

Currently, when we first deploy the app with a Kinesis source, it must process the entire history of the source's retention period.

Kinsumer almost includes a mechanism for managing this, but using it would involve manually creating entries in DynamoDB, which isn't really practicable when deploying an app like ours.

Options for introducing something more amenable to our requirements have been discussed here. It would involve updating our fork of the kinsumer library with this configuration change, then updating this project to suit.
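
To make the idea concrete, here is a minimal sketch of how a configured start timestamp could decide where a shard is read from. It is not kinsumer's or this project's actual code: `iteratorSpec`, `parseStartTimestamp` and `chooseIterator` are hypothetical names, and the precedence (existing checkpoint first, then start timestamp, then the start of the retention period) is an assumption about the desired behaviour. The strings `AFTER_SEQUENCE_NUMBER`, `AT_TIMESTAMP` and `TRIM_HORIZON` are real Kinesis shard iterator types.

```go
package main

import (
	"fmt"
	"time"
)

// iteratorSpec is a hypothetical stand-in for the arguments of a Kinesis
// GetShardIterator call; it is not a type from kinsumer or the AWS SDK.
type iteratorSpec struct {
	Type           string     // a Kinesis ShardIteratorType value
	SequenceNumber string     // set only for AFTER_SEQUENCE_NUMBER
	Timestamp      *time.Time // set only for AT_TIMESTAMP
}

// parseStartTimestamp reads an optional RFC 3339 config value.
// An empty string means "no start timestamp configured" and yields nil.
func parseStartTimestamp(raw string) (*time.Time, error) {
	if raw == "" {
		return nil, nil
	}
	ts, err := time.Parse(time.RFC3339, raw)
	if err != nil {
		return nil, fmt.Errorf("invalid start timestamp %q: %w", raw, err)
	}
	return &ts, nil
}

// chooseIterator picks the starting position for one shard.
// Assumed precedence: a stored checkpoint always wins, so the start
// timestamp only affects shards that have never been read before.
func chooseIterator(checkpoint string, start *time.Time) iteratorSpec {
	switch {
	case checkpoint != "":
		return iteratorSpec{Type: "AFTER_SEQUENCE_NUMBER", SequenceNumber: checkpoint}
	case start != nil:
		return iteratorSpec{Type: "AT_TIMESTAMP", Timestamp: start}
	default:
		// No checkpoint and no timestamp: read the whole retention period.
		return iteratorSpec{Type: "TRIM_HORIZON"}
	}
}

func main() {
	start, err := parseStartTimestamp("2021-06-01T00:00:00Z")
	if err != nil {
		panic(err)
	}
	fmt.Println(chooseIterator("", start).Type)      // first deploy, timestamp set
	fmt.Println(chooseIterator("4959033", start).Type) // checkpoint already exists
	fmt.Println(chooseIterator("", nil).Type)        // first deploy, no timestamp
}
```

With this ordering, the timestamp only matters on first deploy; once checkpoints exist, restarts resume from them and the setting is ignored.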

colmsnowplow commented 3 years ago

This feature is particularly needed in light of https://github.com/snowplow-devops/stream-replicator/issues/73, since processing the entire backlog of a stream, if that stream is high volume, makes it very likely that we encounter hanging instances on first deploy.

Additionally, many targets and use cases aren't suited to receiving this much data, and processing all of it is needlessly expensive.