twitchscience / kinsumer

Native Go consumer for AWS Kinesis streams.
Other
134 stars 35 forks source link

Add configuration option to start from timestamp #64

Closed colmsnowplow closed 2 years ago

colmsnowplow commented 3 years ago

This PR adds a start timestamp to configuration, allowing kinsumer's iterator to use AT_TIMESTAMP to define the initial position in the stream.

It follows the design discussed in https://github.com/twitchscience/kinsumer/issues/31.

If the current maintainers have different ideas to those recommended in that thread, I am of course happy to hear and revise accordingly.

garethlewin commented 3 years ago

I personally wouldn't make this a configuration param, as those were intended to be consistent across workers, I would add it to the init chain or similar. But I'd wait for @slydon or others to voice their views.

colmsnowplow commented 3 years ago

I'm not confident that I'm not missing something here, but to explain how I landed on adding it to the config.

We can have kinsumer running in a horizontally scaling application, whereby we have little control over when instances/tasks initialise. We can also have a single instance application which gets updated periodically, forcing a reboot of the entire app.

In both of these cases we can have kinsumer initialise more than once, and there's a (very slim) chance that this happens at the same time as a new shard getting added to kinesis.

So, in that scenario anything that uses time.Now() or a timestamp relevant to initialisation is liable to miss data (but not very likely to).

I figured that the only real guarantee around that is to have the implementer provide a specific timestamp which is used in all cases, regardless of whether it's by a worker within an instance or a worker in a different instance, or a different initialisation of the app.

From there, it seemed to me that the best fit for that is to just make it part of the config, and expect the implementer to decide what the timestamp should be.

However like I said I'm not confident that I'm necessarily getting that right - I just assumed that adding to initialisation means the timestamp will be relative to initialisation.

blackknight467 commented 2 years ago

I like this implementation.