twitchscience / kinsumer

Native Go consumer for AWS Kinesis streams.
Other
134 stars 35 forks source link

how to define initial position in stream #31

Open jney opened 5 years ago

jney commented 5 years ago

Hi. I'd like to define initial position in stream, but I don't see how to define such parameter in kinsumer.NewConfig() Thank you

garethlewin commented 5 years ago

I never implemented this option as we have raised it but never needed it. Can you help me understand your exact use case? I've considered starting from the latest (as opposed to the oldest), but if you want to start mid stream would you want to take an iterator, a timestamp or what?

jney commented 5 years ago

The point is, I did some tests, and, from time to time, the first .Next() call took very long (more than 5 minutes). I try to understand what could be the cause, and, one of the possibilities I envisioned, is that kinsumer was reading the whole queue to find the record declared in the checkpoint. So what I meant is that I don't have a specific use case, but more, an issue.

jney commented 4 years ago

hello @GarethLewin, I saw this https://github.com/twitchscience/kinsumer/blob/2570b35fbf8d1cba018c3cf6bd4aff53bd4509a2/shard_consumer.go#L39 but I still don't see how to specify it

colmsnowplow commented 3 years ago

👋 I have a use case for which this feature would be very useful (description of that use case below).

~Looks like there's a PR for it - is there any likelihood that it might get updated merged?~

Edit: I see now that something like the behaviour I'm looking for exists, but requires manually editing DDB entries. So, I'll rephrase my question - is there any interest in something like a config flag to start at latest, as suggested here?

At a glance feels like it's just a matter of adding it to the config and passing it as an extra argument to getShardIterator(), then modifying behaviour when sequenceNumber is empty.

If that sounds about right I'm happy to make a PR. :)

garethlewin commented 3 years ago

I no longer work at Twitch so I do not own this repo anymore, so the new owners might have a different view on this.

The problem with using latest is a race condition when shards split. It makes sense at startup when you see a shard for the first time, but when your program has been running and consuming data and a new shard is added to the stream, reading from latest (if that is the default for a stream never seen before) will miss some records on the new shard.

I think maybe a better alternative is to use Timestamp in GetShardIterator, it didn't exist when I originally wrote kinsumer. That way you can set it to some timestamp that matches when the application came back up.

colmsnowplow commented 3 years ago

Yes good point - when I wrote my last comment I hadn't thought of this. I think you're likely right, using timestamp is probably the least complicated approach.

Very much appreciate your input @garethlewin. I'll give this some thought and see what I can come up with. :)