twitchscience / kinsumer

Native Go consumer for AWS Kinesis streams.
Other
134 stars 35 forks source link

added support for deaggregation for aggregated records #62

Open psanzay opened 3 years ago

psanzay commented 3 years ago

This PR enables support for deaggregation of the records if the stream has aggregated records.

garethlewin commented 3 years ago

Hi Sorry I haven't been ignoring this, I'm just at a bit of a analysis paralysis option here.

This change would make #49 very difficult (or more accurately #49 makes this more difficult). I am really also not sure how to handle erroneous situations.

As I see it there are 3 options, and I dislike all 3:

A) On error just send in the entire blob, this means clients now have to anticipate this happening and deal with the situation, which means that they have to be aware of deaggregation.

B) On error swallow the record. This means data will be dropped, this seems very bad.

C) On error return an error from kinsumer and error. The problem with this is that a checkpoint won't be created (or we are basically back to option B) ) and thus kinsumer will never be able to handle that shard again until the record expires off it.

I am wondering what the benefits of implicit deaggregation are here vs having the clients do it on their side (which is what we do at Twitch, but then we use our own aggregation method and not the one that KCL supplies.