twitchscience / kinsumer

Native Go consumer for AWS Kinesis streams.

Using kinsumer across AWS accounts delays or loses data #52

Open rhythmsharma opened 4 years ago

rhythmsharma commented 4 years ago

I am using kinsumer across AWS accounts and see significant delays, and sometimes data loss, while reading from Next(). In the same account as the Kinesis stream, kinsumer works fine, but not when it runs in a different AWS account. As there are multiple consumers (total count: 3), I have increased throttleDelay to 750ms, but that does not help much. My use case is to initialize kinsumer, run it, and stop it once every 500ms. Is this a known issue with kinsumer? Is there any solution?

garethlewin commented 4 years ago

Hi.

At Twitch we use kinsumer and Kinesis across accounts a lot without any impact; the Kinesis queues themselves are not hosted in your account.

There are two things I would check that I believe could cause the issues you describe:

1) Cross region? Are you consuming from queues in the same region in both tests?
2) Shard counts? Are your streams configured with the same shard counts in both tests?

rhythmsharma commented 4 years ago

Hi Gareth, thank you for the response! To your questions:

  1. Consuming in the same region
  2. Consuming from the same Kinesis stream so I believe the same shard count.

My use case is to run the kinsumer consumer every 250ms, and the number of records could be in the thousands per second. Once a run starts, I listen to the stream for 500ms using a ticker and then stop the consumer. Then the cycle repeats: initialize the kinsumer consumer with a new config and appName, but the same DynamoDB tables, the same Kinesis connection, and the same DynamoDB connection.

My guess at what is going wrong with kinsumer:

As there is a check in the Run() method to verify whether the DynamoDB tables are in the 'ACTIVE' state, I guess my full cycle is getting skipped there. The DynamoDB table shows the 'UPDATING' state quite often.

Is there any way to pause the kinsumer consumer while it is reading data from a stream, instead of stopping, initializing, and running it every time?

garethlewin commented 4 years ago

Hi.

For DynamoDB tables to be UPDATING it means that they were just created. I recommend not deleting them and recreating them. Kinsumer is also not really designed to be run for 250ms and then shut off, so your testing might not be a valid test of kinsumer (or Kinesis) throughput.
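An alternative to the start/stop cycle might be to keep a single long-lived consumer and gate the processing loop instead. The following is a minimal sketch under stated assumptions: `recordSource` is a hypothetical interface with a blocking `Next()` like the one mentioned in this thread, and `gate`, `sliceSource`, and `consume` are illustrative names, not part of kinsumer. The fake source returns a nil record to signal the end of its data, which is an assumption of this demo only.

```go
package main

import (
	"fmt"
	"sync"
)

// recordSource is a hypothetical stand-in for a long-lived consumer that
// exposes a blocking Next().
type recordSource interface {
	Next() ([]byte, error)
}

// gate blocks callers of wait() for as long as it is paused.
type gate struct {
	mu     sync.Mutex
	cond   *sync.Cond
	paused bool
}

func newGate() *gate {
	g := &gate{}
	g.cond = sync.NewCond(&g.mu)
	return g
}

func (g *gate) pause() {
	g.mu.Lock()
	g.paused = true
	g.mu.Unlock()
}

func (g *gate) resume() {
	g.mu.Lock()
	g.paused = false
	g.mu.Unlock()
	g.cond.Broadcast()
}

func (g *gate) wait() {
	g.mu.Lock()
	for g.paused {
		g.cond.Wait()
	}
	g.mu.Unlock()
}

// sliceSource is an in-memory fake used only for this demo.
type sliceSource struct{ recs [][]byte }

func (s *sliceSource) Next() ([]byte, error) {
	if len(s.recs) == 0 {
		return nil, nil // nil record marks the end of the fake stream
	}
	r := s.recs[0]
	s.recs = s.recs[1:]
	return r, nil
}

// consume reads records until the source returns a nil record or an error,
// checking the gate before every read.
func consume(src recordSource, g *gate, handle func([]byte)) error {
	for {
		g.wait()
		rec, err := src.Next()
		if err != nil {
			return err
		}
		if rec == nil {
			return nil
		}
		handle(rec)
	}
}

func main() {
	src := &sliceSource{recs: [][]byte{[]byte("a"), []byte("b"), []byte("c")}}
	g := newGate()
	var got []string
	if err := consume(src, g, func(b []byte) { got = append(got, string(b)) }); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got) // prints "[a b c]"
}
```

A ticker could call `pause()` and `resume()` on the 250ms/500ms schedule while the consumer itself stays up. Whether this avoids the table status changes seen in this setup is untested.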

If all you want to do is to test how fast reading from Kinesis is without using a store for checkpoints, I would recommend just calling the kinesis API directly.

rhythmsharma commented 4 years ago

I need to keep track of checkpoints.

garethlewin commented 4 years ago

In that case you shouldn't need to delete the dynamo tables. Those tables are where the checkpoints are stored.

rhythmsharma commented 4 years ago

Yes, the checkpoints are in the tables and I am not deleting those. But when updates to the DynamoDB tables come too quickly, the 'UPDATING' state shows up and takes a while to return to 'ACTIVE'. It looks like kinsumer is best suited to other use cases, not this one. Thanks for all the information.