Closed whummer closed 8 years ago
@whummer interesting – I think if it's never null then there's a bug, and that should be fixed.
IIRC the reason I added this special check is because there are (or at least were) scenarios where, for a brief period after the stream is closed (i.e., it has an `EndingSequenceNumber`), Kinesis would still continue to pass back non-null `NextShardIterator`s for a short while (I believe until the time encoded in the iterator was past the final sequence number) – and I wanted to make sure I replicated this behaviour. It may be that they've fixed that bug, or that the scenarios are more complicated than I thought (highly likely).
I'm not going to merge as is (no point having a `true ||` check) – I'll either get rid of the check entirely, or will see if I can figure out what was causing that behaviour. Will get back to you soon, thanks!
Yeah, I've found the parts of my tests where I would check for that, based on testing against live Kinesis instances:
https://github.com/mhart/kinesalite/blob/master/test/splitShard.js#L344-L351
Kinesis will continue to put records in the "closed" shard until the time has clocked over to the new shard's create time
https://github.com/mhart/kinesalite/blob/master/test/splitShard.js#L381-L385
Sometimes returns with a `NextShardIterator`, sometimes doesn't
https://github.com/mhart/kinesalite/blob/master/test/splitShard.js#L417-L418
Can continue to return a number of times with a `NextShardIterator` after the shard is closed
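The observed cutover behaviour can be summarized as a purely time-based switch. The following is a simplified model I'm using to describe it (illustrative only, not kinesalite's actual routing code – `routeRecord` and its parameters are made-up names):

```javascript
// Simplified model of the observed behaviour (not kinesalite's actual code):
// writes land in the "closed" parent shard until the wall clock passes the
// child shards' creation time, after which they go to the children.
function routeRecord(nowMs, childCreateTimeMs) {
  return nowMs < childCreateTimeMs ? 'parent' : 'child';
}

console.log(routeRecord(1000, 2000)); // parent — before cutover
console.log(routeRecord(3000, 2000)); // child — after cutover
```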
I'll have to see if this is still the case.
@whummer also, is there a short snippet of code you can share that reproduces this? Even if it's just pseudocode showing the API calls being made and the various options being passed (like `ShardIteratorType`).
Because as far as I can tell, it should start to return an empty `NextShardIterator` at some point – otherwise this test would never pass:
https://github.com/mhart/kinesalite/blob/master/test/splitShard.js#L396-L427
Thanks for your quick response. Of course, the PR in its current form is merely for illustration.
I believe that the test case in https://github.com/mhart/kinesalite/blob/master/test/splitShard.js#L396-L427 hits a different part of the code, because that one continuously calls `GetShardIterator` (hitting `actions/getShardIterator.js`), whereas KCL continuously calls `GetRecords` (hitting `actions/getRecords.js`) and uses the `NextShardIterator` contained in the response.
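To make the distinction concrete, here's a rough sketch of the KCL-style consumption loop (the stubs `getShardIterator`/`getRecords` are hypothetical stand-ins for the AWS API calls, not the real SDK): the iterator is fetched once, then each subsequent fetch reuses the `NextShardIterator` from the previous response, so a `NextShardIterator` that never becomes null means the loop never terminates.

```javascript
// Sketch of the KCL-style loop: GetShardIterator is called once, then
// GetRecords is called repeatedly with the NextShardIterator from each
// response. A null NextShardIterator is the only exit condition.
function consumeShard(getShardIterator, getRecords) {
  let iterator = getShardIterator(); // actions/getShardIterator.js path
  const records = [];
  while (iterator != null) {
    const resp = getRecords(iterator); // actions/getRecords.js path
    records.push(...resp.Records);
    iterator = resp.NextShardIterator; // null => shard end reached
  }
  return records;
}

// Simulated closed shard: one page of records, then a null NextShardIterator.
const responses = [
  { Records: [{ Data: 'a' }], NextShardIterator: 'iter-2' },
  { Records: [{ Data: 'b' }], NextShardIterator: null },
];
let i = 0;
const out = consumeShard(() => 'iter-1', () => responses[i++]);
console.log(out.length); // 2
```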
I'll try to put together a self-contained minimal working example, hopefully in the next couple of days. Thanks
Ah, yeah, of course – good to know on the `GetRecords` vs `GetShardIterator` distinction.
Ah, I think I've spotted the issue... `nextSeq` will always be the same thing, because `lastItem` doesn't exist:

```js
var nextSeq = lastItem ? db.incrementSequence(lastItem._seqObj) : seqNo
```

So the generated `NextShardIterator` from `createShardIterator` on the next line will always have the same embedded sequence number – and `seqObj` is parsed out of this sequence number, which means `seqObj.seqTime >= endSeqObj.seqTime` will never change.
At least... I think that may be the issue.
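A minimal sketch of the suspected failure mode (illustrative names only – `incrementSequence` here is a stand-in, not kinesalite's actual `db.incrementSequence`): when `lastItem` is always undefined, the ternary always falls through to `seqNo`, so the sequence embedded in each `NextShardIterator` can never advance.

```javascript
// Stand-in for db.incrementSequence: bump a sequence object by one.
function incrementSequence(seqObj) {
  return { seq: seqObj.seq + 1 };
}

// The suspected bug: if lastItem is always undefined, this always returns
// seqNo unchanged, so the check seqObj.seqTime >= endSeqObj.seqTime based on
// the embedded sequence can never become true.
function nextSequence(lastItem, seqNo) {
  return lastItem ? incrementSequence(lastItem._seqObj) : seqNo;
}

console.log(nextSequence(undefined, { seq: 5 }).seq);              // 5 — stuck
console.log(nextSequence({ _seqObj: { seq: 5 } }, { seq: 5 }).seq); // 6 — advances
```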
@whummer I think I've fixed this – can you check out the `increment-seq-correctly` branch and let me know if that works for you?
@mhart yep that branch works. I'll let you know if we run into any other issues, but for now all looks great. Thanks for fixing this so quickly!
Great, thanks for trying that @whummer – released as v1.11.5
First of all, thanks for publishing this project @mhart - kinesalite is really awesome! :)
This PR is about a minor compatibility issue with Kinesis Client Library (KCL) [1]. In general, the combination of kinesalite and dynalite works really well for local integration testing with KCL.
We have one test case, however, where KCL works properly with AWS Kinesis but fails to work with kinesalite. The test consists of 1 stream, 1 producer which constantly pushes records to the stream, and 1 consumer (a KCL process) which continuously reads records. In parallel, we slowly increase the shard allocation by running a sequence of `SplitShard` API calls. By constantly polling the stream description, the KCL figures out that new shards have been added. The handover from the "old" shard (parent) to the "new" shards (children) happens as soon as the KCL determines that the end of the old shard has been reached. The responsible piece of code can be found here [2]: essentially, KCL checks whether `NextShardIterator` is `null`, then sets `isShardEndReached` to `true` and starts initializing the new child shards from that point onwards. The problem is that the condition `nextIterator == null` is never satisfied with kinesalite, at least not in our test setup (the issue is reproducible). The change in this PR fixes this issue and allows KCL to reliably switch to the new shards after a split operation. See also the comment in the code which explains why the change is necessary.
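For reference, the KCL-side check amounts to something like the following (a JavaScript sketch mirroring the Java logic in `KinesisDataFetcher` linked at [2]; `checkShardEnd` is a made-up name for illustration):

```javascript
// Sketch of the KCL shard-end check: a null NextShardIterator in the
// GetRecords response marks the shard as ended, which triggers the
// handover to the child shards.
function checkShardEnd(getRecordsResult) {
  const nextIterator = getRecordsResult.NextShardIterator;
  return {
    nextIterator: nextIterator,
    isShardEndReached: nextIterator == null, // == also catches undefined
  };
}

console.log(checkShardEnd({ NextShardIterator: 'abc' }).isShardEndReached); // false
console.log(checkShardEnd({ NextShardIterator: null }).isShardEndReached);  // true
```

If kinesalite never returns a null `NextShardIterator` for a closed shard, `isShardEndReached` stays false forever and the consumer never moves on to the children.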
It is hard to tell at this point which implications this change has for other bits of the framework, but our test results indicate that this actually best mimics the behavior of Kinesis. Judging from the unit test results of this project, this is a non-breaking change as all test cases are still succeeding.
Please consider accepting this change, as it covers crucial functionality for a large share of users (KCL is probably the most widely used tool to interact with Kinesis). If you figure out a better solution, I'll be happy to test it against our KCL testbed. Thanks!
[1] https://github.com/awslabs/amazon-kinesis-client
[2] https://github.com/awslabs/amazon-kinesis-client/blob/c6e393c13ec348f77b8b08082ba56823776ee48a/src/main/java/com/amazonaws/services/kinesis/clientlibrary/lib/worker/KinesisDataFetcher.java#L73