The idea here is:
When writing data to ES, we also store the Kinesis shard checkpoints alongside the data
These checkpoints will be backed up alongside the event data each night
If we need to do a restore, we will copy the checkpoints from ES back to DynamoDB before restarting the ES Sink
This should let us recover ES and restart drip feeding without data loss or duplication.
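A minimal sketch of the flow, using plain Python dicts as stand-ins for the ES index and the DynamoDB checkpoint table. The function names, key layout (`event:...`, `checkpoint:...`) and shard id are hypothetical and only illustrate the idea; this is not the ES Sink's or the KCL's actual API. It also assumes the nightly backup is a consistent point-in-time copy, which is exactly what the open question about drift is asking.

```python
import copy


def write_batch(es_index, shard_id, records):
    """Store a batch of (sequence_number, event) pairs and, alongside them,
    the checkpoint (last sequence number written) for that shard."""
    for seq, event in records:
        es_index[f"event:{shard_id}:{seq}"] = event
    if records:
        es_index[f"checkpoint:{shard_id}"] = records[-1][0]


def restore_checkpoints(es_index, lease_table):
    """Copy every checkpoint found in the restored index back into the
    checkpoint table, overwriting whatever the table held before."""
    for key, seq in es_index.items():
        if key.startswith("checkpoint:"):
            shard_id = key.split(":", 1)[1]
            lease_table[shard_id] = seq


# Normal operation: events and checkpoint land in the same target.
es_index = {}
write_batch(es_index, "shard-0", [("101", {"e": "pv"}), ("102", {"e": "pp"})])

# Nightly backup (assumed here to be an atomic copy).
nightly_backup = copy.deepcopy(es_index)

# More data is written after the backup, then ES is lost.
write_batch(es_index, "shard-0", [("103", {"e": "se"})])

# Restore: the checkpoint table is ahead of the restored data until we
# overwrite it with the checkpoints that were backed up with the data.
restored_index = copy.deepcopy(nightly_backup)
lease_table = {"shard-0": "103"}
restore_checkpoints(restored_index, lease_table)
```

After the restore, the checkpoint for `shard-0` points at `102`, the last event actually present in the restored index, so a restarted sink would resume from the record after it rather than skipping or re-loading events.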
Open question: how transactional is the ES backup? Is there a risk of drift between the data loaded and the checkpoints stored during the S3 backup?
Note: this idea is borrowed from the Kafka guys, who suggest co-locating checkpoints alongside data in a storage target
Even better would be if we could move the master copies of the checkpoints to Elasticsearch, but this would make our internal monitoring more difficult and is not supported by the KCL...
from snowplow/snowplow#2456: