snowplow / snowplow-elasticsearch-loader

Writes Snowplow enriched events from Kinesis to Elasticsearch
http://snowplowanalytics.com/

Store Kinesis checkpoints in Elasticsearch #22

Open BenFradet opened 7 years ago

BenFradet commented 7 years ago

from snowplow/snowplow#2456:

The idea here is:

- When writing data to ES, we also store the Kinesis shard checkpoints alongside the data.
- These checkpoints will be backed up alongside the event data each night.
- If we need to do a restore, we copy the checkpoints from ES back to DynamoDB before restarting the ES Sink.

Doing this should mean we can recover our ES and restart drip feeding without data loss or duplication.

Open questions: how transactional is the ES backup? Is there a risk of drift between the data loaded and the checkpoints stored during the S3 backup?

Note: this idea is borrowed from the Kafka guys, who suggest co-locating checkpoints alongside data in a storage target
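
To make the proposal concrete, here is a minimal Python sketch of the flow described above. It is not the loader's implementation. Everything in it is an assumption made for illustration: the index names, the document shapes, the `FakeElasticsearch` class (an in-memory stand-in, not a real client), and the plain dict standing in for the DynamoDB lease table. Sequence numbers are small integers here for readability; real Kinesis sequence numbers are long numeric strings.

```python
from dataclasses import dataclass, field

# Hypothetical index names, chosen only for this sketch.
EVENT_INDEX = "snowplow-events"
CHECKPOINT_INDEX = "snowplow-checkpoints"


@dataclass
class FakeElasticsearch:
    """In-memory stand-in for a cluster: index name -> {doc id -> doc}."""
    indices: dict = field(default_factory=dict)

    def bulk(self, actions):
        # Each action is (index, doc_id, document); later writes overwrite.
        for index, doc_id, doc in actions:
            self.indices.setdefault(index, {})[doc_id] = doc


def build_bulk(shard_id, records):
    """Build one batch: the events plus a checkpoint doc for the shard.

    `records` is an ordered list of (sequence_number, event) from one shard.
    """
    actions = [
        (EVENT_INDEX, f"{shard_id}:{seq}",
         {**event, "shard_id": shard_id, "sequence_number": seq})
        for seq, event in records
    ]
    last_seq = records[-1][0]
    # One checkpoint document per shard, overwritten on every batch.
    actions.append((CHECKPOINT_INDEX, shard_id,
                    {"shard_id": shard_id, "sequence_number": last_seq}))
    return actions


def restore_checkpoints(es, lease_table):
    """Copy stored checkpoints back into the (stand-in) lease table."""
    for shard_id, doc in es.indices.get(CHECKPOINT_INDEX, {}).items():
        lease_table[shard_id] = {"checkpoint": doc["sequence_number"]}


def find_drift(es):
    """Return {shard: (checkpoint, newest_event_seq)} where the two differ."""
    newest = {}
    for doc in es.indices.get(EVENT_INDEX, {}).values():
        sid = doc["shard_id"]
        newest[sid] = max(newest.get(sid, -1), doc["sequence_number"])
    checkpoints = es.indices.get(CHECKPOINT_INDEX, {})
    drift = {}
    for sid, seq in newest.items():
        stored = checkpoints.get(sid, {}).get("sequence_number")
        if stored != seq:
            drift[sid] = (stored, seq)
    return drift
```

After a restore, `restore_checkpoints(es, lease_table)` would run before the sink restarts, and a check like `find_drift(es)` is one way to detect the drift raised in the open question: any shard it reports has event data and a checkpoint that were not captured at the same point.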

alexanderdean commented 7 years ago

Even better would be if we could move the master copies of the checkpoints to Elasticsearch, but this would be more difficult for our internal monitoring and not supported by the KCL...