pinterest / secor

Secor is a service implementing Kafka log persistence
Apache License 2.0
1.84k stars 543 forks

Consumer offset management #256

Closed robinmisfit closed 7 years ago

robinmisfit commented 7 years ago

Let's say we use the local file system (not HDFS) as storage and set the upload policy to "hourly". Unfortunately, we hit a fatal problem and the program exited, which caused all local files to be deleted, including those that had not yet been uploaded to S3. After we restart Secor, it reads messages from Kafka starting at the offset stored in ZooKeeper (or in Kafka topics), but that offset would not match the point up to which data had actually been uploaded to S3, so we would lose some data.

Am I correct? If so, how can we avoid this problem other than by using HDFS?

robinmisfit commented 7 years ago

I think I've got the answer. Secor manages the consumer offset manually: it updates the offset stored in ZooKeeper after the files have been uploaded to S3, rather than using the official Kafka consumer API to commit offsets.
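
To illustrate why committing the offset after the upload avoids the data loss described in the question, here is a minimal in-memory sketch. It is not Secor's code: `OffsetStore`, `Consumer`, and `run_with_crash` are hypothetical names, a Python list stands in for a Kafka partition, and another list stands in for S3. The only point it shows is the ordering, upload first and commit second, so a crash between the two leads to messages being re-read rather than skipped.

```python
from dataclasses import dataclass, field


@dataclass
class OffsetStore:
    """Hypothetical stand-in for the committed offset kept in ZooKeeper."""
    committed: int = 0  # next offset to read after a restart


@dataclass
class Consumer:
    log: list            # stand-in for one Kafka topic partition
    store: OffsetStore
    uploaded: list = field(default_factory=list)  # stand-in for S3
    local: list = field(default_factory=list)     # local buffer, lost on crash
    position: int = 0

    def __post_init__(self):
        # On (re)start, resume from the last committed offset.
        self.position = self.store.committed

    def consume(self, n):
        """Read up to n messages into the local buffer without committing."""
        batch = self.log[self.position:self.position + n]
        self.local.extend(batch)
        self.position += len(batch)

    def upload_and_commit(self):
        """Upload the buffer first, then advance the committed offset."""
        self.uploaded.extend(self.local)
        self.local.clear()
        self.store.committed = self.position


def run_with_crash():
    log = [f"msg-{i}" for i in range(10)]
    store = OffsetStore()
    s3 = []

    first = Consumer(log, store, uploaded=s3)
    first.consume(4)
    first.upload_and_commit()   # offsets 0-3 are uploaded, committed == 4
    first.consume(3)            # offsets 4-6 exist only in the local buffer
    # Simulated crash: `first` is abandoned, its local buffer is lost,
    # and the committed offset is still 4.

    second = Consumer(log, store, uploaded=s3)
    second.consume(6)           # re-reads starting from offset 4
    second.upload_and_commit()
    return s3, store.committed
```

In this sketch the three messages that were buffered but never uploaded are read again after the restart, so nothing is missing from the uploaded data. The trade-off of this ordering is that a crash after the upload but before the commit would cause the same messages to be uploaded twice, so it gives at-least-once rather than exactly-once behavior unless uploads are idempotent.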