Let's say we use the local filesystem (not HDFS) as storage and set the upload policy to "hourly". Suppose we hit a fatal problem and the process exits, which causes all local files to be deleted, including those that haven't been uploaded to S3 yet. After we restart Secor, it will read messages from Kafka starting at the offset stored in ZooKeeper (or in Kafka's offsets topic), but that offset wouldn't match the point we had actually uploaded to S3, so we'd lose some data.
Am I correct? If so, how could we avoid this problem besides using HDFS?
I think I've got the answer: Secor manages consumer offsets manually, updating the offset in ZooKeeper only after the files have been uploaded to S3 (rather than using the official Kafka consumer API to auto-commit offsets). So a crash before upload just means the same messages are re-read on restart; that gives at-least-once delivery rather than data loss.
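To make the ordering concrete, here is a minimal sketch (hypothetical names, not Secor's actual code) of the upload-then-commit pattern. It simulates a Kafka partition, a ZooKeeper-stored offset, and an S3 bucket as plain Python objects, and shows why a crash before the upload only causes a re-read, not a loss:

```python
# Hedged sketch of the upload-then-commit pattern Secor relies on.
# All class and variable names here are illustrative, not Secor's API.

class Pipeline:
    def __init__(self, messages):
        self.messages = messages      # stands in for a Kafka partition
        self.committed_offset = 0     # stands in for the offset in ZooKeeper
        self.uploaded = []            # stands in for files already in S3
        self.local_buffer = []        # local files, wiped on crash

    def consume(self, n, crash_before_upload=False):
        # Always resume from the last *committed* offset.
        batch = self.messages[self.committed_offset:self.committed_offset + n]
        self.local_buffer.extend(batch)
        if crash_before_upload:
            # Crash: local files vanish, but the offset was never advanced,
            # so these messages will be read again after restart.
            self.local_buffer = []
            return
        # The crucial ordering: upload first, commit the offset second.
        self.uploaded.extend(self.local_buffer)
        self.committed_offset += len(self.local_buffer)
        self.local_buffer = []

p = Pipeline(list(range(10)))
p.consume(5, crash_before_upload=True)  # crash: nothing uploaded, offset stays 0
p.consume(5)                            # restart re-reads the same 5 messages
print(p.uploaded, p.committed_offset)
```

If the offset were committed before the upload, the second `consume` would skip past the lost batch, which is exactly the data-loss scenario described in the question.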