goraft / raft

UNMAINTAINED: A Go implementation of the Raft distributed consensus protocol.
MIT License

Sync fixes #137

Closed baruch closed 10 years ago

baruch commented 10 years ago

The changes below are based on reading and writing the code. None of these conditions was rigorously tested to demonstrate a real problem or to show that the fix necessarily works, so please review critically.

baruch commented 10 years ago

The issue I'm trying to fix here is the possibility of data loss and loss of consistency if power is lost mid-operation, or after the operation but before the filesystem has flushed the data to disk. The filesystem may delay the actual write by 5 (ext3) to 30 (xfs) seconds, so there is a real chance of losing consistency in that window.

This will, however, reduce performance.

I also found a possible resource leak: if a write error occurs while switching files, the temporary file is neither closed nor deleted.

xiang90 commented 10 years ago

@baruch fsync on snapshot is OK, but it will slow things down when applied to every log entry. When a log entry is committed, it should already be in the page cache of a majority of the nodes. The only problem is that if all of these machines die, the committed log entries will be lost. I have not yet begun to measure the performance and the effect on it, so I cannot make a decision.

I will review it all later, but I wanted to respond quickly to thank you.

baruch commented 10 years ago

I generally agree, but some fine points need more thought, especially the change to flushCommitIndex.

Feel free to take only the parts you think are really necessary and ignore those that you're not sure about.

xiang90 commented 10 years ago

@baruch We are fixing the issue in another PR, #150.