Closed baruch closed 10 years ago
The issue I'm trying to fix here is the possibility of data loss and loss of consistency if power is lost mid-operation, or after the operation but before the filesystem has flushed the data to disk. The filesystem may delay the actual write by 5 (ext3) to 30 (xfs) seconds, so there is a real chance of losing consistency in that case.
The fix will, however, slow performance.
Another thing I found is a possible resource leak: if a write error occurs when switching files, the temporary file is neither closed nor deleted.
@baruch fsync on snapshot is OK, but it will slow things down when applied to log entries. When a log entry is committed, it should already be held in the page cache on a majority of nodes. The only problem is that if all of these machines die, the committed log entries will be lost.
Actually, I have not begun to measure performance and the effect on it, so I cannot make a decision yet.
I will review it all later, but wanted to give a quick response to thank you.
I generally agree but there are some fine points that need to be thought about, especially the change of the flushCommitIndex.
Feel free to take only the parts you think are really necessary and ignore those that you're not sure about.
@baruch We are fixing the issue in another PR, #150.
What follows is based on reading and writing the code. No thorough testing was done for all of these conditions to show a real problem or that the fix necessarily works, so review critically.