Closed xiang90 closed 10 years ago
@xiangli-cmu I don't think I quite understand the PR comment. Do you want to remove the Sync() on the log when the leader is committing a command?
@benbjohnson util.go and using rename to replace the log is cherry-picked from @baruch's pull request.
@xiangli-cmu Technically I think it's supposed to fsync before it sends to the followers. Although you might be right. If the entry hasn't been committed by the leader then it's not externally visible so it might be ok if it gets lost. It'll be committed by the next leader.
You still need to fsync the followers after each append so this might just end up making the followers a lot slower.
I'm ok with adding configurable options for fsync. Some people may want to trade off a little safety for a hefty performance boost. It should be strict by default though.
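A configurable fsync option that stays strict by default could look roughly like the sketch below. Every name in it (`SyncPolicy`, `SyncAlways`, `SyncNever`, `Log`, `countingFile`) is hypothetical and not taken from goraft. The strict setting is the zero value, so a caller who sets nothing gets an fsync on every append.

```go
package main

import "fmt"

// SyncPolicy is a hypothetical option; its zero value is the strict setting.
type SyncPolicy int

const (
	SyncAlways SyncPolicy = iota // fsync after every append (the default)
	SyncNever                    // leave flushing to the OS page cache
)

// syncWriter is the subset of *os.File that this sketch needs.
type syncWriter interface {
	Write(p []byte) (int, error)
	Sync() error
}

// Log is a stand-in for a real log type, not goraft's actual struct.
type Log struct {
	file   syncWriter
	Policy SyncPolicy
}

// Append writes one encoded entry and syncs it unless the policy opts out.
func (l *Log) Append(entry []byte) error {
	if _, err := l.file.Write(entry); err != nil {
		return err
	}
	if l.Policy == SyncAlways {
		return l.file.Sync()
	}
	return nil
}

// countingFile is an in-memory fake that counts writes and syncs.
type countingFile struct{ writes, syncs int }

func (c *countingFile) Write(p []byte) (int, error) { c.writes++; return len(p), nil }
func (c *countingFile) Sync() error                 { c.syncs++; return nil }

func main() {
	strict := &countingFile{}
	l := &Log{file: strict} // zero-value policy is SyncAlways
	_ = l.Append([]byte("a"))
	_ = l.Append([]byte("b"))
	fmt.Println(strict.writes, strict.syncs) // prints "2 2"
}
```

Setting `Policy: SyncNever` on the same type trades the per-append fsync for throughput, which is the safety trade-off described above.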
@benbjohnson It should be safe, since we only need to make sure that, at the commit point, the log entries are safely stored on a majority of the nodes in the cluster.
The thing is that
My hope is to do heartbeat in a single goroutine, and broadcast in multiple goroutines. In that way, we can
@benbjohnson +1 on making fsync something we can disable by default.
Perhaps I am being naive but fsync only helps protect against data loss in a full cluster down, right? In the normal case where the current leader has a power failure but a majority of the cluster stays up then it doesn't matter if goraft is fsync'ing because the cluster will have moved on and the correctness of the leader's log is irrelevant once it rejoins.
Just trying to wrap my head around why we are fsyncing the log at all.
I haven't thought much about this, but I think @xiangli-cmu is right that until the leader externalizes entries (by telling followers or clients that they're committed), it's not required to retain them. As far as Raft is concerned, it's ok to lose uncommitted entries.
OTOH, I think @philips's comment takes it too far -- I don't think you can get rid of all fsyncs safely. If an entry /is/ committed, then you have to maintain the invariant that a majority of the servers store the entry. Otherwise, there's no way to guarantee that they'll be available when it comes to electing a new leader.
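The majority invariant can be made concrete with a small sketch. The function name `commitIndex` and its input are assumptions for illustration: `durable` holds, for each server including the leader, the highest log index that server has durably stored. The sketch deliberately ignores Raft's additional rule that a leader only commits entries from its current term by counting replicas.

```go
package main

import (
	"fmt"
	"sort"
)

// commitIndex returns the highest log index that a majority of servers
// report as durably stored. It is a simplified illustration, not goraft code.
func commitIndex(durable []uint64) uint64 {
	if len(durable) == 0 {
		return 0
	}
	// Copy so the caller's slice is not reordered.
	sorted := append([]uint64(nil), durable...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })

	// With indexes sorted in descending order, the entry at position
	// majority-1 is stored by at least `majority` servers.
	majority := len(sorted)/2 + 1
	return sorted[majority-1]
}

func main() {
	// Five servers; three of them have durably stored index 5 or higher.
	fmt.Println(commitIndex([]uint64{7, 2, 5, 5, 1})) // prints "5"
}
```

If a server reports an index that only lives in its page cache, the result overstates what would survive a power failure, which is why the fsyncs behind these numbers cannot all be dropped.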
@philips As @ongardie suggests, if we disable fsync, then we rely entirely on the OS page cache. A power failure on the leader, or on the whole cluster, might then cause us to lose committed data.
@xiangli-cmu lgtm. Can you clean up the writeFileSynced() and go ahead and merge?
@benbjohnson Will do.
@ongardie In our raft implementation, we fsync every time a follower appends log entries from the leader, but we do not fsync every time the leader receives a command from a client. I observe that the leader does not need to fsync before actually committing. This is because the leader decides when to commit the log entry, so it can safely assume that it has the entry itself, whether or not the log has been fsynced. Fsyncing every time the leader receives a command would hurt performance significantly. Do you think this is a correct observation?
/cc @baruch @benbjohnson
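One way to read the proposal is that the leader buffers appended commands and only forces them to disk when it is about to advance its commit index. The sketch below illustrates that reading under stated assumptions: `leaderLog`, `Append`, `Commit`, and `countingSyncer` are hypothetical names, entries are reduced to indexes, and the choice to sync just before committing is this sketch's, not something the PR confirms.

```go
package main

import "fmt"

// syncer is the one method of *os.File this sketch relies on.
type syncer interface{ Sync() error }

// leaderLog is illustrative only; it tracks indexes rather than real entries.
type leaderLog struct {
	file        syncer
	lastIndex   uint64 // last appended entry
	syncedIndex uint64 // last entry known to be on disk
	commitIndex uint64
}

// Append records a client command without forcing it to disk.
func (l *leaderLog) Append() uint64 {
	l.lastIndex++
	return l.lastIndex
}

// Commit makes entries up to index durable, then advances the commit index.
func (l *leaderLog) Commit(index uint64) error {
	if index > l.lastIndex {
		return fmt.Errorf("index %d is beyond last entry %d", index, l.lastIndex)
	}
	if index > l.syncedIndex {
		if err := l.file.Sync(); err != nil {
			return err
		}
		// One Sync flushes everything appended so far.
		l.syncedIndex = l.lastIndex
	}
	if index > l.commitIndex {
		l.commitIndex = index
	}
	return nil
}

// countingSyncer is a fake that counts Sync calls.
type countingSyncer struct{ n int }

func (c *countingSyncer) Sync() error { c.n++; return nil }

func main() {
	f := &countingSyncer{}
	l := &leaderLog{file: f}
	l.Append()
	l.Append()
	l.Append()
	if err := l.Commit(3); err != nil {
		panic(err)
	}
	fmt.Println(f.n) // prints "1": three appends, one fsync
}
```

Batching this way amortizes one fsync over several commands, which is where the performance gain described above would come from.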