Closed benbjohnson closed 11 years ago
@benbjohnson One style thing I have started doing in markdown files is one sentence per line. It is compatible with all of the editors and makes reviewing in git and github easier. Just a suggestion; no strong feelings either way.
lgtm, super helpful.
@philips I prefer that too actually. I started using the Zen Mode for editing right on GitHub and it's been nice but it does one long line. Also, I read that GFM adds line breaks for multiple lines but I guess .md is just straight markdown (ref).
"we've found the maximum effective cluster size to be around 9 nodes. We typically suggest a 5 node cluster for performance reasons though."
Might be worthwhile giving some context as to what the setup configuration was. Are these 9 and 5 node clusters on the same LAN or across WAN? What was the ping like between them?
I know there are a lot of variables at play but I think it's good to give some idea of the environment the suggestion is coming from.
On Fri, Oct 25, 2013 at 12:51 PM, Ben Johnson notifications@github.com wrote:
Per an e-mail conversation with @pvo https://github.com/pvo, I added a section to the documentation called Raft in Practice.
@pvo https://github.com/pvo Can you tell me if this helps to answer your question? Is there anything I'm missing?
@ongardie https://github.com/ongardie @kellabyte https://github.com/kellabyte @xiangli-cmu https://github.com/xiangli-cmu @philips https://github.com/philips Can you guys give me a technical review? It's not very long.
Here's the pretty printed version: https://github.com/goraft/raft/blob/docs/README.md#raft-in-practice
You can merge this Pull Request by running
git pull https://github.com/goraft/raft docs
Or view, comment on, or merge it at:
https://github.com/goraft/raft/pull/124
Commit Summary
- Add 'Raft in Practice' to README.
@kellabyte Good point. The overhead is more related to message processing time and not necessarily latency. I'll make it more descriptive.
@benbjohnson Another thing to consider is even vs odd number of nodes. e.g. https://github.com/coreos/etcd/issues/149#issuecomment-23603009
Looks pretty good, but in cluster size, I don't think heartbeat chattiness is the biggest concern. Even if your heartbeats were 500 bytes, you had 101 servers, and your heartbeats went out every 50ms, that's still less than 1% of a gigabit link out from the leader.
I think the main reason to make a cluster larger is to tolerate more server failures before a human has to get involved (your Concurrent Node Failures). For this reason, I can't imagine needing a cluster bigger than 9 servers, allowing 4 of them to fail independently before any are replaced.
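The failure-tolerance numbers follow from majority quorums. A minimal sketch in Go, with hypothetical helper names that are not taken from goraft:

```go
package main

import "fmt"

// quorum returns the smallest majority of n servers.
func quorum(n int) int { return n/2 + 1 }

// tolerated returns how many servers can fail while a majority is still up.
func tolerated(n int) int { return n - quorum(n) }

func main() {
	for _, n := range []int{3, 5, 8, 9} {
		fmt.Printf("servers=%d quorum=%d tolerated failures=%d\n", n, quorum(n), tolerated(n))
	}
}
```

A 9-server cluster needs 5 servers for a majority and so tolerates 4 failures. An 8-server cluster also needs 5 for a majority but tolerates only 3, which is why even-sized clusters add a server without adding any failure tolerance.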
I think the main reasons to keep a cluster small are (a) cost, (b) latency/bandwidth for new entries coming in to the leader, since the leader has to replicate each one out to a majority of the cluster, and (c) if you really do run with a ton of servers, it's more likely that candidates will interfere with each other during elections, so you might need to increase your election timeouts (baseline and range).
Thanks all. Super helpful.
@ongardie Good point. I updated the README to be mainly around node failure tolerance. I wanted to keep it simple so I left out cost, latency, & election conflicts. Let me know what you think.
shipit
@ongardie For the cluster size, in my reply I was comparing 8 and 9 rather than 5 and 9. Also, with a lot of nodes, we would probably need to cache the serialization result, which we have not done yet.