hibari / gdss-admin

http://hibari.github.com/hibari-doc/

Investigate etcd as an alternative to bootstrap bricks and leader election algorithm #6

Open tatsuya6502 opened 10 years ago

tatsuya6502 commented 10 years ago

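Investigate etcd to see if it can replace the following areas in the admin server:

  • bootstrap bricks - admin server's private storage
  • leader election - admin server's active/standby management
  • brick auto-discovery - find a newly added brick server with zero-conf
  • brick scoreboard - keeps brick servers' status

etcd is a highly-available key value store for shared configuration and service discovery. It is written in Go and uses the Raft consensus algorithm (an alternative to Paxos) to manage a highly-available replicated log. etcd is a part of CoreOS, a Linux distribution with extensive Docker support for distributed server platforms.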

Details about etcd: https://coreos.com/using-coreos/etcd/

tatsuya6502 commented 10 years ago

etcd - The Road to 1.0 April 14, 2014 · By Blake Mizerany http://coreos.com/blog/etcd-The-Road-to-1.0/

tatsuya6502 commented 9 years ago

On Jan 30, 2014, @tatsuya6502 wrote:

Investigate etcd to see if it can replace the following areas in the admin server:

  • bootstrap bricks - admin server's private storage
  • leader election - admin server's active/standby management
  • brick auto-discovery - find a newly added brick server with zero-conf
  • brick scoreboard - keeps brick servers' status

etcd is a highly-available key value store for shared configuration and service discovery. It is written in Go and uses the Raft consensus algorithm (an alternative to Paxos) to manage a highly-available replicated log. etcd is a part of CoreOS, a Linux distribution with extensive Docker support for distributed server platforms.

I recently started using an etcd v0.4.6 cluster in another project to store application auto-discovery information, and I'm now very confident that etcd will be a very good fit for the above tasks in the Hibari admin server. It will greatly simplify the admin server implementation and will be much more stable and scalable than the current implementation. I also think I can retire our partition-detector by using etcd.

A couple of weeks ago, the CoreOS team announced the first release candidate for etcd v2.0.0, and it seems they have brushed up the Raft implementation.

Announcing the etcd 2.0 Release Candidate December 18, 2014 · By Xiang Li https://coreos.com/blog/etcd-2-0-release-candidate/

At the heart of this release is a rethink of our Raft implementation. Raft enables etcd to do safe distributed compare-and-swap operations on keys and perform automatic leader election in the face of host failures. As many engineers have observed, implementing a consensus algorithm like Raft is a non-trivial task. But the etcd team has been hard at work. By leveraging all the latest knowledge and best practices in the Raft ecosystem, we’ve developed the new etcd Raft in 600 lines of Go and 2000 lines of tests, covering all cases in the state machine and every case described in the paper.

tatsuya6502 commented 9 years ago

I can start trying etcd v2.0.0 RC using a pre-built Docker container on CoreOS, or build it myself for SmartOS Zones. I would prefer the latter as I'm running SmartOS on my home server.

etcd is written in Go, so all Go library dependencies are packed into a single binary file. Go's binaries are platform/processor-architecture dependent, but the compiler supports cross-compiling. It covers all possible Hibari platforms:

There is an official Docker container with cross-compiling support (the ones with a *-cross tag), and I think Gox will be handy for this kind of task. This Dockerfile might be a good example of how to achieve this.

Of course, I can use the Go packages in SmartOS, FreeBSD and Arch Linux ARM, but cross-compiling with Gox will be simpler as it can produce binaries for all these targets at once, compiling them in parallel.

tatsuya6502 commented 9 years ago

Too bad. No ARM (and other 32-bit systems) support in etcd.

coreos/etcd/README.md

32-bit systems etcd has known issues on 32-bit systems due to a bug in the Go runtime. See #358 for more information.

tatsuya6502 commented 9 years ago

Tried to build etcd 2.0.0 RC on 64-bit FreeBSD and SmartOS systems.

[root@etcd1 ~/etcd]# go version
go version go1.3.2 solaris/amd64
[root@etcd1 ~/etcd]# ./build 
# github.com/coreos/etcd/pkg/fileutil
gopath/src/github.com/coreos/etcd/pkg/fileutil/lock_unix.go:34: undefined: syscall.Flock
gopath/src/github.com/coreos/etcd/pkg/fileutil/lock_unix.go:34: undefined: syscall.LOCK_EX
gopath/src/github.com/coreos/etcd/pkg/fileutil/lock_unix.go:34: undefined: syscall.LOCK_NB
gopath/src/github.com/coreos/etcd/pkg/fileutil/lock_unix.go:43: undefined: syscall.Flock
gopath/src/github.com/coreos/etcd/pkg/fileutil/lock_unix.go:43: undefined: syscall.LOCK_EX
gopath/src/github.com/coreos/etcd/pkg/fileutil/lock_unix.go:48: undefined: syscall.Flock
gopath/src/github.com/coreos/etcd/pkg/fileutil/lock_unix.go:48: undefined: syscall.LOCK_UN

According to this article "Solaris portability - flock()" by Jonathan Perkin, a core SmartOS developer, flock() is of BSD heritage and does not exist on Solaris. He recommends using the POSIX-standard fcntl() instead. Go has syscall.FcntlFlock, a file lock based on fcntl(), and I think etcd should use it for better portability across Unix-like operating systems.

I'll play with Go a bit more (I should start with basic tutorials), and try to send a PR to the etcd project.

tatsuya6502 commented 9 years ago

etcd.erl - Erlang bindings for etcd key value store https://github.com/marshall-lee/etcd.erl

tatsuya6502 commented 9 years ago

OK. I'm not alone. Other people are making some progress on ARM support :+1:
#2308 etcdserver/ARM: starting etcd on crashes the first time, succeeds subsequently

Too bad. No ARM (and other 32-bit systems) support in etcd.

coreos/etcd/README.md

32-bit systems etcd has known issues on 32-bit systems due to a bug in the Go runtime. See #358 for more information.

tatsuya6502 commented 9 years ago

Tried to build etcd 2.0.0 RC on 64-bit FreeBSD and SmartOS systems. ...

  • SmartOS (guest zone, standard64 14.3.0), Go 1.3.2 from pkgin
    • ./build failed with the following errors

... flock() is of BSD heritage and does not exist on Solaris

There are some ongoing efforts by other people to make etcd 2 (including bolt) run on Solaris and illumos.

akolb1 commented 9 years ago

@tatsuya6502 The above issues are fixed and now both bolt and etcd work on illumos and should work on Solaris.

tatsuya6502 commented 9 years ago

@akolb1 - That's great! I appreciate all your hard work.

The above issues are fixed and now both bolt and etcd work on illumos and should work on Solaris.