uber / ringpop-go

Scalable, fault-tolerant application-layer sharding for Go applications
http://www.uber.com
MIT License

Bootstrap can occur without listen #146

Closed benfleis closed 8 years ago

benfleis commented 8 years ago

(More or less duplicate of https://github.com/uber/ringpop-node/issues/275)

Ringpop currently allows bootstrap to occur without a listening tchannel underneath. I confirmed this by running tick-cluster with bootstrap but no listen in two configurations: on a single node, and on all nodes (patch below).

Behaviorally, a single node failing to listen is the worst case. It can send requests to other nodes but cannot receive any, so its interactions are continuously one-way, and this seems to create a continuous cycle of other nodes marking it suspect. This can happen in real life during a rolling upgrade, or if bootstrap/listen handling is incorrect in some cases.

If all nodes fail to listen, they all simply fail to bootstrap, as expected.

Our current code calls listen() before bootstrap() pretty consistently, but given the failure modes, we ought to be more defensive: either confirm that the tchannel is already listening, or call r.channel.listen() ourselves.
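A minimal sketch of the first option (refuse to bootstrap unless the channel is already listening) might look like the following. Everything here is hypothetical and for illustration only: `listenChecker`, `bootstrapper`, `guardedBootstrap`, `ErrNotListening`, and the fake types are not part of ringpop-go or tchannel, and how "is listening" is actually determined on a real channel is an assumption left open.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotListening is a hypothetical sentinel error; ringpop-go does not define it.
var ErrNotListening = errors.New("channel is not listening")

// listenChecker is a hypothetical stand-in for the underlying channel.
// It only exposes whether the channel is currently accepting connections.
type listenChecker interface {
	Listening() bool
}

// bootstrapper is a hypothetical stand-in for the part of ringpop that
// joins the cluster and returns the hosts it joined.
type bootstrapper interface {
	Bootstrap() ([]string, error)
}

// guardedBootstrap fails fast when the channel is not listening, so a node
// never joins the ring in a state where peers cannot reach it.
func guardedBootstrap(ch listenChecker, b bootstrapper) ([]string, error) {
	if !ch.Listening() {
		return nil, ErrNotListening
	}
	return b.Bootstrap()
}

// fakeChannel and fakeRing are test doubles used to exercise the guard.
type fakeChannel struct{ listening bool }

func (c fakeChannel) Listening() bool { return c.listening }

type fakeRing struct{ calls int }

func (r *fakeRing) Bootstrap() ([]string, error) {
	r.calls++
	return []string{"127.0.0.1:3000"}, nil
}

func main() {
	ring := &fakeRing{}

	// Not listening: the guard returns an error and Bootstrap is never called.
	_, err := guardedBootstrap(fakeChannel{listening: false}, ring)
	fmt.Println("not listening:", err, "bootstrap calls:", ring.calls)

	// Listening: the guard delegates to Bootstrap.
	joined, err := guardedBootstrap(fakeChannel{listening: true}, ring)
	fmt.Println("listening:", joined, err, "bootstrap calls:", ring.calls)
}
```

Returning an error keeps the decision with the caller. The alternative from the issue, calling listen on the caller's behalf, would hide the misordering but would require ringpop to know which host:port to bind.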

Behavior was confirmed by watching tick-cluster logs, and running ringpop-admin dump on one of the live nodes.


diff --git a/scripts/testpop/testpop.go b/scripts/testpop/testpop.go
index a239893..d6a3d1b 100644
--- a/scripts/testpop/testpop.go
+++ b/scripts/testpop/testpop.go
@@ -66,16 +66,26 @@ func main() {
        ringpop.TombstonePeriod(5*time.Second),
    )

-   if err := ch.ListenAndServe(*hostport); err != nil {
-       log.Fatalf("could not listen on %s: %v", *hostport, err)
-   }
+   if *hostport == "172.18.24.59:3000" {
+       opts := &swim.BootstrapOptions{}
+       opts.DiscoverProvider = jsonfile.New(*hostfile)

-   opts := &swim.BootstrapOptions{}
-   opts.DiscoverProvider = jsonfile.New(*hostfile)
+       _, err = rp.Bootstrap(opts)
+       if err != nil {
+           log.Fatalf("bootstrap failed: %v", err)
+       }
+   } else {
+       if err := ch.ListenAndServe(*hostport); err != nil {
+           log.Fatalf("could not listen on %s: %v", *hostport, err)
+       }

-   _, err = rp.Bootstrap(opts)
-   if err != nil {
-       log.Fatalf("bootstrap failed: %v", err)
+       opts := &swim.BootstrapOptions{}
+       opts.DiscoverProvider = jsonfile.New(*hostfile)
+
+       _, err = rp.Bootstrap(opts)
+       if err != nil {
+           log.Fatalf("bootstrap failed: %v", err)
+       }
    }

    // block