Support for a ZooKeeper Master Detector

tarnfeld commented 10 years ago

Just getting to grips with things, but I assume it's just a case of implementing one of those in pesos.detector?

wickman commented 9 years ago

Implemented at a835b126a.

tarnfeld commented 9 years ago

Just giving this a go now on a staging cluster, actually. I'll close it if it seems to work fine.

tarnfeld commented 9 years ago

In general it seems to work OK (the ZooKeeper group aspect), but I think my testing around ZK also falls down with #15. In the event of an identical appointment, presumably the code that keeps re-connecting to the known master should kick in? I think that bit is broken.

2015-03-28 03:21:01,841[pesos.detector] FutureMasterDetector.detect no-op because previous same as leader: None
2015-03-28 03:21:01,843[pesos.detector] FutureMasterDetector.appoint accepting appointment master@192.168.33.2:5050
2015-03-28 03:21:01,843[pesos.scheduler] New master detected: master@192.168.33.2:5050
2015-03-28 03:21:01,843[pesos.scheduler] Registering framework: framework {
  user: "tom"
  name: "xxx"
  hostname: "1.0.0.127.in-addr.arpa"
}

2015-03-28 03:21:01,844[pesos.scheduler] Setting transition watch from previous master: master@192.168.33.2:5050
2015-03-28 03:21:01,844[pesos.detector] FutureMasterDetector.detect no-op because previous same as leader: master@192.168.33.2:5050
2015-03-28 03:21:01,919[x.scheduler] Framework 20150328-031924-35760320-5050-1308-0000 registered to http://vagrant-ubuntu-trusty-64:5050
2015-03-28 03:21:01,961[x.scheduler] Handling 1 offers
2015-03-28 03:21:03,844[pesos.scheduler] Skipping registration because we are either connected or there is no appointed master.
2015-03-28 03:21:07,354[x.scheduler] Handling 1 offers
2015-03-28 03:21:13,358[x.scheduler] Handling 1 offers
2015-03-28 03:21:19,362[x.scheduler] Handling 1 offers
2015-03-28 03:21:23,611[compactor.context] Received disconnection from master@192.168.33.2:5050 but no stream found.
2015-03-28 03:21:30,659[pesos.detector] FutureMasterDetector.appoint skipping identical appointment master@192.168.33.2:5050
2015-03-28 03:21:34,061[pesos.detector] FutureMasterDetector.appoint skipping identical appointment master@192.168.33.2:5050
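
The log above is consistent with a detector that drops a re-appointment of the same master, so a scheduler that has lost its connection never receives a signal to re-register. A minimal model of that interaction is sketched below. `ToyDetector` and `ToyScheduler` are hypothetical stand-ins written for illustration, not the actual pesos classes, and the real code paths may differ.

```python
class ToyDetector:
    """Hypothetical stand-in for a master detector; not the pesos implementation."""

    def __init__(self):
        self.leader = None
        self.listeners = []

    def appoint(self, leader):
        # Mirrors the "skipping identical appointment" log line: an unchanged
        # leader produces no notification at all.
        if leader == self.leader:
            return False
        self.leader = leader
        for callback in self.listeners:
            callback(leader)
        return True


class ToyScheduler:
    """Hypothetical scheduler that only registers when told about a new master."""

    def __init__(self, detector):
        self.connected = False
        self.registrations = 0
        detector.listeners.append(self.on_master)

    def on_master(self, leader):
        self.connected = True
        self.registrations += 1

    def on_disconnect(self):
        self.connected = False


detector = ToyDetector()
scheduler = ToyScheduler(detector)

detector.appoint("master@192.168.33.2:5050")  # first appointment: registers
scheduler.on_disconnect()                      # connection to the master drops
detector.appoint("master@192.168.33.2:5050")  # same master again: skipped

print(scheduler.connected, scheduler.registrations)
```

In this model the scheduler stays disconnected after the second appointment, because the detector treats it as a no-op and the scheduler has no other trigger to re-register.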

wickman commented 9 years ago

Thanks for the report. I'll take a closer look.

tarnfeld commented 9 years ago

Simply removing the check from here seems to do the trick, but I don't think that's the real solution.

Edit: I also added the following method to the scheduler:

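def exited(self, pid):
  # Only react when the process that went away is the current master.
  if pid == self.master:
    log.info('Disconnected from current master: %s' % pid)
    # Schedule another detection pass so the scheduler re-connects to the master.
    self.context.delay(self.MASTER_DETECTION_RETRY_SECONDS, self.pid, 'detect')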
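
The idea behind the method above can be tried out in isolation with a small self-contained sketch. Everything here is an assumption made for illustration: `ToyContext` records delayed calls instead of sleeping and takes the target object directly rather than a pid, the retry interval is an arbitrary value, and `detect` simply marks the scheduler as connected again. It is not the pesos or compactor API.

```python
class ToyContext:
    """Hypothetical event loop that records delayed calls instead of sleeping."""

    def __init__(self):
        self.pending = []

    def delay(self, seconds, target, method):
        # Loosely modeled on the context.delay(...) call in the snippet above;
        # here the target is a plain object, not a pid.
        self.pending.append((seconds, target, method))

    def run_pending(self):
        calls, self.pending = self.pending, []
        for _seconds, target, method in calls:
            getattr(target, method)()


class ToyScheduler:
    MASTER_DETECTION_RETRY_SECONDS = 10  # assumed value, for illustration only

    def __init__(self, context, master):
        self.context = context
        self.master = master
        self.connected = True
        self.detect_calls = 0

    def exited(self, pid):
        # Ignore exits of processes other than the current master.
        if pid == self.master:
            self.connected = False
            self.context.delay(self.MASTER_DETECTION_RETRY_SECONDS, self, "detect")

    def detect(self):
        # Re-connect to the already-known master instead of waiting for the
        # detector to announce a different one.
        self.detect_calls += 1
        self.connected = True


context = ToyContext()
scheduler = ToyScheduler(context, "master@192.168.33.2:5050")

scheduler.exited("other@192.168.33.9:5051")   # unrelated process: ignored
scheduler.exited("master@192.168.33.2:5050")  # master went away: retry scheduled
context.run_pending()                          # the delayed detect fires

print(scheduler.connected, scheduler.detect_calls)
```

The point of the design is that the retry is driven by the disconnect event itself, so it does not depend on the detector reporting a change of leader.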

wickman / pesos

Support for a ZooKeeper Master Detector #1