mesos / mesos-go

Go language bindings for Apache Mesos
Apache License 2.0
545 stars 146 forks source link

executor shutdown callback couldn't complete properly #311

Open JiaminZhu opened 7 years ago

JiaminZhu commented 7 years ago

driver.stop() in shutdown cause executor exit immediately without waiting for executor shutdown callback complete. https://github.com/mesos/mesos-go/blob/master/api/v0/executor/executor.go#L484-L501

janisz commented 7 years ago

I'll put some context here:

Looking at the go driver code, looks like the driver does a stop() when it gets a shutdown message from the agent. This will likely cause the driver and the executor to exit immediately. This is different https://github.com/apache/mesos/blob/master/src/exec/exec.cpp#L428 from how the C++ driver responds to the shutdown message.

https://lists.apache.org/thread.html/e2d87b2e988a026cb497c3aabfc7b2c7d5632d8fdb5b09b503661f50@%3Cdev.mesos.apache.org%3E

JiaminZhu commented 7 years ago

Thanks @janisz

jdef commented 7 years ago

it sounds like you're looking for semantics more like this?

...
  t := time.NewTimer(shutdownGracePeriod)
  ch := make(chan struct{})
  driver.withExecutor(func(e Executor) {
    defer close(ch)
    e.Shutdown(driver)
  })
  // only stop the driver once Shutdown callback has completed
  go func() {
    defer t.Stop()
    select {
      case <-t.C:
      case <-ch:
    }
    // driver.stop() will cause process to eventually stop.
    driver.stop()
  }()
JiaminZhu commented 7 years ago

Thanks @jdef I'm thinking that we can keep driver.stop() in Shutdown callback, so that we won't have this issue. And you are right, we do need to keep a timer for shutdown grace period. Isn't executor_shutdown_grace_period one of mesos agent config? Why we need it here?