paypal / dce-go

Docker Compose Executor to launch a pod of Docker containers in Apache Mesos.
Apache License 2.0

New Plugin Interface for custom hooks around Executor API methods #75

Open mcmadhan01 opened 5 years ago

mcmadhan01 commented 5 years ago

Background

DCE's current plugin mechanism is centered on the pod lifecycle. It allows custom extensions to be plugged in around the steps needed to launch a pod (i.e. custom plugins can run before or after the image pull and compose up steps).

Requirement

Some situations demand custom logic that runs outside the pod lifecycle and is aligned with the executor's API functions instead. For example, we need to run custom logic that posts error-related metrics whenever control exits the LaunchTask implementation of dce-go.

This may apply to other executor API methods too. The scope of this issue could be limited to LaunchTask, while the design allows extensibility for all API methods.

Proposed Design

Executor Hook

...
execHooks:
   LaunchTask:
      Post: ["hook1", "hook2"]
...

On exit from LaunchTask, the one thing the current DCE always does is send a status to Mesos. So, to run the post hooks, we can introduce a task status channel and execute the hooks based on the various status changes.
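The idea of a status channel driving the configured post hooks can be sketched roughly as follows. This is only an illustration, not code from dce-go or PR #76: `TaskStatus`, `Hook`, `RegisterHook`, and `RunPostHooks` are hypothetical names, and the hook names are taken from the `execHooks` example above.

```go
package main

import "fmt"

// TaskStatus is a hypothetical stand-in for the status DCE reports to Mesos.
type TaskStatus string

const (
	StatusRunning TaskStatus = "RUNNING"
	StatusFailed  TaskStatus = "FAILED"
)

// Hook is a hypothetical post hook; it receives the status that triggered it.
type Hook func(status TaskStatus) error

// registry maps hook names, as they would appear in the config, to implementations.
var registry = map[string]Hook{}

// RegisterHook makes a hook available under the given name.
func RegisterHook(name string, h Hook) { registry[name] = h }

// RunPostHooks waits for one status on the channel, then runs the named hooks
// in order. Unknown names and hook errors are collected so that one failing
// hook does not prevent the remaining hooks from running.
func RunPostHooks(names []string, statusCh <-chan TaskStatus) []error {
	status := <-statusCh
	var errs []error
	for _, name := range names {
		h, ok := registry[name]
		if !ok {
			errs = append(errs, fmt.Errorf("hook %q is not registered", name))
			continue
		}
		if err := h(status); err != nil {
			errs = append(errs, fmt.Errorf("hook %q: %w", name, err))
		}
	}
	return errs
}

func main() {
	RegisterHook("hook1", func(s TaskStatus) error {
		fmt.Println("hook1 saw", s)
		return nil
	})
	RegisterHook("hook2", func(s TaskStatus) error {
		fmt.Println("hook2 saw", s)
		return nil
	})

	// A buffered channel lets the sender publish a status without blocking.
	statusCh := make(chan TaskStatus, 1)
	statusCh <- StatusFailed // e.g. a step before pod launch failed

	errs := RunPostHooks([]string{"hook1", "hook2"}, statusCh)
	fmt.Println("errors:", len(errs))
}
```

Collecting errors instead of returning on the first one is a design choice in this sketch: a metrics hook should probably not be skipped because an earlier hook failed.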

mbdas commented 5 years ago

Why can't it be accommodated around the task/pod lifecycle? The executor does nothing but manage the lifecycle of a task, with launch and kill being the primary lifecycle operations. DCE today supports prelaunch and postKill hooks, so postLaunch can be added in the same fashion.

mcmadhan01 commented 5 years ago

Why can't it be accommodated around the task/pod lifecycle? The executor does nothing but manage the lifecycle of a task, with launch and kill being the primary lifecycle operations. DCE today supports prelaunch and postKill hooks, so postLaunch can be added in the same fashion.

There is already a postlaunchtask plugin method - https://github.com/paypal/dce-go/blob/develop/plugin/type.go#L28 - but it is not suitable when you want a generic hook that executes no matter what. For example, I want to post some metrics (error, time taken, etc.) when control exits LaunchTask. With the current implementation, any failure in the steps before pod launch (image pull, preimage/postimage plugins, etc.) fails the task early, so postlaunchtask, which runs only after the pod is launched, never executes.
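One way to picture a hook that runs "no matter what" is Go's `defer`, which fires on every return path, including early failures. The sketch below is an assumption for illustration only: `launchTask`, `LaunchResult`, and the step functions are hypothetical and do not mirror the real dce-go signatures.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errDemo is a hypothetical failure used only in this sketch.
var errDemo = errors.New("registry unreachable")

// LaunchResult is a hypothetical summary handed to the exit hook.
type LaunchResult struct {
	Err     error
	Elapsed time.Duration
}

// launchTask is a simplified stand-in for the executor's LaunchTask.
// The deferred call runs on every return path, so onExit sees early
// failures (such as a failed image pull) as well as success.
func launchTask(pullImage, composeUp func() error, onExit func(LaunchResult)) (err error) {
	start := time.Now()
	defer func() {
		// err is the named result, so it holds whatever is being returned.
		onExit(LaunchResult{Err: err, Elapsed: time.Since(start)})
	}()

	if err = pullImage(); err != nil {
		return fmt.Errorf("image pull: %w", err)
	}
	if err = composeUp(); err != nil {
		return fmt.Errorf("compose up: %w", err)
	}
	return nil
}

func main() {
	failingPull := func() error { return errDemo }
	composeUp := func() error { return nil }

	// The exit hook still runs even though the pod is never launched.
	_ = launchTask(failingPull, composeUp, func(r LaunchResult) {
		fmt.Printf("LaunchTask exited: err=%v elapsed=%s\n", r.Err, r.Elapsed)
	})
}
```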

I have also been thinking that a mechanism for hooks on task status transitions (i.e. hooks executed when a task moves from starting -> running or failed, or from running -> failed) may make sense. I hope to hear more points of view on this.
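A transition-based hook mechanism could look something like the sketch below. The `State`, `Transition`, and `Tracker` types are hypothetical and exist only for illustration; the state names simply follow the starting/running/failed wording used in this comment. The tracker is not safe for concurrent use as written.

```go
package main

import "fmt"

// State is a hypothetical task/pod state as seen by DCE.
type State string

const (
	Starting State = "STARTING"
	Running  State = "RUNNING"
	Failed   State = "FAILED"
)

// Transition identifies a move from one state to another.
type Transition struct {
	From, To State
}

// Tracker remembers the current state and fires the hooks registered
// for a specific transition. It is not safe for concurrent use.
type Tracker struct {
	current State
	hooks   map[Transition][]func(Transition)
}

// NewTracker creates a tracker that starts in the given state.
func NewTracker(initial State) *Tracker {
	return &Tracker{current: initial, hooks: make(map[Transition][]func(Transition))}
}

// On registers a hook for the from -> to transition.
func (t *Tracker) On(from, to State, h func(Transition)) {
	tr := Transition{From: from, To: to}
	t.hooks[tr] = append(t.hooks[tr], h)
}

// Set moves to the next state, runs the matching hooks in registration
// order, and returns how many hooks ran.
func (t *Tracker) Set(next State) int {
	tr := Transition{From: t.current, To: next}
	t.current = next
	for _, h := range t.hooks[tr] {
		h(tr)
	}
	return len(t.hooks[tr])
}

func main() {
	tracker := NewTracker(Starting)
	report := func(tr Transition) { fmt.Printf("transition %s -> %s\n", tr.From, tr.To) }
	tracker.On(Starting, Running, report)
	tracker.On(Starting, Failed, report)
	tracker.On(Running, Failed, report)

	tracker.Set(Running)
	tracker.Set(Failed)
}
```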

mbdas commented 5 years ago

Why not change the current implementation? Having two sets of hooks will make it confusing where to add what. I will go over the PR to see what can be consolidated. Most importantly, the Mesos agent expects the callback methods to finish fast, because it cannot invoke any other method while blocked on an existing call. So we have to ensure launchTask returns and the hooks execute asynchronously.

Regarding task transitions, those happen on the Mesos master. The agent may help send some task statuses through the executor, some are embedded inside the agent, and some, like lost, may be added by the master itself.
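The non-blocking requirement can be illustrated with a small sketch in which the callback hands the hook to a goroutine and returns immediately. `AsyncHooks` and `launchTaskAsync` are hypothetical names for this example, not part of dce-go or the Mesos executor API.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// AsyncHooks runs hooks off the callback goroutine so the callback
// itself can return quickly.
type AsyncHooks struct {
	wg sync.WaitGroup
}

// Go starts the hook in its own goroutine and returns immediately.
func (a *AsyncHooks) Go(hook func()) {
	a.wg.Add(1)
	go func() {
		defer a.wg.Done()
		hook()
	}()
}

// Wait blocks until every started hook has finished, for example
// during executor shutdown.
func (a *AsyncHooks) Wait() { a.wg.Wait() }

// launchTaskAsync stands in for the executor callback: it schedules
// the hook and returns without waiting for it.
func launchTaskAsync(hooks *AsyncHooks, hook func()) {
	// ... the pod launch itself would be kicked off here ...
	hooks.Go(hook)
}

func main() {
	var hooks AsyncHooks
	start := time.Now()

	launchTaskAsync(&hooks, func() {
		time.Sleep(50 * time.Millisecond) // simulate a slow metrics post
		fmt.Println("hook finished")
	})
	fmt.Println("callback returned after", time.Since(start))

	hooks.Wait()
}
```

The `WaitGroup` is there so that pending hooks are not lost if the process exits right after the callback returns.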

mcmadhan01 commented 5 years ago

Why not change the current implementation? Having two sets of hooks will make it confusing where to add what.

The original idea was to leave the pod lifecycle untouched and allow something like pre/post hooks around any executor API. Changing the current implementation would break backward compatibility for anyone with custom plugin implementations. Also, the requirement is that these hooks execute no matter what happens in the pod lifecycle steps. If two sets of hooks around the task/pod may lead to confusion, then I think the best way to allow a custom hook after executor APIs is to listen for task transitions and react. More on that below.

Regarding task transitions, those happen on the Mesos master. The agent may help send some task statuses through the executor, some are embedded inside the agent, and some, like lost, may be added by the master itself.

By task status transition, I meant the status change detected by DCE or custom plugins (marking a pod failed or running), which in turn translates to a task status update sent to Mesos. This is where we can attach a hook mechanism. Please share your thoughts.

Most importantly, the Mesos agent expects the callback methods to finish fast, because it cannot invoke any other method while blocked on an existing call. So we have to ensure launchTask returns and the hooks execute asynchronously.

Yes, the execution is not blocking in the current PR #76 implementation. It is a goroutine that listens for task status, but it is tied only to LaunchTask. With my proposal above, this can be made generic to all executor API methods.

mcmadhan01 commented 5 years ago

Summarizing the offline discussions with @mbdas