nearmap / kcd

Continuous Delivery for Kubernetes
MIT License

Improving deployment monitoring and auditing. #49

Open taoruicui opened 5 years ago

taoruicui commented 5 years ago

Hello KCD team,

We really like this tool and are trying to build our own deployment tool on top of it.

Since we have hundreds of prod machines running slow-starting apps, we realized that the current deployment history tracking in ConfigMaps won't meet our needs. We are looking for a way to get more visibility into the deployment process: for example, when the deployment started, the number of pods updated, the number of pods still to be updated, the number of pods that failed to update, when the deployment finished, and whether it succeeded or failed. This kind of data would greatly help in monitoring a very large deployment. An event-based model that sends this data out as events to configured webhooks might be even more useful.

So we are reaching out to see whether you have any plans to implement similar features. If not, we'll probably build our own.

Thank you.

simoncochrane commented 5 years ago

Hey @taoruicui. Thanks for the feedback. We are glad to hear that you like kcd and are looking at improvements to the tool.

It would be easy enough to create Kubernetes events on the KCD resource that you can then hook into your monitoring system. Would this suffice? If so, we would very much welcome PRs if you are keen to add this. We envisage kcd evolving with the community, so we certainly welcome pull requests and feedback.

There is also a dashboard, served by the kcd controller, that shows the current status of deployments. You essentially need to set up an ingress and expose the endpoint at /kcd/v1/namespaces/{namespace}/resources?reload={true|false}&format={html|json}&access_token={token} where {token} is a generated Kubernetes secret for a service account. Note that any valid service account is capable of accessing this dashboard.
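As an illustration of the endpoint format above, here is a small Go sketch that assembles the dashboard URL. The path and the `reload`, `format`, and `access_token` parameters come from the description above; the `DashboardURL` helper, the host name `kcd.example.com`, and the placeholder token are assumptions made up for the example.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// DashboardURL builds the dashboard URL from its parts.
// It is a hypothetical helper, not something kcd ships.
func DashboardURL(base, namespace string, reload bool, format, token string) (string, error) {
	if format != "html" && format != "json" {
		return "", fmt.Errorf("format must be html or json, got %q", format)
	}
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/kcd/v1/namespaces/" + namespace + "/resources"
	q := url.Values{}
	q.Set("reload", strconv.FormatBool(reload))
	q.Set("format", format)
	q.Set("access_token", token)
	u.RawQuery = q.Encode() // Encode sorts the parameters by key
	return u.String(), nil
}

func main() {
	// The host and token below are placeholders, not real values.
	s, err := DashboardURL("https://kcd.example.com", "prod", true, "json", "TOKEN")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Since the token travels in the query string, it is probably worth exposing the ingress over HTTPS only.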

Finally, note that we do plan to add new features to kcd to deal with larger-scale deployments, although I can't commit to any schedule at this time.

Hope this helps, Simon.

svrana commented 5 years ago

The dashboard is quite nice for the single-cluster use case. When applications are deployed across multiple clusters (a cluster per region, for example), the single-cluster dashboard doesn't work well: to get a view of an application deployment across all clusters you'd have to visit multiple dashboards. What we'd like is to be able to aggregate all of the deployment information across clusters so only one dashboard is needed.

@taoruicui mentioned many of the events needed, but basically we'd want any action taken by KCD, the status of that action on the cluster, and all of the metadata associated with it.

Events do seem promising. We'd likely then add an application in each cluster to watch for these events and push them to an aggregator service. I am a bit worried that we might want to publish more information in these events than you might like. For example, we would like to know the number of pods that have been deployed with the new version and how many remain. While this information can be gathered in other ways, we would then need to stitch it together ourselves to provide a consistent view of the deployment.
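A rough sketch in Go of the aggregator side, assuming each per-cluster watcher pushes a progress report containing the pod counts mentioned above. `ClusterProgress`, `Aggregator`, and the cluster names are hypothetical; the sketch keeps only the latest report per cluster and is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"sort"
)

// ClusterProgress is a hypothetical report pushed by a per-cluster watcher.
type ClusterProgress struct {
	Cluster       string
	App           string
	Version       string // the new version being rolled out
	PodsUpdated   int    // pods already running the new version
	PodsRemaining int    // pods still to be updated
}

// Aggregator keeps the latest report per application and cluster.
type Aggregator struct {
	latest map[string]map[string]ClusterProgress // app -> cluster -> report
}

func NewAggregator() *Aggregator {
	return &Aggregator{latest: make(map[string]map[string]ClusterProgress)}
}

// Record stores a report, replacing any earlier one from the same cluster.
func (a *Aggregator) Record(p ClusterProgress) {
	if a.latest[p.App] == nil {
		a.latest[p.App] = make(map[string]ClusterProgress)
	}
	a.latest[p.App][p.Cluster] = p
}

// Summary sums the pod counts for one application across all clusters
// and returns the reporting clusters in sorted order.
func (a *Aggregator) Summary(app string) (updated, remaining int, clusters []string) {
	for name, p := range a.latest[app] {
		updated += p.PodsUpdated
		remaining += p.PodsRemaining
		clusters = append(clusters, name)
	}
	sort.Strings(clusters)
	return updated, remaining, clusters
}

func main() {
	agg := NewAggregator()
	agg.Record(ClusterProgress{Cluster: "us-east", App: "my-app", Version: "v2", PodsUpdated: 10, PodsRemaining: 90})
	agg.Record(ClusterProgress{Cluster: "eu-west", App: "my-app", Version: "v2", PodsUpdated: 5, PodsRemaining: 45})
	updated, remaining, clusters := agg.Summary("my-app")
	fmt.Println(updated, remaining, clusters)
}
```

With counts carried in the events themselves, the aggregator would not need to query each cluster separately to build a single view.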

We've experimented with a DeploymentInformer but found it cumbersome. On the aggregator, we would like to tie all information reported about a deployment back to the original deploy event (i.e., the tagging of an image in ECR). This would be much easier if we had KCD's view instead of a DeploymentInformer's. For example, KCD knows it is rolling back from version X to A; piecing this together from state outside of KCD is not much fun.