Implement a Kick Server Endpoint to Collect Service Status Events

freeformflow commented 9 years ago

@PandaWhisperer, please implement an endpoint in the kick server's API that accepts status events from services. The schema should follow the one Dan specified on this page in our wiki.

It should include service_name, the state of the event, and a details field that accepts an arbitrary string.
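Dan's wiki schema isn't reproduced here, so this is only a minimal sketch of the payload check, assuming a JSON body in which all three fields are required. The set of allowed `state` values is a hypothetical placeholder, not taken from the schema.

```python
import json

# Hypothetical set of states; the real list lives in Dan's wiki schema.
ALLOWED_STATES = {"starting", "running", "stopping", "stopped", "failed"}
FIELDS = ("service_name", "state", "details")


def parse_status_event(body: str) -> dict:
    """Parse and validate a status event sent by a service.

    Raises ValueError if the body is not a JSON object with the
    service_name, state and details fields.
    """
    try:
        event = json.loads(body)
    except json.JSONDecodeError as exc:
        raise ValueError(f"body is not valid JSON: {exc}") from exc
    if not isinstance(event, dict):
        raise ValueError("event must be a JSON object")

    for field in FIELDS:
        if field not in event:
            raise ValueError(f"missing field: {field}")

    if not isinstance(event["service_name"], str) or not event["service_name"]:
        raise ValueError("service_name must be a non-empty string")
    if not isinstance(event["state"], str) or event["state"] not in ALLOWED_STATES:
        raise ValueError(f"unknown state: {event['state']!r}")
    # details is free-form, but it still has to be a string.
    if not isinstance(event["details"], str):
        raise ValueError("details must be a string")

    # Keep only the fields the schema defines; drop anything extra.
    return {field: event[field] for field in FIELDS}
```

Dropping unknown keys keeps whatever the kick server forwards limited to the agreed schema, even if a unit file sends extra data.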

Don't forget that the unit files fleetctl schedules support systemd's ExecStop directive, which we can use to announce shutdowns.
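As a rough illustration, a unit file could use ExecStop to POST a shutdown event with curl. The helper below builds such a line; the kick server URL, the `/status` path, the `stopping` state and the curl path are all hypothetical, since the endpoint hasn't been designed yet.

```python
import json
import shlex


def exec_stop_line(service_name: str, kick_url: str) -> str:
    """Build a hypothetical ExecStop= line that announces a shutdown via curl."""
    # shlex.join applies POSIX-shell quoting (single quotes). systemd accepts
    # single-quoted words too, but treats %, $ and backslashes specially, so
    # refuse inputs containing them rather than trying to escape them.
    for value in (service_name, kick_url):
        if any(ch in value for ch in "'%$\\"):
            raise ValueError(f"unsupported character in {value!r}")

    payload = json.dumps({
        "service_name": service_name,
        "state": "stopping",  # hypothetical state name
        "details": "unit is being stopped",
    })
    command = [
        "/usr/bin/curl", "-s", "-X", "POST",  # assumed location of curl
        "-H", "Content-Type: application/json",
        "-d", payload,
        kick_url,
    ]
    return "ExecStop=" + shlex.join(command)
```

The resulting line would go in the unit's `[Service]` section, next to its ExecStart.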

PandaWhisperer commented 9 years ago

For completeness's sake, I'm just going to leave this part of our discussion here:

We discussed how the kick server will know which API server to forward the status to.

@PandaPup suggested we add an entry to kick.cson, which panda-cluster would set during cluster creation. The kick server then uses that to communicate with the API server.

I suggested using SRV records for this purpose, since we already have a facility to set them easily (the kick server). Instead of adding the setting to the kick server's configuration file, panda-cluster would use the newly created kick server to add an SRV record to the cluster's private zone, such as

_huxley._tcp.<name>.cluster. 86400 IN SRV 1 1 8080 huxley.pandastrike.com.
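For reference, the data fields of an SRV record follow RFC 2782: priority, weight, port and target. A small sketch that splits a zone-file line like the one above into those parts, assuming the single-line format with an explicit TTL and class:

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    name: str
    ttl: int
    priority: int
    weight: int
    port: int
    target: str


def parse_srv_line(line: str) -> SrvRecord:
    """Parse 'name ttl IN SRV priority weight port target' into its fields."""
    parts = line.split()
    if len(parts) != 8 or parts[2].upper() != "IN" or parts[3].upper() != "SRV":
        raise ValueError(f"not a single-line SRV record: {line!r}")
    name, ttl, _cls, _type, priority, weight, port, target = parts
    return SrvRecord(
        name=name,
        ttl=int(ttl),
        priority=int(priority),
        weight=int(weight),
        port=int(port),
        target=target.rstrip("."),  # drop the root dot for use as a hostname
    )
```

Against a live resolver, a library such as dnspython (`dns.resolver.resolve(name, "SRV")`) returns these fields already parsed, so a parser like this is mainly useful for reasoning about the record format.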

The kick server can then look up this record when it needs to talk to the API server. The advantage is that in order to modify it, we just need to update a record in DNS, which can even be done using the kick server itself.

Also, this would work seamlessly once we transition to using Redis queues instead of direct HTTP connections. It even supports high-availability scenarios, because failover servers can be added with different priorities.
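The failover behavior comes from how clients are expected to order SRV targets: the lowest priority value first, and within one priority a weighted random choice. Below is a simplified sketch of that RFC 2782 ordering (weight-0 records are only picked once the nonzero weights in their priority group are used up); the hostnames in the usage example are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

# (priority, weight, port, target), matching the SRV record fields.
Srv = Tuple[int, int, int, str]


def failover_order(records: Sequence[Srv], rng: Optional[random.Random] = None) -> List[Srv]:
    """Return SRV records in the order a client should try them.

    Lower priority values come first, so failover servers get a higher
    number. Within one priority, records are drawn at random with
    probability proportional to their weight.
    """
    rng = rng or random.Random()
    ordered: List[Srv] = []
    for priority in sorted({r[0] for r in records}):
        pool = [r for r in records if r[0] == priority]
        while pool:
            total = sum(r[1] for r in pool)
            if total == 0:
                # Only zero-weight records remain: pick uniformly.
                choice = rng.choice(pool)
            else:
                point = rng.random() * total  # in [0, total)
                running = 0
                for choice in pool:
                    running += choice[1]
                    if running > point:
                        break
            ordered.append(choice)
            pool.remove(choice)
    return ordered
```

For example, with `(1, 3, 8080, "a.example.com")`, `(1, 1, 8080, "b.example.com")` and `(2, 1, 8080, "backup.example.com")`, the backup is always tried last, and `a` comes first roughly three times out of four.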

The only downside I can see is that without any layer of protection, a malicious user trying to break into a cluster could "hijack" a kick server by simply pointing the SRV record to another server. I don't understand Huxley well enough just yet to predict the consequences of that, but I just wanted to put it out there.

Anyway, @PandaPup decided against this option for now, so I will be adding a configuration file setting.
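A minimal sketch of how the forwarding step might use that setting, assuming the configuration has already been parsed into a dict (CSON parsing is out of scope here). The `api_server` key and the `/status` path on the API server are hypothetical names.

```python
import json
import urllib.request


def build_forward_request(config: dict, event: dict) -> urllib.request.Request:
    """Build (but don't send) the POST that forwards an event to the API server.

    `config` stands in for the parsed kick.cson; "api_server" is a
    hypothetical key holding the API server's base URL.
    """
    base = config["api_server"].rstrip("/")
    return urllib.request.Request(
        url=f"{base}/status",  # hypothetical path on the API server
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then a matter of `urllib.request.urlopen(request, timeout=5)`. Keeping request construction separate from sending makes it easy to swap the transport later, for example for the Redis queues mentioned above.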

PandaWhisperer commented 9 years ago

@PandaPup as discussed yesterday, the kick server will mostly just pass along the data sent by the .service files (via curl). The only things it will add are the cluster name and a timestamp.
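A minimal sketch of that pass-through step: copy the incoming event and add the two fields. The field names `cluster` and `timestamp`, and the choice of an ISO 8601 UTC timestamp, are assumptions rather than part of the agreed schema.

```python
from datetime import datetime, timezone
from typing import Optional


def enrich_event(event: dict, cluster_name: str, now: Optional[datetime] = None) -> dict:
    """Return a copy of a service's event with the cluster name and a timestamp added.

    Everything the service sent is passed through unchanged; "cluster" and
    "timestamp" are hypothetical field names.
    """
    now = now or datetime.now(timezone.utc)
    enriched = dict(event)  # don't mutate the caller's dict
    enriched["cluster"] = cluster_name
    enriched["timestamp"] = now.isoformat()
    return enriched
```

Because the kick server writes these fields last, a service cannot spoof the cluster name or timestamp by including them in its own payload. Passing `now` explicitly keeps the function easy to test.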

How will the kick server know the name of the cluster? Is it safe to infer it from private_hosted_zone (by taking everything up to the first dot)? Will the .cluster domain ever change? Should we use the full name of the private hosted zone (minus the trailing dot)?

freeformflow commented 9 years ago

That's probably a little too restrictive on users. panda-cluster knows the names of the clusters it builds. Let's have it place that information into the kick server's configuration file, please.

PandaWhisperer commented 9 years ago

@PandaPup are there any situations in which the cluster name would be different from the name of the private hosted zone (minus .cluster if applicable)?

freeformflow commented 9 years ago

There is no uniqueness requirement for private hosted zone names, so I am uncomfortable relying on them to extract the cluster's name.

Currently, Huxley automatically creates a private hosted zone under [cluster-name].cluster. That isn't configurable yet, so if you want to extract cluster names from the private zone for this iteration, that's fine.

In the future, though, we should probably let users set whatever name they want.

PandaWhisperer commented 9 years ago

Thanks @PandaPup. Since it's currently not configurable, I'll just go with extracting the cluster name from the private hosted zone.

If this should change in the future, it will be easy to add a configuration setting.
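A minimal sketch of that approach, assuming the zone name follows the current `[cluster-name].cluster` pattern. The optional `configured` argument is a hypothetical hook for the future configuration setting: when it is present, it wins over the derived name.

```python
from typing import Optional


def cluster_name(private_hosted_zone: str, configured: Optional[str] = None) -> str:
    """Work out the cluster's name.

    A name set explicitly in the configuration takes precedence. Otherwise,
    derive it from the zone Huxley creates today, "<cluster-name>.cluster",
    by stripping the trailing root dot and the ".cluster" suffix.
    """
    if configured:
        return configured
    zone = private_hosted_zone.rstrip(".")
    suffix = ".cluster"
    if not zone.endswith(suffix) or len(zone) == len(suffix):
        raise ValueError(f"unexpected private hosted zone: {private_hosted_zone!r}")
    return zone[: -len(suffix)]
```

Stripping the suffix, rather than taking everything up to the first dot, means a zone such as `eu.demo.cluster.` yields `eu.demo` instead of `eu`, and any zone that doesn't match the expected pattern fails loudly instead of producing a wrong name.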

pandastrike / huxley #77