pandastrike / huxley

API and CLI for Docker Deployment

Formalize the Cluster's Kick Server API #24

Closed freeformflow closed 9 years ago

freeformflow commented 9 years ago

The code for the cluster's kick server is stored in panda-kick. It works, but it's fairly rough. We should formalize it and use PBX to set up a proper API. We should create a JSON schema for the kick server's inputs and have the kick server validate the requests it receives.
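As a rough illustration of what validating requests could look like, here is a minimal Python sketch. It is a hand-rolled check standing in for a real JSON Schema validator, and the field names (`hostname`, `ip_address`, `port`) are hypothetical, not panda-kick's actual inputs.

```python
from __future__ import annotations

import json

# Hypothetical input schema: these field names and types are illustrative
# assumptions, not the kick server's actual request format.
REQUIRED_FIELDS = {"hostname": str, "ip_address": str, "port": int}


def validate_request(raw_body: str) -> list[str]:
    """Return a list of validation errors; an empty list means the body is valid."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        return [f"body is not valid JSON: {exc.msg}"]

    if not isinstance(payload, dict):
        return ["body must be a JSON object"]

    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in payload:
            errors.append(f"missing required field: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(payload[name], bool) or not isinstance(payload[name], expected):
            errors.append(f"field {name} must be of type {expected.__name__}")

    # Reject fields the schema does not know about.
    unknown = sorted(set(payload) - set(REQUIRED_FIELDS))
    errors.extend(f"unexpected field: {name}" for name in unknown)
    return errors
```

A server built this way would reject any request with a non-empty error list before touching DNS.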

freeformflow commented 9 years ago

@PandaWhisperer, I'm assigning this to you for iteration Alpha-02. Huxley stands to gain from a stabilized and reliable kick server, and you have experience with PBX. So, this is a good match for you.

Copied From panda-kick Ticket 2

@PandaWhisperer, I agree that panda-kick is rough. Since it's passed the proof-of-concept test from Dan, let's go ahead and create a formalized API server using PBX. It should only take a couple days, and we want to make sure this is reliable since it underpins our ability to deploy other services.

Since panda-kick is part of the "wiring up" that Huxley clusters perform, and you've been dealing with that for the ELK stack mixins, I think this work dovetails nicely. This is Ticket #24 on Huxley. Let's make it part of your Iteration Alpha-02 work, please. I'll copy this text there.

Also, please let me know if you'd like me to walk you through any code or intentions.

PandaWhisperer commented 9 years ago

@PandaPup I do have a question about the kick server. Is there a reason why the configuration file is read several times on each request? Is that intentional? I.e. is there ever a situation where we want to change the configuration without having to restart the kick server?

freeformflow commented 9 years ago

@PandaWhisperer No, at least not right now. My main concern is to be cautious because the kick server holds AWS credentials. At the time, I thought it would be better if those were not held in active memory, but that idea did not pan out very well, haha.

The main protection for the kick server is the AWS Security Group. The kick server is accessible on port 2000, which is exposed to only other addresses on the subnet. This should be sufficient for now, so we can abandon my earlier misguided plan to keep the config out of active memory.

To the question of configuration, we want to work toward making the kick server pretty savvy and independent, eventually. Because the kick server has AWS access, theoretically it could look up what it needs from Amazon. For example, when a service asks for a hostname under a new domain, the kick server could find out that domain's hosted zone ID. But that's a future wish. For now, we can focus on stability.

PandaWhisperer commented 9 years ago

Well, the credentials are already stored unencrypted on disk. Plus, once they're read into memory, even if it's just into a local variable, you have no control over how long they remain there, because of the garbage collector. Oh and finally, you assign them to a persistent object (AWS.config = configure_aws()), so they actually stay in memory unencrypted the entire lifetime of the server.
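Given that, reading the file once and reusing the result loses nothing. A minimal Python sketch of that idea follows; it assumes a JSON config file, and `load_config` is a hypothetical helper name, not something from panda-kick.

```python
import json
from functools import lru_cache


@lru_cache(maxsize=1)
def load_config(path: str) -> dict:
    """Read and parse the config file.

    Repeated calls with the same path return the cached dict instead of
    touching the disk again. The JSON format is an assumption.
    """
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

With this approach, a change to the file on disk only takes effect after a restart (or an explicit `load_config.cache_clear()`).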

Since the kick server is the only thing accessible on the machine, the only insurance we have is that it's not vulnerable to a buffer overflow attack? Hopefully JavaScript keeps its strings in check.

PandaWhisperer commented 9 years ago

Been working on this and I have a couple questions / thoughts on using PBX:

  1. For PBX's autodiscovery to work, it needs to know its public server address (i.e. the address within the cluster). We need to somehow inject that into the server. This would have to happen when the kick server is first set up.
  2. PBX is very strict about checking mime types. If we use PBX, we can't just POST JSON to the server with no mime type, it will be rejected unless the mime type is exactly what PBX expects – i.e. the custom mime type it generates. application/json is not enough.
  3. Lastly, the current server allows DELETEing a record identified by information in the request body. This is non-standard and therefore doesn't work with PBX. In order to be able to delete records using PBX, we'll have to create unique identifiers for them, and expect the client to send them back to us. Since "the client" = a .service file, I'm not sure how that would work. Also, when would deleting ever occur? Currently none of the .service files I've seen do that.

@PandaPup @dyoder Please let me know your input on this.

freeformflow commented 9 years ago

@PandaWhisperer, okay, cool.

  1. The kick server is always placed at kick.[cluster_name].cluster:2000 within the cluster. panda-cluster sets all that up. We could pick another standardized location, if that makes sense, but I think we should strive to have conventions for addressing of kick and hook agents.
  2. How complex are these mime types? We can template simple things in the *.service files, but if it's a huge block of text, we should place a shell script on every CoreOS machine to keep the service files clean. In the worst-case scenario, we could even have PBX clients running on every machine, but let's avoid that complexity if we can. Could we accommodate PBX's requirements within a few lines in the .service file?
  3. Let's just ignore DELETE for now. As long as we allow overwrites to DNS records that already exist, that will be sufficient for user needs. Making DELETE function properly is nice, but not critical, so it can be a problem for "Future Us". haha.
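The addressing convention in point 1 can be sketched in a few lines of Python. The `http` scheme and the `CLUSTER_NAME` environment variable are assumptions made for illustration; the thread only fixes the hostname pattern and port 2000.

```python
import os

KICK_PORT = 2000  # port named in this thread


def kick_server_url(cluster_name: str) -> str:
    """Build the conventional kick server address for a cluster."""
    if not cluster_name:
        raise ValueError("cluster name must not be empty")
    # The http scheme is an assumption; the thread does not specify one.
    return f"http://kick.{cluster_name}.cluster:{KICK_PORT}"


def kick_url_from_env(environ=os.environ) -> str:
    """Derive the address from a hypothetical CLUSTER_NAME variable."""
    return kick_server_url(environ.get("CLUSTER_NAME", ""))
```

For example, `kick_server_url("staging")` returns `http://kick.staging.cluster:2000`.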

PandaWhisperer commented 9 years ago

@PandaPup

  1. Right. I know the address is somewhat predictable, but we'd still need the cluster name to compute it. In other words, the kick server would have to somehow know its cluster name.
  2. The custom mime types are derived from application/json by inserting the vnd (signifying a vendor-specific extension, I believe) prefix and the name of the application and the resource being described: application/vnd.#{app_name}.#{resource_name}+json. This can be achieved with curl by adding an appropriate -H 'Content-Type: ...' header. A custom client would be sleeker, and only require a few lines of code thanks to autodiscovery, but it would have to be added to the Docker images (I believe).
  3. Sounds good. OTOH, PBX's semantics would require an ID also to make changes to an existing record. Currently, we just POST some info to the kick server, and the server decides whether a record is updated or created. Proper REST semantics, which PBX enforces, don't allow that. What we're doing here is using POST in the other sense of the spec, where it's basically "it can do anything it wants". This is not wrong (it's covered in the spec), it's just not what PBX/REST does. At least that is my understanding.
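The media type pattern from point 2 can be sketched in Python as below. The app and resource names used in the example (`kick`, `record`) are hypothetical, and the matching rule (ignore parameters such as `charset`, compare case-insensitively) is an assumption for illustration, not a description of PBX's actual check.

```python
def pbx_media_type(app_name: str, resource_name: str) -> str:
    """Build the vendor media type following the pattern quoted above."""
    return f"application/vnd.{app_name}.{resource_name}+json"


def content_type_matches(header_value: str, app_name: str, resource_name: str) -> bool:
    """Check a Content-Type header against the expected vendor media type.

    Parameters after ';' are ignored and the comparison is case-insensitive;
    both choices are assumptions made for this sketch.
    """
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type == pbx_media_type(app_name, resource_name).lower()


# Header a client would need to send for a hypothetical "record" resource
# of a hypothetical "kick" application.
headers = {"Content-Type": pbx_media_type("kick", "record")}
```

Under this check, a plain `application/json` header does not match, which mirrors the rejection described earlier in the thread.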

PandaWhisperer commented 9 years ago

Well, I guess we could do whatever we want in the create handler, and forgo all the others, including PUT. But then there isn't really much left that we'd need PBX for, so a plain server (or one using connect) would be preferable.
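The "server decides" behaviour of such a single create handler could look roughly like this Python sketch. The in-memory dict stands in for the real DNS records, and keying on hostname is an assumption.

```python
def upsert_record(records: dict, hostname: str, ip_address: str) -> str:
    """Create the record if the hostname is new, otherwise overwrite it.

    Returns "created" or "updated" so the caller can report what happened.
    """
    action = "updated" if hostname in records else "created"
    records[hostname] = ip_address
    return action
```

The client never supplies an ID: it POSTs the same kind of payload in both cases and the handler picks the outcome.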

freeformflow commented 9 years ago

@PandaWhisperer,

  1. That's fine. panda-cluster has access to the cluster name, so we can input that into the kick server's Docker container as it gets activated. We could also use etcd to store values and small amounts of data like that.
  2. Hmm... Let's see if we can fake it in the service file. We don't want to ask developers to modify what goes into their services. We can do wiring and configuration outside of the containers, but we should be un-opinionated about what happens inside. Let's see if we can add a mime type that would be standard across all services.
  3. Okay, I can see how PBX is getting in the way. We're really just after stability, reliability, and robust error handling. Since PandaStrike prides itself on API chops, I was hoping we could get some mileage and boilerplate out of the way with PBX. We also want to be future-minded. Since the kick server is likely to take on additional services for users, it would be nice if adding functionality is as easy as adding a definition and a handler.

freeformflow commented 9 years ago

Also, if we need to, we can eventually start holding resource IDs in etcd. That would be cool, actually. So, we don't need to be afraid of being RESTful. We want to work with PBX to make our lives easier.

PandaWhisperer commented 9 years ago

The rewrite should address panda-kick:#4, panda-kick:#5.

PandaWhisperer commented 9 years ago

Closing because https://github.com/pandastrike/panda-kick/pull/6 has been merged.