pandastrike / huxley

API and CLI for Docker Deployment

Kick server sometimes returns empty responses? #25

Closed PandaWhisperer closed 9 years ago

PandaWhisperer commented 9 years ago

When working on the "mint" example, I ran into a situation where after pushing my code to the hook server, the service wasn't restarted properly. Looking into the log files, I noticed the following:

Mar 04 22:47:50 ip-10-0-101-237.us-west-1.compute.internal curl[5300]: [1.4K blob data]
Mar 04 22:47:50 ip-10-0-101-237.us-west-1.compute.internal curl[5300]: curl: (52) Empty reply from server

It seems as if the DNS registration part failed. Currently the system has no way to recover from that.

freeformflow commented 9 years ago

@PandaWhisperer, to codify our plan I'm sticking it here. Please let me know if we need to hash out any more details, or I misunderstood what we discussed.

Step 1: The service pings the kick server, asking it to create (or change) a DNS record.

Step 2: The kick server makes the request to Amazon and creates an entry for the DNS record in the kick server's database. The database entry stores the state of the DNS record and starts with the state "NOT READY". After creating this database entry, the kick server responds immediately to the service.

Step 3: The service enters a loop where it pings the kick server, asking for the database entry it has stored about the state of the DNS record.

Step 4: The kick server enters a loop where it pings Amazon for the state of the DNS record. Once the kick server detects a success or failure, the kick server updates its database entry.

Step 5: The service, still pinging the kick server, detects this change in state. It exits its polling loop and then decides what to do with the report of success or failure.

For the Kick Server

We'll need to add a database and a new response handler that looks up the DNS state in the database and responds. This is a fair amount of work, but it provides a nice universal interface. The kick server will eventually have to set additional properties and work with multiple cloud-providers. You'll be laying a very solid foundation to make all of that much easier.

For Service Templates

We'll need to move from a single curl request to an initial curl request followed by polling. But this is comparatively easy.
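As a sketch, the service-side flow might look like the bash below. The endpoint paths and the READY/NOT_READY status words are assumptions for illustration, not the real kick server API:

```shell
#!/bin/sh
# Sketch of the service-side flow: one initial curl, then a polling loop.
# Endpoint paths and status words here are assumptions, not the real kick API.

# register_and_poll takes the name of a command that prints the current
# DNS status, and an optional maximum number of attempts.
register_and_poll() {
  fetch_status=$1
  max_tries=${2:-60}
  i=0
  while [ "$i" -lt "$max_tries" ]; do
    case "$("$fetch_status")" in
      READY)     return 0 ;;                  # record is live; stop polling
      NOT_READY) sleep "${POLL_DELAY:-5}" ;;  # not done yet; keep polling
      *)         return 1 ;;                  # kick server reported a failure
    esac
    i=$((i + 1))
  done
  return 1                                    # gave up after max_tries
}

# In a *.service file this would be invoked roughly as:
#   curl -s -X POST "http://$KICK/records/$NAME"          # initial request
#   kick_status() { curl -s "http://$KICK/records/$NAME/status"; }
#   register_and_poll kick_status
```

The point of passing the fetch command in as a parameter is just that the loop stays a few lines of POSIX sh, with no client binary to install.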

freeformflow commented 9 years ago

For now, we can use Pirate's Memory option, which creates a database in the server's active memory. As we fill things out more, we can just drop in a real database. I just now realize how awesome that adapter ability is, haha.

PandaWhisperer commented 9 years ago

Okay, so basically, we're going to cache the status result from AWS and periodically update it until we get a result.

On the client, we'll do the same. I wonder if it's even necessary to poll the server in that case. We could just make a request each time the client asks for an update; that way we reduce complexity and only throttle on one side (i.e. the client).

freeformflow commented 9 years ago

What do you mean when you say the client and server? I just want to make sure I understand. Are you referring to a PBX client on the CoreOS machine?

My thinking was that a service makes requests to the kick server with nothing more than curl and a bash loop. I'm worried that installing multiple PBX clients just gives us more components that can fail.

I would rather rely on curl and bash in the *.service file directly. That way, we only have to build fail-over infrastructure into the kick server and we isolate our risk to that component. Is there another reason we'd need to involve a client? If there is, I'd want Dan to weigh in on the appropriateness of skimping on formality and doing things with curl.

I just want to prioritize "toughness" and reliability of the cluster, because a lot of the questions at SCaLE focused on the fragility of micro-service deployments.

PandaWhisperer commented 9 years ago

Let me see if I can clarify:

What I understood on IRC is that we're moving the polling from the kick server to the client (i.e. .service file). Whether that works in bash or with a dedicated client is irrelevant. What's relevant is that now the client polls the kick server to check if the DNS was updated.

However, what I now understand from this comment is that you also want to add an in-memory database to the kick server that keeps track of the DNS status in Amazon and independently polls Route53 to update that database, while serving requests from the kick client out of its in-memory db.

This adds a considerable amount of complexity, when we could just ask Amazon directly on behalf of the client (which doesn't have the credentials). In other words, the status endpoint could just be a proxy that adds AWS credentials, forwards the request to Amazon, and immediately returns the result without caching.
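That proxy maps directly onto Route53's GetChange call, which reports a record change as PENDING until it propagates and INSYNC afterwards. A minimal sketch, assuming the kick server shells out to the AWS CLI (the real implementation would more likely use an SDK):

```shell
#!/bin/sh
# Minimal sketch of the "proxy" status lookup, assuming the AWS CLI is
# available and configured with the kick server's credentials.
# Route53's GetChange reports "PENDING" until the record change has
# propagated across its name servers, then "INSYNC".
change_status() {
  aws route53 get-change --id "$1" \
    --query 'ChangeInfo.Status' --output text
}

# The kick server's status endpoint would do little more than:
#   change_status "$CHANGE_ID"   # prints PENDING or INSYNC
```

The client never sees the credentials; the kick server only adds them and relays the answer, with no state of its own.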

I might be missing something, but I really don't see where the payoffs are in adding that extra layer of caching. Does that make sense?

freeformflow commented 9 years ago

I see what you mean. Let's go with what you just explained for this iteration because it's very straightforward and achievable. So, for now we can think of the kick server as "just a proxy that adds AWS credentials and forwards the request to Amazon".

The one caveat I see is that we need to standardize the "success", "keep polling", and "failure" replies from the kick server. This would keep the bash code in the Service's *.service file small and simple. And it would be reusable for other, future requests.
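One way to pin that convention down, as a sketch (the reply words themselves are placeholders, since the actual convention hasn't been decided):

```shell
#!/bin/sh
# Sketch of a standardized reply handler for the *.service bash.
# The reply words ("success", "pending", anything else = failure) are
# placeholders; the real convention is still to be decided.
handle_reply() {
  case "$1" in
    success) return 0 ;;  # record is live; stop polling
    pending) return 2 ;;  # caller should sleep and poll again
    *)       return 1 ;;  # treat any other reply as a failure
  esac
}
```

The calling loop only needs the return code, so the bash in each *.service file stays a single case statement that any future request type can reuse.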

I requested a database on the kick server because I think there is a big future for it as the kick server gets built out. But you're right, it's not needed for what you're implementing. I will create a separate ticket presently to explain my thoughts.

PandaWhisperer commented 9 years ago

Well, it's a little bit more than a proxy: it's a fully RESTful API to create, update and delete domain names. The AWS/Route53 backend is now sufficiently encapsulated to allow for easy replacement (or addition of other providers).

FWIW, I am using an in-memory database to store the records and their change IDs. Currently, this is done in an AWS specific format. But this can be changed to enable other service providers.

PandaWhisperer commented 9 years ago

Closing because https://github.com/pandastrike/panda-kick/pull/6 has been merged.