Once we're no longer dependent on nixos to set up user ids, we should be able to avoid the import-from-derivation to calculate uids (as there's no real need to have them available at evaluation time).
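For context, a minimal sketch (hypothetical code, not quoted from defnix) of the import-from-derivation pattern being avoided, contrasted with simply referring to the user by name and letting the machine assign the uid at activation time:

```nix
{ pkgs }:
let
  # Import-from-derivation: evaluation can't finish until this derivation
  # has actually been *built*, just to learn a numeric uid.
  uidFile = pkgs.runCommand "git-uid" { } ''
    echo 5000 > $out
  '';
  gitUid = import uidFile;
in
{
  # With IFD: the uid is pinned at evaluation time.
  with-ifd = { user = "git"; uid = gitUid; };

  # Without IFD: refer to the user by name only; the uid is allocated
  # on the target machine and never needed during evaluation.
  without-ifd = { user = "git"; };
}
```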
An idea about the interface (still to be implemented on top of nixos for now):
Each repo describes a set of `functionalities` (thanks @rickynils for the term). A `functionality` consists of at least a pure `service`. Additionally, implementations can add extra fields to a `functionality`, so for example a multi-machine deployment tool might add a way to specify stuff about the `machine` each `service` runs on, or an implementation that puts everything on a private ipsec network might add a way to specify the `upstream-hosts` each `service` needs to connect to. Finally, all `functionalities` that should be deployed together should be tied together in one central place.
More concretely for our use case, I propose each repo export a set of `functionalities` and we have a single repo tying everything together for all of our deployments. Example of what this might look like:
`zalora-git.git`:

```nix
{
  git = {
    service = gitolite-service;
    machine.location = locations.singapore;
    fqdn = "git.zalora.com";
  };
}
```
`zalora-hydra.git`:

```nix
{
  hydra-web = {
    service = hydra-web-service;
    machine.location = locations.singapore;
    upstream-hosts = [ "hydra-db.zalora.com" ];
    fqdn = "hydra.zalora.com";
    load-balancer.backends = 2;
    load-balancer.source-port = 443;
    load-balancer.target-port = 3000;
  };
  hydra-evaluator = {
    service = hydra-evaluator-service;
    machine.location = locations.singapore;
    upstream-hosts = [ "hydra-db.zalora.com" "git.zalora.com" ];
  };
  hydra-queue-runner = {
    service = hydra-queue-runner-service;
    machine = same-as "hydra-evaluator";
  };
  hydra-db = {
    service = postgresql-service;
    machine.location = locations.singapore;
    fqdn = "hydra-db.zalora.com";
  };
}
```
`zalora-deployment.git`:

```nix
deploy {
  inherit functionalities;
  ca = ./ca.crt;
  default-backend = backends.ec2 {
    default-instance-type = "m3.medium";
    services-per-instance = 5;
  };
}
```
Thoughts on this? Obviously at some point we'll want a way to specify that a given functionality is part of an autoscaling group or whatever. @proger @rickynils @soenkehahn
I don't get the `upstream-hosts` thing here, could you elaborate?
```nix
hydra-evaluator = {
  service = hydra-evaluator-service;
  machine.location = locations.singapore;
  upstream-hosts = [ "hydra-db.zalora.com" "git.zalora.com" ];
};
```
The `services-per-instance` thing is too nominal and generic; with it we're essentially adding automated scheduling of things by random assignment (if we want automatic scheduling we should start using mesos). What we really want is either non-loaded services scattered across a pool of three `epsilons` (with a deployment model where you don't deploy the whole box but rather a single service), or autoscale groups, which usually have AMIs that run a single service.
@proger It should probably be better named, but `upstream-hosts` is the set of hosts that this machine needs to be able to talk to. Depending on how each service is distributed, that will mean either a) they end up on the same machine, b) they end up on the same VPC, or c) they have an ipsec transport set up.
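As a purely illustrative sketch (none of these names exist in defnix), a flat-network backend might turn `upstream-hosts` into nothing more than extra hosts entries, while an ipsec backend would treat the same list as transport peers:

```nix
# Illustrative only: two ways a backend could consume `upstream-hosts`.
{ resolve }:      # assumed: maps an fqdn to an address the backend knows about

functionality:
let
  upstreams = functionality.upstream-hosts or [ ];
in
{
  # Flat-network backend: just make the upstreams resolvable from the machine.
  hostsLines = map (host: "${resolve host}  ${host}") upstreams;

  # Private-network backend: set up an ipsec transport to each upstream.
  ipsecPeers = upstreams;
}
```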
Having said this, I don't think it's a good idea to couple services and their hosts declaratively at all; this functionality should instead be pushed to a runtime service (think "mini-PaaS").
Do I understand it right that everything but `deploy` is backend-agnostic?
@ip1981 That's the goal, but of course we may end up having to depend on backend-specific stuff
@proger Sorry, I don't quite get your latest comment, can you explain a bit more?
@proger Also your `services-per-instance` comment :)
I'm trying to express the idea that we shouldn't try to couple software to its infrastructure — this way we're making the interface look like nixops inside out (service-centric rather than infrastructure-centric).
I suggest we attempt to completely separate the specs of services (the things the company pays developers to do) from infrastructure (the things the company rents from AWS/MS/Google), instead of having them tightly coupled (defnix/nixops style).
There are currently three kinds of deployments that we have to cover:
For 1 and 3, what we already have works pretty much OK (imagining upcast already has autoscaling support); but deployments of kind 2 should usually be performed by the developers themselves, because they are quite lean, developed/owned by a single person, and may even be non-business-critical. If we apply your model, we are adding a third-party component that has to operate on the state of the whole infrastructure at once when performing deployments, which makes it necessary for somebody to be there when the whole deployment fails; that pretty much defeats the purpose of relinquishing control over the whole deployment at once (even if we delegate that to some automated process like a bot reading git commits and deploying).
Hmm, still not sure I completely understand. Having everything tied together in one place doesn't require literally manipulating the whole infrastructure; the deployment tool can (and nixops currently sort of does) only actually deploy the services that have changed.
Can you maybe give an example of what you think the specifications should look like?
> literally manipulating the whole infrastructure, the deployment tool can (and nixops currently sort-of does) only actually deploy the services that have changed
It currently has to know the state of the whole system (a statefile) and can only deploy if everything builds (if an unrelated service fails to evaluate, the whole deployment is screwed).
Right, but those are implementation bugs, not interface ones.
BTW, while I'm definitely in favor of having devs control the workflow for `epsilon`-type services, I don't think it can be completely decoupled, for at least two reasons:
The interface I would ideally like to see is something like:
```
% cat infra.nix
epsilon1 = ec2 { location = sg1; size = XXL; ami = defnix-platform; };
epsilon2 = ec2 { location = sg2; size = XXXL; ami = defnix-platform; };
scheduler = ec2-autoscale { location = sg; ami = "scheduler-v1.0b154"; elb = "scheduler.zalora.com"; };

% cat app.nix
{ defnix }:
defnix.run-periodically {
  service = import ./harvest-spice { database = defnix.runtime-lookup "com.harkonnen.db.resources.spice42"; };
  when = every-minute;
}

% defnix deploy app.nix
stderr: ... using your gpg credentials from agent /tmp/gpg123.sock
stderr: achievement unlocked: speedy gonzalez (third app.nix deployment in a minute)
stderr: .... your app doesn't seem to require http
stderr: .... reserving container space at scheduler.zalora.com
stderr: deployed: ssh to your app's environment using:
stderr: % ssh root@app953.epsilon4.zalora.com
```
Assuming the whole state-tracking and scheduling of resources (picking which `epsilon` to use) is happening at the `scheduler`.
Hmm, how is your `app.nix` fundamentally different from my `zalora-git.git`? Where does `infra.nix` live? How does the `defnix` tool interact with it?
> Hmm, how is your app.nix fundamentally different from my zalora-git.git?
It's not different at all (I hope), except it does not mention `machine.location` anywhere; it only specifies the service, which you can evaluate into a nixos generic-services module, a regular package, a systemd unit, a yaml file for Heroku, or even a dockerfile. `infra.nix` just specifies the infrastructure, containing the defnix-platform, that we deploy using nixops or upcast; all mapping between an app and its potential infra happens without involving nix expressions, using some scheduler app (a clone of something like Heroku).
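A rough sketch of that idea, with every helper name (`to-systemd-unit`, `to-dockerfile`) invented for illustration: the same pure service description gets rendered differently by different backends:

```nix
# Hypothetical sketch: one pure service description, several interpretations.
{ harvest-spice }:   # assumed package providing the executable
let
  service = {
    exec = "${harvest-spice}/bin/harvest-spice";
    environment.DATABASE = "com.harkonnen.db.resources.spice42";
  };

  # Render the service as a systemd unit (text).
  to-systemd-unit = s: ''
    [Service]
    ExecStart=${s.exec}
    Environment=DATABASE=${s.environment.DATABASE}
  '';

  # Render the very same service as a Dockerfile (text).
  to-dockerfile = s: ''
    FROM busybox
    ENV DATABASE ${s.environment.DATABASE}
    ENTRYPOINT ["${s.exec}"]
  '';
in
{
  unit = to-systemd-unit service;
  dockerfile = to-dockerfile service;
}
```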
OK, so do I understand correctly that your objection is to `zalora-deployment.git` and not to the rest? Can you sketch out in a bit more detail how the deployment process would work given those individual functionality repos?
Ah, you updated your comment, so let me respond to that: generally devs ask for a specific location, and often specific specs, so I thought I'd include that on the functionality side of things, but there's no reason that setting can't be optional. Do you agree that `same-as` (like I used for `hydra-queue-runner`) has to exist?
Actually, `same-as` should be replaced by a service that takes a set of services and runs them all as one.
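Something like this sketch, where `combine-services` is a made-up name standing in for whatever the real combinator ends up being:

```nix
# Sketch: instead of `machine = same-as "hydra-evaluator";`, colocate services
# by combining them into a single functionality. `combine-services` is hypothetical.
{ hydra-evaluator-service, hydra-queue-runner-service, locations }:
let
  combine-services = services: {
    # A real implementation would run all sub-services together as one unit.
    inherit services;
  };
in
{
  hydra-backend = {
    service = combine-services {
      evaluator = hydra-evaluator-service;
      queue-runner = hydra-queue-runner-service;
    };
    machine.location = locations.singapore;
    upstream-hosts = [ "hydra-db.zalora.com" "git.zalora.com" ];
  };
}
```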
> Actually, same-as should be replaced by a service that takes a set of services and runs them all as one

Agreed on this one; there is no reason to overcomplicate the system at this point.
OK, on reflection I think I agree with your point more. For reproducibility's sake, I'd like a way to take a state file and redeploy an identical VM/instance/docker container/etc, which means that if we're abstracting stuff like `epsilon` through a `heroku`-like interface we need to have a way to reference the state of the full machine setup at the time of deployment. Thoughts?
Also, can you clarify the `services-per-instance` comment? What should load-balancing/autoscaling look like?
My idea being: let's just keep the LB configuration a matter of nixops-like configuration (i.e. decoupled from defnix services and untouched from the original implementation) and focus on the definition/deployment of standalone services. Same with autoscaling — AWS (or just requires
> …interface we need to have a way to reference the state of the full machine setup at the time of deployment. Thoughts?
Why? It's not really what clouds (imo) are about; the app should simply discover it (i.e. at runtime).
I think in the normal course of things, it shouldn't be used, and it will never be set manually by the user, but at some point it may become important to e.g. "clone epsilon as it was on 10/31/2014 to investigate this bug" and I'd like that to be possible.
@proger Hm, so I define the `hydra-web` functionality in the `hydra` repo. Where and when do I specify "hey, I want two of these behind an LB"?
I suggest: in the configuration of the LB :) Naturally, it's like you'd do with anything else: you can either have a stub defnix service that does haproxy, go to console.aws.amazon.com and create it yourself, provision it separately with upcast, etc.
Or deploy it 10 times and assign those hydras a single DNS name (in some separate configuration, so it contains all 10 addresses altogether)
So `infra.nix` contains an `lb` specifier, and at deploy time I say "use this LB"?
> So infra.nix contains an lb specifier, and at deploy time I say "use this LB"?
This configuration type is still inside-out; you "physically" can't "use" the LB. Rather, the LB (in something like `infra.nix`, or rather in a separate configuration like `my-own-app-lbs.nix`) is configured/created to balance across the things with known DNS names, or, in case the DNS names aren't known in advance, there is a known function that can perform a lookup in some state file or a state HTTP service.
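For example, such a separate LB configuration (file name and attribute names invented for illustration) might be no more than:

```nix
# my-own-app-lbs.nix -- illustrative only; all attribute names are invented.
{
  hydra-lb = {
    dns-name = "hydra.zalora.com";
    source-port = 443;
    target-port = 3000;
    # Balance across instances whose DNS names are known in advance...
    backends = [ "hydra-web-1.zalora.com" "hydra-web-2.zalora.com" ];
    # ...or, when they aren't, a lookup into a state file or state service
    # would go here instead.
  };
}
```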
So to deploy a new instance into the LB, I first deploy the instance, then modify `infra.nix` to include it? Unless we have some way for instances to register themselves, in which case where should I specify which LB the instance should register itself to?
> So to deploy a new instance into the LB, I first deploy the instance, then modify infra.nix to include it?
Yes, this looks like the simplest plan. You likely wouldn't want to do anything smarter at first, since including the instance in the LB is the act that makes things start affecting production, and you want that manually managed at first (looking at our primary use case now).
OK, that sounds good, I'll go with this for now.
> Why? It's not really what clouds (imo) are about, the app should simply discover it
Who said zeroconf? :-)
Zeroconf that works :)
@shlevy I'll work on elb updates in upcast then
There is now a `nixops`-based impl (using NixOS under the hood, of course).
Currently defnix is implemented by mapping to nixos configs; we should add an implementation of a defnixos-only machine/network.