[EPIC] Terra Architecture

This issue is to document my thoughts on the future architecture of Terra.

"Terra CLI"

The "terra-app" repo will be renamed "terra-cli". This will remain the main tool for the project.

"Terra API" or "Terra UI"

This app is a Symfony REST Edition app that provides a basic web UI and REST API for the terra objects. It will be very similar to https://github.com/jonpugh/aegir4

We could add the "job-queue-bundle" to this app (see http://jmsyst.com/bundles/JMSJobQueueBundle). This bundle would handle storing a queue of "terra-cli" commands. It can handle queues, job dependencies, and logging. It has its own built-in web interface for these things that we should leverage.

UPDATE: JobQueue Bundle is out for now. @jlyon created a working setup with RabbitMQ, and I was able to get it working with a RabbitMQ container.

"Terra UI"

@jlyon also created a Drupal 7 module with apps and environments. Since his team has been using Aegir for a while, it only took him a few days to build it.

You can launch the Terra UI in terra itself with:

terra app:add ui https://github.com/terra-ops/terra-ui-prototype/
terra env:add ui local --enable=1

The Drupal 7 site interacts directly with the RabbitMQ server via REST. This means it can be hosted anywhere, as long as it can reach the RabbitMQ server.

The backend "receiver.php" will become the "terra queue" command, which will run continuously. This is equivalent to the drush @hostmaster hosting-queued command.
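To make the "run continuously" part concrete, here is a minimal sketch of the loop such a worker could run. Everything in it is an assumption for illustration: an in-memory `queue.Queue` stands in for the RabbitMQ server, `terra_queue_worker` and `run_command` are hypothetical names, and the `STOP` sentinel exists only so the sketch can end (the real command would block forever waiting for the next message).

```python
import queue
from typing import Callable, List, Optional, Tuple

# Sentinel that lets this sketch stop; the real "terra queue" would run forever.
STOP = None

Job = Optional[List[str]]


def terra_queue_worker(
    jobs: "queue.Queue[Job]",
    run_command: Callable[[List[str]], int],
) -> List[Tuple[List[str], int]]:
    """Take queued terra-cli commands off `jobs` and run them one at a time.

    Returns a log of (argv, exit_code) pairs, one per command that ran.
    """
    log: List[Tuple[List[str], int]] = []
    while True:
        argv = jobs.get()  # blocks until the next job is queued
        try:
            if argv is STOP:
                return log
            log.append((argv, run_command(argv)))
        finally:
            jobs.task_done()  # mark the job handled even if run_command raised


if __name__ == "__main__":
    jobs: "queue.Queue[Job]" = queue.Queue()
    jobs.put(["terra", "app:add", "ui", "https://github.com/terra-ops/terra-ui-prototype/"])
    jobs.put(["terra", "env:add", "ui", "local", "--enable=1"])
    jobs.put(STOP)

    # Stand-in runner that only echoes; a real one might call subprocess.run(argv).
    def echo(argv: List[str]) -> int:
        print("would run:", " ".join(argv))
        return 0

    print(terra_queue_worker(jobs, echo))
```

Running jobs strictly one at a time mirrors how hosting-queued behaves; the Drupal front-end only has to publish the command, never execute it.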

Terra UI Example Apps

Once we have a REST API in place, we can make an example UI app that could be a Drupal site. I imagine we could make a public Drupal module that provides Drupal entities mapping to the terra objects, along with an interface to the Terra API app.

This would open up the possibilities for multiple front-ends built in Drupal, hopefully doing all of the heavy lifting for things like future Aegir 4, devshop 2, WFTools, etc.

Feedback!!!

PLEASE provide your thoughts! We need user feedback so we design a system that works for everyone.

THANKS!

I see a lot of overlap with Aegir 4 and this proposal. Aegir 4 development is in progress (though we're also working on cleaning up some of the more "soft" stuff at the same time, so progress is admittedly slow).

As general background for anyone that's not familiar or wasn't present: Aegir 4 will initially be some components that sit on top of Kubernetes, and will primarily focus on the production deployment story. For the time being, we're completely ignoring the local development story. We're going to be building a REST API that sits on top of the Kubernetes API and will help with access control and the like. Eventually, it'll be the primary mechanism that will allow us to swap out the backing PaaS (for instance, allowing us to use Flynn instead of Kubernetes). We're also eventually going to build a D8-based UI that sits on top of the REST API and is more or less stateless.

I mentioned this in Gitter, but saying here for posterity: I'd really love for Terra to focus on the local development story, Aegir 4 to focus on production deployment, and for Terra to integrate with the Aegir API to facilitate moving things from local to dev/stage/prod/other non-local environments, and vice-versa.

Thanks, @cweagans, I'm excited to work this out.

Let me start first by personally apologizing to you and @ergonlogic and any other aegir maintainers for not directly inviting you to contribute on this project first. I've wanted all of your help since the start but never directly asked. The offer stands to give you full commit access if you want to use this project.

What we are proposing is to make terra the new "aegir backend". I don't literally want to do all the things. I hope terra can be a simple common toolkit that we can all use to do all sorts of things, from the simplest setups to massively epic systems. It's just a command line interface and a couple of other tools. I hope other, smarter people than me can come along and contribute to the harder parts of this project, and the Aegir project, at the same time.

The best part is, it already works. You should try it! It's actually fun to use. We are going to build a KubernetesDriver for terra, which will make Kubernetes just as fun to use. It's almost the same process as using docker-compose: write yml, run command. Should be easy, once we get around to it. ;)

If the aegir team adopts terra as the backend, then they are freed up to only deal with the harder and more interesting things about production hosting: massive scaling, monitoring, logging, resilience, integrations. This isn't stuff we want to do just yet, but we do want to guide the building of those kinds of tools in a way that is modular, decoupled and useful for everyone.

So, we are definitely focused on the local development story!

From the user's standpoint, the "local development story" consists of:

I want to get code and put it on my computer.
I want to turn the website on.
I want to edit the source code of the website and see the results in realtime.
I want to test my website in various ways, whenever I am working on it.
I want to turn the website off.
I want to remove the website and the code.

This has already been accomplished with the terra command line interface. It's frankly awesome to use. I have been using it to develop Drupal sites for weeks now, and it's actually fun.

On top of that, thanks to the app's .yml file, it is ridiculously easy to add more services to your app or drupal site. For example, with the terra UI prototype built by @jlyon, I was able to add a rabbitMQ server to the mix using a few lines of yaml.

Now, let's go through the "production hosting story":

I want to get code and put it on my server.
I want to turn the website on using extra care to make sure it's scalable and resilient.
I want to test my website in various ways, all the time.
I want my website to stay online and not lose data.
I want to update my website often.
I might want to turn my website off. (and then back on again).
I might want to remove my website and the code.

And finally, the "quality assurance hosting story":

I want to get code and put it on a server.
I want to turn the website on.
I want to test my website in various ways, all the time.
I want to turn the website off.
I want to remove the website and the code.

This is an obviously simplified list, but the point is, for all three of "dev", "testing", and "production" stories, we need very similar things.

These tasks exist regardless of how they are done, so we built terra around the concepts of "apps" and "environments", which lets us very easily run:

terra app:add
terra environment:enable
terra environment:test
terra environment:deploy
terra environment:disable
terra environment:remove
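The command list above can be read as a small state machine for an environment. The sketch below is a hypothetical model, not terra's actual implementation: the state names ("added", "enabled", "disabled", "removed") and the allowed transitions are assumptions chosen to match the commands listed.

```python
from typing import Dict, Set, Tuple

# Hypothetical lifecycle: command -> (states it may run from, resulting state).
# "test" and "deploy" leave an enabled environment enabled.
TRANSITIONS: Dict[str, Tuple[Set[str], str]] = {
    "enable": ({"added", "disabled"}, "enabled"),
    "test": ({"enabled"}, "enabled"),
    "deploy": ({"enabled"}, "enabled"),
    "disable": ({"enabled"}, "disabled"),
    "remove": ({"added", "disabled"}, "removed"),
}


class Environment:
    """Tracks one environment's state as lifecycle commands run against it."""

    def __init__(self, app: str, name: str) -> None:
        self.app = app
        self.name = name
        self.state = "added"  # state right after the environment is added

    def run(self, command: str) -> str:
        """Apply a lifecycle command, refusing ones that make no sense yet."""
        allowed_from, next_state = TRANSITIONS[command]
        if self.state not in allowed_from:
            raise ValueError(
                f"cannot {command} {self.app}.{self.name} while it is {self.state}"
            )
        self.state = next_state
        return self.state
```

For example, `Environment("ui", "local")` can go enable, test, deploy, disable, remove in that order, while calling `run("deploy")` on a freshly added environment raises. The same model applies whether the environment is a laptop, a QA box, or production, which is the point being made about the three stories.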

The next step is to make sure these tools help the production hosting story. We could really use the Aegir team's help with:

Kubernetes Environment Driver for terra. (Write yml, run command.)
Aegir Front-end: Production Hosting Management done right:
- The https://github.com/albatrossdigital/terra_ui module gives you "Environment" and "App" node types. All Aegir has to do is add this module, which saves commands via REST. It works with terra right now; see https://github.com/terra-ops/terra-ui-prototype for instructions on turning it on.
- With a simple REST API, you can build a killer new Aegir UI without worrying at all about the task queue or managing server config. You just ask for a new site and you get one.
Scaling, logging, monitoring, and alerting systems. It's really easy to add containers to a terra app; see https://github.com/terra-ops/terra-ui-prototype/blob/master/.terra.yml#L4 for an example. It's a great way to quickly experiment with Docker setups.

Thanks a ton for reaching out. Let's make this happen!

P.S. Aegir is a God of the Sea. Terra is the Goddess of the Earth. Makes a nice pair, no? :smile:

First off - thank you for the invite for commit access. I'll decline right now, but hopefully the option is still there later. I want to focus on the production hosting side of things for now.

Secondly, it's been pointed out to me that my directness (particularly in written communication) can come off as me being an asshole. If that's the case with any of my past, present, or future communication, I'm sorry. That's really not my intent. For the record (and thanks, @ergonlogic for pointing me toward the concept), you can assume that I'm operating by Crocker's rules when communicating with me.

Regarding the technical points, I see what you're saying, but there are some problems with this approach (using the same tool for every job). I've tried to articulate them before, but reading back, there was some amount of negativity that came across, so I'll try once again from a more neutral standpoint.

Dev/prod differences

The high level concepts for development/production hosting are basically the same, but there are some very important things that are different. For production hosting, my goal is to be able to support 1000 containers running the same stateful application (whether that's Drupal or a game server or something entirely different) along with the application dependencies (mysql, rabbitmq, memcache, whatever). That means the load balancer, database, and file storage components need to be pretty robust and I'd much rather trust something from Google in that regard. They have a ton of man hours spent on figuring this stuff out, and they've done it really well.

Too many assumptions

Terra (as it is right now) makes too many assumptions about the application that's going to be running, namely Drupal. I realize that you can pretty easily run other PHP applications on a stack built for Drupal, but we have other needs too. For instance, we need to support Jekyll, even if only as a proof of concept. Supporting non-web applications is also valuable.

Terra also currently assumes that you'll use the Terra containers. That's not always going to be the case, especially as people are using more non-Drupal technologies for development. Jekyll, as I mentioned, immediately comes to mind, but less web-centric applications should be possible too (think game servers or IRC bots or the like).

Terra also makes the assumption that you'll be running commands locally (and the canonical storage location for application metadata is on whatever machine Terra is being run on). This is pretty easy to partially work around (just run it on some centralized machine and kick off Terra jobs through the task queue/Terra UI), but one of the things we don't like about Aegir is the directory full of Drush aliases on the Hostmaster server. This seems to just be changing the language of those files.

We want the knowledge of what apps are running on a production cluster to be distributed so that that knowledge is highly available by default. If it's files on a disk somewhere, that's a single point of failure. This is part of the reason that we're going with Kubernetes: the knowledge of what is running on Kubernetes is distributed, and if any one of the nodes in that cluster goes down, the containers are reassigned, a new master is elected, and things keep running as you'd expect. This is also very important for people running on bare metal. It's inevitable that hardware will eventually need to be retired for whatever reason, and when that time comes, being able to tell Kubernetes about it and have it handle the logistics for you, such that when it's done you can simply unplug the server, is a huge plus.

Another benefit here is system upgrades. You can do a rolling update, sequentially rebooting to install updates across any number of nodes in a cluster, and as long as Kubernetes is doing its job, there won't be any downtime. Providing the capability to do a zero-downtime rolling upgrade on a mission-critical piece of your infrastructure is an amazing value-add for ops teams, and it isn't done very often because it's sort of like trying to change a tire on your car while you're driving down the highway.
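The rolling update described above boils down to a loop that takes exactly one node out of service at a time. This is a minimal sketch with hypothetical callbacks (`drain`, `upgrade_and_reboot`, `is_healthy`, `uncordon`); on Kubernetes the first and last steps roughly correspond to `kubectl drain` and `kubectl uncordon`, but the sketch itself calls no real API.

```python
from typing import Callable, List


def rolling_update(
    nodes: List[str],
    drain: Callable[[str], None],
    upgrade_and_reboot: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    uncordon: Callable[[str], None],
) -> List[str]:
    """Upgrade nodes one at a time so the rest of the cluster keeps serving.

    Stops at the first node that fails its health check and leaves the
    remaining nodes untouched. Returns the nodes that were upgraded.
    """
    upgraded: List[str] = []
    for node in nodes:
        drain(node)               # move workloads off this node first
        upgrade_and_reboot(node)  # install updates and reboot
        if not is_healthy(node):
            # Halt the rollout rather than take a second node out of service.
            break
        uncordon(node)            # let the scheduler place work here again
        upgraded.append(node)
    return upgraded
```

Stopping on the first unhealthy node is a deliberate choice in this sketch: with N nodes, at most one is ever out of rotation, so capacity never drops below N-1.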

PHP queue

Terra uses jms/job-queue-bundle. It's good at what it does, but another reason for the Aegir 4 rewrite is to get away from the PHP daemon. It's an endless source of problems when you start doing too many things with it. For extremely large deployments, we also need to be able to execute tasks concurrently. I don't know if you've got the queue bundle set up to do that or not, but that's a hard requirement for Aegir 4 in my mind (one of my clients has a hook_cron implementation that queues a bunch of backup tasks at 1am. They aren't even halfway done by 8am, and if they need to do other things with Aegir, that's kind of a problem). The reason I bring that up is because I've experimented with parallel execution in Aegir 2 and 3, and that only exacerbates the problems we run into, particularly around running out of file descriptors and memory usage.
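The concurrency requirement, with a cap so a burst of work can't exhaust file descriptors or memory, can be sketched with a bounded worker pool. This is an illustration only, not how jms/job-queue-bundle, RabbitMQ, or Aegir schedule work: `run_tasks_concurrently` and `run_task` are hypothetical, and a production system would more likely run each task in its own process or container than in threads inside one daemon.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def run_tasks_concurrently(
    tasks: List[str],
    run_task: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run independent tasks (e.g. per-site backups) in parallel.

    max_workers caps how many run at once, so a 1am burst of backups is
    spread over a fixed number of workers instead of all starting together.
    """
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_task, task): task for task in tasks}
        for future in as_completed(futures):
            task = futures[future]
            try:
                results[task] = future.result()
            except Exception as exc:  # one failed task shouldn't stop the rest
                results[task] = f"failed: {exc}"
    return results
```

The bound is the important design choice: throughput scales with `max_workers`, while resource usage stays predictable no matter how many tasks hook_cron queues.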

Monolithic tooling

Having different tools for different jobs is not a bad thing, as long as the set of tools can be brought together into a system of systems. I think that Terra handles the local development story really, really well. I've played with it and I like it. I don't know if I'm just having some conceptual hang up somewhere or what, but it seems like at the point where you're finished developing something locally, you should hand off the site to $something_else (Kubernetes, OpenShift, Aegir, etc) to handle environments that will be shared among multiple developers/testers and to handle scaling.

I think this would be really easy to do and would simplify Terra a lot. Basically, you could just point Terra at the Aegir API and give it your credentials (or token or whatever we end up using), and run some Terra command to pull down a copy of the site, work on it, and then push it back up to Aegir (and pushing it back to Aegir could mean a development env or something - we'll have that separation too).

See also: Unix philosophy. Different tools for different scenarios seems really appropriate here. We also have problems with the current Aegir trying to do everything. It's almost impossible to swap out any one major component without causing a ton of other problems at best, or at worst, the entire setup going up in flames.

This plays into the Aegir/Terra duo that you mentioned too - "ship it" is literally the goddess of the earth handing off code to the god of the sea!

Workflow assumptions

As far as I know, Terra currently expects read access on Github repositories to be able to spin up a set of containers. For production hosting in particular, that's not always a valid assumption. Sometimes organizations will want to add a Git remote and push code to Aegir (a la Heroku). Other times, they'll just have specific containers they want to deploy and scale. Other times, they might need SFTP access to their codebase (@mlhess mentioned that this is something they do currently with their Kubernetes installation). Other times, maybe they'll be using Perforce (ugh.) or CVS or whatever. My point is, there are a lot of different workflows we're going to have to support. It may be that Terra plans to eventually support those other workflows and just hasn't gotten around to it yet, in which case this point is moot.

Custom build logic

This might be less of a technical point and more of a preference, but I don't like that Terra has its own logic for building a container (https://github.com/terra-ops/example-drupal/blob/master/.terra.yml#L2). For web applications, it makes a lot of sense to me to just use Buildpacks. They're generic enough that they can handle pretty much anything out of the box, and if you need to do anything more on top of that, you can use https://github.com/ddollar/heroku-buildpack-multi and specify whatever combination of buildpacks you want. For instance, you can use the PHP buildpack to do most of the config, and then a Terra-Drupal buildpack to further customize that environment if necessary.

Config not in the environment by default

I don't know whether or not you've considered this, but hardcoding the database information in settings.php won't scale well. For a platform like this, it seems like you should spin up a database container and link it to the web container, which will set some environment variables that contain the database connection info. Configuration in the environment is the more future-proof way of handling that kind of thing, and it's likely the direction that we want to require in Aegir 4, particularly because the connection information can change on each deploy (whether or not you want it to - that's a side effect of running your site in containers). Docker compose handles this out of the box, by the way. For example, if you have a container titled "php" which runs php-fpm and exposes port 9000, you can use fcgi://php:9000/$1 as the fcgi proxy URL. Not sure if you'll be able to use that or not, but maybe you'll find that info useful.
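As a sketch of what "config in the environment" means in practice: the app reads its database connection from variables set by the container runtime instead of from hard-coded values. Terra's settings.php is PHP; Python is used here only to show the lookup. The variable names assume Docker's legacy link variables for a MySQL container linked as `db` (e.g. `DB_PORT_3306_TCP_ADDR`) plus the official mysql image's `MYSQL_*` settings re-exported as `DB_ENV_*`; with Compose you could instead rely on the service name resolving as a hostname, as in the fcgi://php:9000 example. `db_settings_from_env` is a hypothetical helper.

```python
import os
from typing import Dict, Mapping, Optional


def db_settings_from_env(
    env: Optional[Mapping[str, str]] = None, alias: str = "db"
) -> Dict[str, str]:
    """Read MySQL connection settings from a linked container's variables.

    Assumes Docker legacy link variables for a container linked as `alias`.
    A missing required variable raises KeyError instead of silently falling
    back to a hard-coded credential.
    """
    env = os.environ if env is None else env
    prefix = alias.upper()
    return {
        "host": env[f"{prefix}_PORT_3306_TCP_ADDR"],
        "port": env.get(f"{prefix}_PORT_3306_TCP_PORT", "3306"),
        "database": env[f"{prefix}_ENV_MYSQL_DATABASE"],
        "username": env[f"{prefix}_ENV_MYSQL_USER"],
        "password": env[f"{prefix}_ENV_MYSQL_PASSWORD"],
    }
```

Because the values are read at request time, a redeploy that moves the database container to a new address needs no change to the app's code or files.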

Production needs scalability, logging, and monitoring OOTB

With production deployment in particular, deploying to Aegir should get you scalability, logging, and monitoring out of the box. It's not something that the developer/admin/ops person/whoever should have to configure. It should "just work" with whatever solutions are in place to solve those problems.

Separation of discussions

It seems like @ergonlogic and I (and others too) are constantly talking about requirements and architecture plans and such in #aegir and you're not there. We need to be better about documenting those things, but I think it would help a lot if your architectural planning were happening at the same time/in the same place as us.

Plans for Aegir

Right now, our plans have Aegir distilled down into just a D8 install profile, a handful of custom modules, and a theme. We don't need to build the REST API, a queue, or really anything else in order to make it work because Kubernetes and Openshift handle all of that. That's a really nice place for us as a project to be in, because we don't have to handle hardly any of the really complicated problems that go along with those components. It also really simplifies upgrades. If the only thing that we're storing is user accounts (and I'm not even sure we'll need to store those. There might be some authorization mechanism in Kubernetes that we can build on top of), then upgrading to D9 will basically be a matter of porting our profile, modules, and theme to D9 and making sure the core user upgrade path is working.

Sure, scaling a database is still a problem (making it so that you can do the equivalent of docker-compose scale db=20, and ensuring that the resulting containers are spread out across physical nodes as much as possible), but at that point, it's just building a container with the related discovery/linking code built-in and making it available on Docker Hub. Kubernetes already handles the load balancer, and it'll handle the file storage orchestration part really well too. @mlhess is in the process of getting Ceph up and running (not sure if he's going the block storage or object storage mode, but I'm sure it'll be fine either way), as he found Gluster to be a bit slow, but in any case, it's mostly building a container and setting up some Kubernetes configuration.
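The "spread out across physical nodes as much as possible" requirement is, at its simplest, a least-loaded placement rule. Here is a toy version with hypothetical service and node names. Kubernetes' scheduler makes these decisions itself and weighs many more factors, so this only illustrates the idea, not what Kubernetes actually does.

```python
from typing import Dict, List


def spread_replicas(service: str, replicas: int, nodes: List[str]) -> Dict[str, str]:
    """Place `replicas` copies of `service` so per-node counts differ by at most one.

    Each replica goes to the node currently holding the fewest replicas;
    ties go to the earliest node in `nodes`.
    """
    if not nodes:
        raise ValueError("need at least one node")
    load: Dict[str, int] = {node: 0 for node in nodes}
    placement: Dict[str, str] = {}
    for i in range(1, replicas + 1):
        node = min(nodes, key=lambda n: load[n])  # min() keeps the first tie
        load[node] += 1
        placement[f"{service}_{i}"] = node
    return placement
```

With 20 database replicas on 3 nodes this yields 7, 7, and 6 replicas per node, so losing any single node takes out at most 7 of the 20.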

Once those things are solved in a way that supports HA deployments, we likely will never have to deal with it again (and that work will be available to the wider Docker community). This frees up a ton of future time to work on usability and providing nice workflows for everything from a Pantheon-like hosting setup for multiple clients to easily self-hosting 10,000 or more applications for internal use at a given company. Another good thing about this is that some of the code that we'll end up writing in this scenario would be good candidates to include upstream in Kubernetes or Openshift, so the amount of code that we'll end up maintaining will hopefully decrease over time.

I know that we initially discussed PaaS agnosticism, but the more information I get and the better I understand Kubernetes, the less sure I am that that's a reasonable thing to actively pursue as part of the Aegir project. Kubernetes is, in my opinion, the best tool for the job, and I'd personally be okay just saying "eff it. We'll just require Kubernetes", or at a minimum, only officially supporting Kubernetes and leaving the rest to contrib.

I don't know how to word this nicely, so I'll just say it and hope you remember my intent is positive - I think you've been overly dismissive of the complexities of production hosting. I know you know what you're doing, but @ergonlogic and I both (and presumably many others) have specific needs around HA and scalability. For one of my clients, it's an inexcusable thing for their site to be down for any reason, even during a deployment, so we have all kinds of crazy things in place to ensure that it never happens (barring some kind of world-ending event). If you're feeling a lot of resistance to building the production side of things in Terra, this is one of the reasons why. Jon Rudenberg (the Flynn guy) was talking to us over lunch at NYCcamp, and even for him - somebody that does this stuff for what I can only assume is 9-10+ hours per day every day and is really out in the weeds (building all the components of Flynn more or less from scratch) - file storage and HA database management is something that causes him to lose sleep at night. It's a really hard problem to solve and every time I've brought it up with you, I get something to the effect of "Don't worry. We'll figure it out. It's just file storage." I have personally been down this road before (when I wasn't using Aegir) and it's painful and full of frustration, and I cannot stress enough how incredibly important it is that we get it right.

I really want Terra to be successful, but other than "stand up a website", I don't see a lot of overlap between the production and local development scenarios, and that might be where our disagreement is. The way I see it, you've been focusing a lot on the local development bits and it's really good. We've been focusing a lot on the HA/scalable production bits and what we've planned will be really good.

Later down the road, I'd enjoy helping with Devshop 2 if it's going to be built on top of Aegir 4. The Aegir D8 UI should be more or less stateless (since all of the information about what's deployed on Kubernetes is stored in Kubernetes), so Devshop can exist at the same layer as the Aegir UI, talk to the same API, and do the same things. The UI will just be different - more suited toward the Pantheon-like workflow.

In summary

Basically, what I'd like to see happen (and @ergonlogic, feel free to chime in here if you have anything to add):

Refocus Terra on providing an awesome local development environment and nothing else. Note that local development is still a pretty big area to cover. We have a tool at NBC called Flo (https://github.com/NBCUOTS/flo) that handles a lot of things related to this, so there might be some opportunity to merge the tools together eventually, or at least take some inspiration from it.
Plan to integrate with the API that Aegir 4 will expose for moving sites to and from Aegir. If you wanted to cast a wider net, it could be cool to also integrate with Pantheon and Acquia so that you can have the same workflow regardless of what hosting system you're using. This would also facilitate migrating between any of those services, and that could be a big selling point for anyone considering that kind of move.
Move terra-cli to the aegir-project organization, so that it can take advantage of the nonprofit corporate umbrella that we're going after right now (becoming a part of the Software Freedom Conservancy, and as a backup plan, either SPI or FSF). Being part of a nonprofit will help a lot with getting all of our projects funded and getting some real traction across all of the communities that we'll be involved with. My hope is that with a unified architectural direction and technical team, we can use some of those funds to send people to various conferences to talk about Aegir and how it can be useful to different industries/development processes/whatever. We can also build up an Aegir partner ecosystem that should help to funnel business to the partners that support Aegir development. We can write books. We can provide direct consulting services (through the Aegir project). Any number of things, really, but my point is, we can go a lot further as a team, rather than separate projects.
Help us build the kind of bulletproof production-grade hosting environment that you'd feel comfortable handing off to your clients

My personal vision is that we can "sell" Aegir as a package of the following components:

Rán (@ergonlogic's D8/Ansible based installer VM) as a way to bootstrap a new environment
The hosting system itself
A local development stack that integrates tightly with the hosting system (Terra?)
Insanely thorough user, administrator, and developer documentation (whether that's a book or online documentation - doesn't much matter), including well defined best practices and ops processes designed to mitigate business risk.
Information about available support channels for all of the above.

Above all, I don't want you to feel like I personally (and we, as a project) don't value the work you've been doing. I mentioned this a few times in this post, but I think Terra is a really good thing for local dev and I like it. I'm probably going to recommend that we use it internally at NBC for local development, though we might need to resolve a couple of the points I mentioned above before we can commit to that (namely, the assumption that we'll use the Terra containers and the hardcoded db connection info in settings.php. We have our own containers that we'll want to use, as we have some pretty specific requirements around what needs to live inside of them). Ideally, that will be a bridge to using Aegir 4 internally at NBC on the Kubernetes cluster I'm told our ops team is considering, but we'll have to see.

I think there have been a number of assumptions made here since the beginning that are simply incorrect.

Incorrect Assumption:

Terra also currently assumes that you'll use the Terra containers.

You can completely replace the docker layout using your app's .terra.yml right now. Choose your own containers. Replace the defaults. Add extras.

We plan on making the default compose stack pluggable so it doesn't force the Drupal stack by default, but for right now, you can use the .terra.yml file in your app:

docker_compose:
  # Overrides will replace any item in the entire docker-compose array.
  overrides:
    app:
      image: wordpress
      environment:
        WORDPRESS_DB_HOST: database
        WORDPRESS_DB_USER: drupal
        WORDPRESS_DB_NAME: drupal
        WORDPRESS_DB_PASSWORD: drupal

Putting this code in your .terra.yml file will replace terra/drupal with the official wordpress docker image.
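To make the override behaviour concrete, here is a sketch of how an `overrides` section could be merged into the default compose layout. The recursive, key-by-key merge is an assumption about the semantics (the YAML comment only says overrides "replace any item"), and `apply_overrides` is a hypothetical name, not terra's code.

```python
import copy
from typing import Any, Dict


def apply_overrides(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `defaults` with `overrides` merged in.

    Nested mappings are merged key by key; any other value in `overrides`
    (strings, lists, numbers) replaces the default outright.
    """
    merged = copy.deepcopy(defaults)  # never mutate the default layout
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

Under this reading, overriding `app.image` with `wordpress` swaps the image but keeps whatever else the default `app` service defines (links, volumes), and any service you don't mention is left alone.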

Incorrect Assumption:

hardcoded db connection

This is out of pure laziness; terra is still a proof of concept. We can randomize the passwords easily.

Let's work on this one.
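Randomizing the credentials per environment could be as small as the helper below. It's a hypothetical sketch, not part of terra: the generated value would be handed to both the database container and the app container (for example through environment variables like the WORDPRESS_DB_PASSWORD setting above) in place of the fixed drupal/drupal values.

```python
import secrets
import string


def random_db_password(length: int = 24) -> str:
    """Generate a random database password for a new environment.

    Uses the `secrets` module (a CSPRNG) rather than `random`, and sticks to
    letters and digits so the value is safe to drop into YAML or an env var.
    """
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))
```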

Incorrect Assumption:

Terra (as it is right now) makes too many assumptions about the application that's going to be running, namely Drupal.

Like I said above, yes, the default container layout is for drupal, but it will be pluggable very soon.

However, the app can override its containers 100% right now. Use .terra.yml.

You can swap out your app container for ruby or node, for all we care. You can add another container for memcache, or jenkins, or rabbitMQ, or literally anything else. I'm not sure how I could be more clear on that.

Incorrect Assumption:

Terra uses jms/job-queue-bundle.

This was just for research. Turns out RabbitMQ is a much better solution. We've gotten the queue system working with Rabbit here: https://github.com/terra-ops/terra-ui-prototype/blob/master/.terra.yml

6 lines of yml added the rabbitMQ container to our front-end drupal 7 site.

Incorrect Assumption:

Terra currently expects read access on Github repositories to be able to spin up a set of containers.

Well? Something has to have access to the code, regardless of whether it's deploying it to production or development machines. It doesn't have to be running on the same system that the containers live on if you hook up a remote docker daemon. If this is really a problem, we can resolve it one way or another. If kubernetes fixes it, then great! We'll get that solution when the kubernetes driver is completed.

Incorrect Assumption:

Terra also makes the assumption that you'll be running commands locally (and the canonical storage location for application metadata is on whatever machine Terra is being run on).

Wherever you go, there you are. In your production hosting environment, you will be running commands "locally". There will be metadata stored somewhere. If you need this to be distributed, then we will figure out how to distribute it. And again, if kubernetes fixes it, then great! Kubernetes Driver.

Incorrect Assumption:

Terra has its own logic for building a container

Again, you can use your own. Make your own container: either rebuild with docker build -t terra/drupal, or add whatever image you want to .terra.yml. Very easily done, right now. If you want to use "buildpacks" to create your container, go right ahead. There is nothing about terra that prevents you from doing those things.

I don't know if I'm just having some conceptual hang up somewhere or what, but it seems like at the point where you're finished developing something locally, you should hand off the site to $something_else (Kubernetes, OpenShift, Aegir, etc) to handle environments that will be shared among multiple developers/testers and to handle scaling.

This is precisely a "conceptual hang up". Your production system needs a CLI. Using those orchestrators' CLI is not easy.

I think the debate here is really about what it is called. We know the components that are needed. We need a CLI. We need a command queue. We need a REST API. However, which tool talks to what API is not irrelevant. We want this stack to be simple and easy to develop so others can contribute, modify and extend.

Move terra-cli to the aegir-project organization

I'm not completely against this, but...

If aegir is supposed to be synonymous with production, then it doesn't make sense to move terra-cli to aegir-project. The cli is used for dev and testing as well. If it were called "aegir-cli", and we told people to use it for local dev, then wouldn't aegir then be claiming to do "all the things"?

The interesting thing is, thanks to Symfony, you could create an "aegir-cli" project that included the terra commands. See https://github.com/terra-ops/terra-api, which is a full Symfony distribution. app/console is the CLI for that Symfony distribution. It includes commands from all of the components, from Doctrine to terra. You could include your own "production-only" commands in a new package.

Wrapping up...

My biggest question for you is, if the terra integration with kubernetes works, and you get all the benefits of both, why wouldn't you want to use it? You'd get the CLI, you'd get the Queue, you'd get the REST API, and you'd get PaaS agnosticism right now.

On a final note, please don't forget that this entire initiative is pre-release, and in development. Everything you see here can change at any time. Suggest actionable tasks and we will do what we can to make it work for you.

However, I would recommend that instead of having larger discussions about the architecture we will build in the future, we focus on what you would change in this project to make it useful to you now.

We're building really interesting things, right now. Join us and start tinkering!

I am inspired by the Agile Manifesto: https://en.wikipedia.org/wiki/Agile_software_development

Individuals and interactions over Processes and tools
Working software over Comprehensive documentation
Customer collaboration over Contract negotiation
Responding to change over Following a plan

I think the debate here is really about what it is called. We know the components that are needed. We need a CLI. We need a command queue. We need a REST API. However, which tool talks to what API is not irrelevant. We want this stack to be simple and easy to develop so others can contribute, modify and extend.

...which is precisely why we need to avoid writing code when we can, and for a lot of these things, we can because Google and Red Hat have done it for us.

If aegir is supposed to be synonymous with production, then it doesn't make sense to move terra-cli to aegir-project. The cli is used for dev and testing as well. If it were called "aegir-cli", and we told people to use it for local dev, then wouldn't aegir then be claiming to do "all the things"?

Aegir (the larger organizational umbrella) wants to provide a curated set of tools that handles everything from local to production. That doesn't mean we have to build every component from scratch, especially when there are other battle-tested technologies out there that handle many of the difficult things for us. Aegir (the project) - yes, that will be mainly focused on production hosting.

My biggest question for you is, if the terra integration with kubernetes works, and you get all the benefits of both, why wouldn't you want to use it? You'd get the CLI, you'd get the Queue, you'd get the REST API, and you'd get PaaS agnosticism right now.

There's no reason to build out the kind of infrastructure that you're planning because it's already been built. Kubernetes and OpenShift completely mitigate the need to roll your own command queue and REST API and orchestration strategies and all that other stuff. It's just handed to you and it works and you don't have to worry about it. Maintenance and testing for those components is "outsourced" to other open source projects backed by major companies that have a strong commercial interest in solving this problem in a way that makes sense for everyone from one man shops to NBC-scale orgs and bigger, and that's exactly what we want to do.

You seem unwilling to compromise on your view that Terra should be the gateway to everything, and while I'm not sure why you'd want to go down that road again (that's exactly what Aegir is right now, to a lesser extent - a monolithic pile of code that takes over your entire development process), it's completely fine, and I hope you're successful. Nothing but good things can come from competition.

However, we (the Aegir project) have already established the architectural goals that we want to pursue and the method by which we're going to accomplish them, including going through the process of gathering input from the eventual end users of the product, several large universities included. While Terra may eventually solve those problems in a way that makes sense for those customers, Kubernetes and OpenShift solve them today, and really large "anchor" customers are already on board with using it when it's ready to go (some of them are already using Kubernetes).

I've minced words until now, but I'll just say it: Terra is not going to be the backend to Aegir because we aren't going to have our own backend. We're just going to defer to Kubernetes and OpenShift for that. If we can do it in a PaaS-agnostic way, that's great, but I don't think we're going to go out of our way to support it, as it introduces a lot of complexity.

The Aegir frontend will likely expose a limited set of operations via a REST API. If you want to integrate with that through Terra, that'd be cool. If not, we'll probably just end up building our own CLI that's essentially a wrapper to docker-compose and the Aegir REST API to provide local development and nothing else.

Where we go from here is really up to you. If you're willing to work with us on the plans that we've already made (again, based on potential customer feedback) and more or less finalized, we'd welcome the help. If you want to keep doing your own thing with Terra, that's fine too.

@ergonlogic If you'd like to continue this discussion, feel free, particularly if you want to overrule me on any of the above points. I'm going to bow out because everything that I have to say has been said.

I'd like to see Ægir, as a software project, become more of a collection of tools, rather than its current monolithic architecture, following the Unix Philosophy: "do one thing and do it well." I'm contributing Rán on the basis that it is complementary to how we see Ægir evolving. I'd love nothing more than for Terra to find such a role within the project.

The entire impetus of re-engineering Ægir is to take advantage of existing best-of-breed libre software. As a production-grade container-based hosting system, Kubernetes/Openshift looks pretty feature-complete. I'd very much like to support other backends, especially Flynn, once it is further along. But I believe we're better served contributing to those upstream projects, rather than duplicating efforts.

Great, thanks, @ergonlogic! I too see terra fitting into aegir as a component.

I've never heard of Rán, could you post a link?

I have no plans to duplicate the efforts of Docker orchestrators directly in terra. I simply want to support the lowest common denominator: plain Docker as a baseline. Much like we did with Aegir 3 and prior, it works out of the box and has some rudimentary scaling features, but if you wanted to do anything more serious, you would have to add more robust tools.

I'm really looking forward to this collaboration.

As a first experimental step, I think we can get the Aegir 3 front end running on terra, using terra queue, without much effort. I created a terra app that tries to install hostmaster, but it fails because the install profile depends on d(), a Provision function. See https://github.com/terra-ops/example-aegir

If we removed that dependency on Provision and added a module that sent commands to the RabbitMQ server (similar to albatross_digi's terra UI module), we would have Aegir 3 launching sites on containers in short order!
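The flow described here has a front end that publishes commands to RabbitMQ and a long-running `terra queue` worker that executes them. A minimal sketch of that flow follows. To keep it self-contained, Python's in-process `queue.Queue` stands in for RabbitMQ. The JSON message shape (`command`, `args`, `options`) is a hypothetical format, not the one the terra UI module actually uses. The worker only builds the `terra` argument list; it does not run it.

```python
import json
import queue
from typing import List

# Stand-in for the RabbitMQ queue; a real setup would use an AMQP client.
command_queue: queue.Queue = queue.Queue()


def publish(command: str, args: List[str], options: dict) -> None:
    """Front-end side: serialize a terra command as a (hypothetical) JSON message."""
    command_queue.put(json.dumps({"command": command, "args": args, "options": options}))


def to_argv(message: str) -> List[str]:
    """Translate one queue message into a terra CLI invocation."""
    data = json.loads(message)
    argv = ["terra", data["command"], *data["args"]]
    for name, value in data.get("options", {}).items():
        argv.append(f"--{name}={value}")
    return argv


def drain() -> List[List[str]]:
    """Process every queued message once; a real `terra queue` would loop forever."""
    jobs = []
    while True:
        try:
            message = command_queue.get_nowait()
        except queue.Empty:
            return jobs
        # A real worker would execute this, e.g. subprocess.run(argv, check=True).
        jobs.append(to_argv(message))
```

For example, `publish("env:add", ["ui", "local"], {"enable": 1})` followed by `drain()` yields the argument list for `terra env:add ui local --enable=1`, the same command used earlier in this issue to launch the Terra UI.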

Thanks a lot for your feedback.

Keep it coming!

Just to be clear, I agree with @cweagans' assessment. Our current plan for the next generation of Ægir is to build atop Kubernetes and OpenShift. Together these are a feature-complete, production-ready, container-based hosting system. To be honest, I don't see how the current direction of Terra's development fits into that.

From what I can see, Terra is trying to do too much. We want small, composable tools built on a solid foundation. Terra appears to be trying to be both the tooling and the foundation in some cases. I think we would find Terra extremely valuable if it provided an automated local docker development environment, because setting that up is itself an annoying problem for every new user to deal with. Something like that would be happily welcomed into the Aegir stack.
