waggle-sensor / beehive-server

Waggle cloud software for aggregation, storage and analysis of sensor data from Waggle nodes.

Design and prototype provisioning of core piece of infrastructure #39

Open seanshahkarami opened 7 years ago

seanshahkarami commented 7 years ago

We should discuss a design where we have a few of the core pieces of beehive's infrastructure "preexisting" and then build on top of that. That could reduce "what beehive is" to primarily being glue between a few, well-operated components.

For example, RabbitMQ, Cassandra and Elasticsearch are probably best run as core pieces of infrastructure that are operated, managed, and backed up by someone familiar with best practices and procedures. Each Beehive deployment would then live in its own virtual namespace within each of these pieces of infrastructure, and the namespace to use would be a configuration option.

For example, we keep a few machines around solely to operate the core RabbitMQ and Cassandra cluster, then deploy a production and development beehive onto the same clusters, just under different namespaces.
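
To make "namespace as a configuration option" concrete, here is a minimal sketch, assuming one RabbitMQ virtual host, one Cassandra keyspace and one Elasticsearch index prefix per deployment. The class name, the `beehive-`/`beehive_` naming scheme and the hostname are hypothetical placeholders, not something beehive defines today.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class BeehiveNamespace:
    """Names one deployment would use inside the shared clusters (hypothetical scheme)."""

    deployment: str  # e.g. "production" or "development"

    @property
    def rabbitmq_vhost(self) -> str:
        # RabbitMQ scopes exchanges, queues and permissions to a virtual host.
        return f"beehive-{self.deployment}"

    @property
    def cassandra_keyspace(self) -> str:
        # Unquoted keyspace names cannot contain hyphens, so use underscores.
        return f"beehive_{self.deployment}".replace("-", "_")

    @property
    def elasticsearch_index_prefix(self) -> str:
        # Elasticsearch has no namespaces; an index-name prefix is a convention.
        # Index names must be lowercase.
        return f"beehive-{self.deployment}-".lower()

    def amqp_url(self, host: str, port: int = 5672) -> str:
        # The vhost is the path segment of an AMQP URL and must be percent-encoded.
        return f"amqp://{host}:{port}/{quote(self.rabbitmq_vhost, safe='')}"


prod = BeehiveNamespace("production")
dev = BeehiveNamespace("development")
print(prod.amqp_url("rabbitmq.internal"))  # placeholder hostname
print(dev.cassandra_keyspace)
```

Both deployments would point at the same cluster hosts and differ only in the `deployment` value.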

seanshahkarami commented 7 years ago

This also has the side effect of clearly defining where we're doing ops vs where we're doing development, and it allows people who are interested to focus on understanding the details of a core service.

seanshahkarami commented 7 years ago

Looking forward, structuring the design around clusters this way may allow us to scale horizontally and transparently, since we could just add another machine to the RabbitMQ, Cassandra or Elasticsearch cluster if needed. RabbitMQ makes this kind of distributed design relatively straightforward to start building.

As far as tooling goes, there are also a number of options for approaching provisioning ourselves. For example, Docker Swarm + stack allows multi-machine deployment. We could also use Ansible and specify machine roles for deployment.
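
As a rough illustration of the "machine roles" idea, independent of whether Ansible or Swarm ends up doing the work, the mapping could be as simple as the sketch below. The hostnames and role names are made-up placeholders, and this is plain Python, not an Ansible inventory format.

```python
from typing import Dict, List

# Hypothetical role map: which machines run which core service.
ROLES: Dict[str, List[str]] = {
    "rabbitmq": ["core-1", "core-2"],
    "cassandra": ["core-1", "core-2", "core-3"],
    "elasticsearch": ["core-3"],
}


def roles_for_host(roles: Dict[str, List[str]], host: str) -> List[str]:
    """Return the services a given machine is expected to run, sorted by name."""
    return sorted(role for role, hosts in roles.items() if host in hosts)


def add_machine(roles: Dict[str, List[str]], role: str, host: str) -> Dict[str, List[str]]:
    """Return a new role map with `host` added to `role` (scaling out one cluster)."""
    updated = {name: list(hosts) for name, hosts in roles.items()}
    if host not in updated.setdefault(role, []):
        updated[role].append(host)
    return updated


scaled = add_machine(ROLES, "cassandra", "core-4")
print(roles_for_host(scaled, "core-1"))
print(scaled["cassandra"])
```

Scaling a cluster is then a one-line change to the role map, which the provisioning tool would pick up on its next run.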

vforgione commented 5 years ago

There are a few things that need to be considered ahead of time:

  1. Where is this being deployed (i.e. how much control do you have over the metal and/or VM)?
  2. Are you ever going to offload standard services to a provider (e.g. RDS for the relational DB, Elastic Cloud for ES)?
  3. Is scaling resources for specific services a concern?
  4. If scaling is a concern, does that scale grow linearly or in bursts?
  5. How many tools are too many?

The biggest area of concern from my point of view is that this repo covers way too much ground. There's provisioning, configuration, networking, app code, ... It's really hard to make heads or tails of this.

A good starting point would be to separate the server code from the procedural services/operations stuff. The server/application code could possibly even be broken into self-contained repos. There isn't much in the way (as far as I can tell) of one application talking to another, so they may as well be broken out for easier maintenance.

As for the services and operations stuff: there's a lot going on there. There are some really standard things (like Cassandra and Elasticsearch) and some custom services (like certificate issuing) that need to be thought of in the context of how they interrelate in the system. Similar to how the app code isn't reliant on one piece or another, it seems like many of the services can stand on their own. So you can look at these as off-the-shelf services and custom-purpose services.

The off-the-shelf services are dead simple. No matter what deployment tools you choose, there are a zillion ways to get them set up.

The more customized services are going to require a more nuanced approach, and that will be tied to how you handle deployments. Like everything else, I have strong feelings toward breaking things up into single-purpose pieces.

As for deploying this stuff out, I like Docker as a means to containerize things. It makes local development easier and there are tons of deployment solutions that work with it. Given all the requirements for Beehive, I think Kubernetes is a good choice: you can add as many machines as you need to handle volume (memory, disk space), you get full control over internal networking, you can auto-scale services/resources, you get system-wide configuration/env variables, and it's more or less the de facto choice for admins and ops people now (easy to hire people to maintain it).

Ansible works well for bringing up machines (or VMs) and configuring things, but it's awful at maintaining state across deployments/clusters. Especially if you go the container route, it's just as trivial to customize the container build as it is to write custom Ansible roles.

The downside to all of this is that it's new tooling that needs to be learned. That's not necessarily a bad thing -- it's just a thing and requires a nominal time investment. Overall it seems to be the right direction: single purpose projects that are bound together in a private network with a central administrative process.

Just my 2 cents.

wgerlach commented 5 years ago

I agree that this repository should be split up into smaller, more independent pieces. For example, this was already the plan for all "beehive-"-prefixed directories.