tellapart / aurproxy

Apache License 2.0

aurproxy

aurproxy is a load balancer manager with knowledge of Apache Aurora's service discovery mechanism and integration with Aurora's task lifecycle. It is a Python application that manages a fully featured but not Aurora-aware load balancer (currently only nginx is supported). When Aurora service discovery events occur, aurproxy detects them, rewrites the load balancer's configuration file, and then triggers a graceful restart of the load balancer. Use aurproxy to expose dynamic Aurora service endpoints to services and applications that have no knowledge of Aurora's service discovery mechanism.
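The rewrite step can be pictured as turning the current list of discovered endpoints into an nginx upstream block. A minimal sketch (`render_upstream` is a hypothetical helper, not aurproxy's actual template code):

```python
# Sketch of the config-rewrite step: turn discovered endpoints into an
# nginx upstream block. Hypothetical helper; aurproxy's real templates
# and restart handling are more involved.
def render_upstream(name, endpoints):
    lines = ['upstream %s {' % name]
    for host, port in endpoints:
        lines.append('  server %s:%d;' % (host, port))
    lines.append('}')
    return '\n'.join(lines)
```

After writing the rendered configuration, aurproxy triggers a graceful restart (for nginx, the equivalent of `nginx -s reload`) so in-flight requests are not dropped.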

Features

Try it

Locally

Replace your zk_servers, role, environment, job, etc. in the value of the "config" argument below:

# 1 - Build and push container to your registry - replace registry address.
PACKAGE_VERSION=20150430.0; docker build -t docker.mydomain.com/library/aurproxy:$PACKAGE_VERSION . && docker push docker.mydomain.com/library/aurproxy:$PACKAGE_VERSION

# 2 - Launch container
docker run -t -i --net=host docker.mydomain.com/library/aurproxy:20150430.0 bash

# 3 - Run setup to configure load balancer
cd /opt/aurproxy && \
python -m "tellapart.aurproxy.command" run \
  --setup \
  --management-port=31325 \
  --config '{"backend": "nginx", "servers": [{"routes": [{"sources": [{"endpoint": "http", "zk_servers": "0.zk.mycluster.mydomain.com:2181,1.zk.mycluster.mydomain.com:2181,2.zk.mycluster.mydomain.com:2181", "environment": "devel", "job": "web", "role": "myrole", "source_class": "tellapart.aurproxy.source.AuroraProxySource"}], "locations": ["/"]}], "hosts": ["default"], "ports": [8080], "context": {"default_server": "True", "location_blacklist": ["/health", "/quitquitquit", "/abortabortabort"]}}]}'

# 4 - Start load balancer
/usr/sbin/nginx -c /etc/nginx/nginx.conf &

# 5 - Re-run the command from #3 without the --setup flag.

# 6 - Test from another shell - assumes host networking
# (otherwise you'll have to map the port through in #2 above)
# and something to point to in the configured ProxySource.
curl 127.0.0.1:8080/robots.txt

On Aurora devcluster

  1. Set up Aurora devcluster

  2. Build and push container to your registry - replace registry address.

    PACKAGE_VERSION=20150430.0; docker build -t docker.mydomain.com/library/aurproxy:$PACKAGE_VERSION . && docker push docker.mydomain.com/library/aurproxy:$PACKAGE_VERSION
  3. Edit example/aurproxy_hello_world.aur to match the information for the docker image above.

  4. Copy example/aurproxy_hello_world.aur, example/hello_world.aur, example/hello_world.py to your local aurora source directory (will show up in "/vagrant/" in devcluster).

  5. "vagrant ssh" into aurora devcluster instance.

  6. Install hello_world.py prerequisites:

    apt-get install python-pip
    pip install Flask==0.10.1
  7. Create jobs:

    aurora job create devcluster/www-data/devel/hello_world /vagrant/hello_world.aur
    aurora job create devcluster/www-data/devel/aurproxy /vagrant/aurproxy_hello_world.aur
  8. Wait for the jobs to come up - the first download of the aurproxy docker image may take a while:

    http://192.168.33.7:8081/scheduler/www-data/devel/aurproxy
  9. Test from host:

    curl 192.168.33.7:8080
    # Expect 200 "Hello World!"
  10. Find running aurproxy task instance in Aurora web interface, open stderr of "aurproxy" process.

  11. Restart hello_world job while watching aurproxy stderr log to see proxy configuration update in action:

    aurora job restart devcluster/www-data/devel/hello_world

Deployment Suggestions

Traffic Mirroring & Replay Modes

Aurproxy HTTP traffic mirroring and replay features use gor, which can be set up to mirror a fixed number of queries per second from a full aurproxy task instance over a TCP stream to one or more gor replay servers. Aurproxy finds gor instances using sources, and will update its gor command line to add new endpoints as they appear and remove old ones as they disappear.
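As a sketch, the regenerated gor command line might be assembled like this (`render_mirror_cmdline` is a hypothetical helper; `--input-raw` and `--output-tcp` are gor's basic capture/forward flags, while the real mirror.sh aurproxy writes includes more options, such as rate limiting):

```python
# Hypothetical sketch of rebuilding a gor command line as replay
# endpoints appear and disappear; not aurproxy's actual code.
def render_mirror_cmdline(listen_port, replay_endpoints):
    # One --output-tcp per currently-known replay server.
    outputs = ' '.join(
        '--output-tcp %s:%d' % (host, port) for host, port in replay_endpoints)
    return 'gor --input-raw :%d %s' % (listen_port, outputs)
```

When the endpoint list changes, aurproxy rewrites the script and kills the running gor process so Aurora restarts it with the new command line (which is why `max_failures=0` matters below).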

To use traffic mirroring:

  1. Set up a gor replay server. It can be, but doesn't have to be, run in Aurora. See "traffic replay" below for instructions on setting one up using aurproxy to manage it.

  2. Add a gor_process to your Aurproxy task definition.

    # max_failures=0 is important:
    # aurproxy kills the gor process when it updates mirror.sh,
    # and max_failures=0 means that Aurora won't treat the task as
    # unhealthy after some number of gor process restarts.
    gor_process = Process(
      name='gor',
      cmdline='/etc/aurproxy/gor/dynamic.sh',
      max_failures=0)
  3. Pass values for mirror_source (an aurproxy source configured to point to your gor replication server(s)), mirror_ports, mirror_max_qps, and mirror_max_update_frequency into aurproxy when starting it.

To use traffic replay:

  1. Set up a replay job that runs aurproxy "run_replay" with the "--setup" flag and then runs both run_replay and gor, as in the traffic mirroring instructions above. run_replay command line example:

    cd /opt/aurproxy && \
    python -m tellapart.aurproxy.command run_replay \
      --management-port 12345 \
      --replay-port 12346 \
      --replay-source '{"name": "replay", "source_class": "tellapart.aurproxy.source.ApiSource"}' \
      --replay-max-qps 1000

Lifecycle

  1. Setup
    1. Configure proxy (nginx.conf)
  2. Start proxy (nginx)
  3. Run aurproxy
    1. Initialize metrics collection plugin, if configured.
    2. Initialize and execute registration plugin, if configured.
    3. Initialize and start proxy updater.
      1. Periodically check for update requests signaled by proxy sources and share adjusters.
      2. If need to update is signaled, update proxy configuration and restart proxy.
    4. Start aurproxy's web server
      1. Listens for Aurora lifecycle events - health, shutdown, etc.
  4. Aurora signals aurproxy task instance should shut down (/quitquitquit)
    1. Run shutdown handlers.
      1. Deregister, if registration plugin configured.
      2. Flush metrics.

Develop

Build

Build the image. EG:

docker build -t docker.mydomain.com/aurproxy:latest .

Test

Set up for testing

pip install virtualenv
virtualenv -p python2.7 --no-site-packages --distribute ~/.virtualenvs/aurproxy
cd ~/src/aurproxy
source ~/.virtualenvs/aurproxy/bin/activate
pip install -r ~/src/aurproxy/requirements.txt
pip install -r ~/src/aurproxy/requirements_dev.txt

Run tests

nosetests --all-modules --where ./tellapart --with-coverage --cover-package=tellapart.aurproxy --cover-inclusive

Terminology

Backends

aurproxy is designed to support different load balancers as backends. Within aurproxy, a backend extends aurproxy.backends.ProxyBackend and is where the load-balancer-specific logic and management code should live. nginx is currently the only backend implementation, but an HAProxy backend implementation should be possible.
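A hypothetical HAProxy backend might look roughly like this (the method names and haproxy config/reload details are assumptions for illustration, not aurproxy's real ProxyBackend interface):

```python
# Skeleton of a hypothetical HAProxy backend; illustrative only.
class HaProxyBackend(object):
    def __init__(self, config_path):
        self._config_path = config_path

    def render(self, endpoints):
        # haproxy wants one "server" line per endpoint.
        lines = ['backend app', '  balance roundrobin']
        for i, (host, port) in enumerate(endpoints):
            lines.append('  server s%d %s:%d check' % (i, host, port))
        return '\n'.join(lines)

    def restart_command(self):
        # "haproxy -sf <old pids>" reloads gracefully by handing
        # listeners to a new process before the old one exits.
        return ['haproxy', '-f', self._config_path, '-sf']
```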

Endpoints

Endpoints are host+port pairs that represent a running task instance. They don't have to be hosted in Aurora.

Sources

Sources are classes that are responsible for providing, maintaining, and signaling updates to a list of endpoints.

Source Lifecycle

  1. Setup
  2. Start
    1. Set up watches on service discovery data sources, if necessary.
  3. Ongoing:
    1. On update (node being added or removed), signal that an update is required.
    2. Return list of SourceEndpoints whenever requested.
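A minimal source following the lifecycle above might look like this (illustrative; the real ProxySource base class and SourceEndpoint types may differ):

```python
# Illustrative static-list source: maintains an endpoint list and
# signals an update callback when it changes. Not aurproxy's real API.
class ListSource(object):
    def __init__(self, endpoints, on_update):
        self._endpoints = list(endpoints)
        self._on_update = on_update

    def start(self):
        # A ZK-backed source would set up watches here; a static
        # list has nothing to watch.
        pass

    def add(self, endpoint):
        self._endpoints.append(endpoint)
        self._on_update()  # signal that an update is required

    @property
    def endpoints(self):
        return list(self._endpoints)
```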

Source Implementations

  1. AuroraSource, which watches Aurora's service discovery database in Zookeeper and whenever a ServiceInstance is added or removed, signals Aurproxy's ProxyUpdater that an update is required.
  2. StaticProxySource, which returns a single, statically configured endpoint.

Registration

Active Registration

Active Registration refers to registration events that take place during the normal Aurora lifecycle for Aurproxy task instances - on startup, registration is triggered (if configured), and on graceful shutdown, deregistration is triggered (if configured).

Active Registration Implementations

Synchronization

Synchronization refers to running a registration plugin as an administrative script or as an Aurora cron job. It is intended both to remove outdated aurproxy tasks left behind by slaves that have disappeared or on which task shutdown was not graceful, and to add any missing aurproxy tasks to supported systems (EG: other load balancers or DNS).

Synchronization Implementations

Metrics

Aurproxy supports recording:

Metrics Implementations

Log Aggregation

Currently, the only log aggregation implementation is Sentry. If you pass a sentry_dsn in via the command line, logged exceptions will be recorded to Sentry.

--sentry-dsn='gevent+https://{{sentry_user}}:{{sentry_pass}}@app.getsentry.com/{{sentry_project}}'

Configuration Elements

Aurproxy Configuration (JSON String)

ProxyServer

ProxyRoute

ProxyStream

ProxySource (Base)

AuroraSource

StaticProxySource

ApiSource

ShareAdjuster (Base)

RampingShareAdjuster

HttpHealthCheckShareAdjuster

Context

Some of the configuration elements above support a "context" dictionary. This is a "bucket of stuff" that gets passed directly down to the configuration rendering context and is the place to put backend-specific configuration values (EG: "default_server": True).

NginxServerContext

Configuration Example

import json

cmd = 'cd /opt/aurproxy && python -m "tellapart.aurproxy.command" run ' \
      '--setup --management-port=31325 --max-update-frequency=10 ' \
      '--update-period=2 --config \''
config = {
    "servers": [
        {
            "routes": [
                {
                    "sources": [
                        {
                            "endpoint": "http",
                            "announcer_serverset_path": "sd/mycluster",
                            "job": "myjob",
                            "environment": "prod",
                            "zk_servers": "0.zk.mycluster.mydomain.com:2181,"
                                          "1.zk.mycluster.mydomain.com:2181,"
                                          "2.zk.mycluster.mydomain.com:2181",
                            "role": "myrole",
                            "source_class": "tellapart.aurproxy.source.AuroraProxySource",
                            "share_adjusters": [
                                {
                                    "share_adjuster_class": "tellapart.aurproxy.share.adjusters.RampingShareAdjuster",
                                    "ramp_seconds": 60,
                                    "ramp_delay": 10,
                                    "update_frequency": 10,
                                    "curve": "linear"
                                },
                                {
                                    "share_adjuster_class": "tellapart.aurproxy.share.adjusters.HttpHealthCheckShareAdjuster",
                                    "route": "/health",
                                    "interval": 3,
                                    "timeout": 2,
                                    "unhealthy_threshold": 2,
                                    "healthy_threshold": 2,
                                }
                            ]
                        }
                    ],
                    "locations": [
                        "/"
                    ]
                }
            ],
            "healthcheck_route": "/healthcheck",
            "hosts": [
                "default"
            ],
            "ports": [
                10080
            ],
            "context": {
                "default_server": "True",
                "location_blacklist": ["/health", "/quitquitquit", "/abortabortabort"]
            }
        }
    ],
    "backend": "nginx"
}
cmd += json.dumps(config) + '\''
print cmd