webplatform / ops

http://webplatform.github.io/ops/

Harmonize internal backends and how to expose services to the public #115

Closed · renoirb closed this issue 9 years ago

renoirb commented 9 years ago

We have reached a point where the way we expose web applications creates a set of issues that need to be addressed.

Most web applications are configured so that both the web server and the backend code must run on the same virtual machine, which prevents us from scaling our capacity.

Also...

Context

Before this project, it was impossible to implement "horizontal scaling" by adding frontend and/or internal upstream nodes.

Requirements

A one-to-one convention: for each public name (e.g. notes.webplatform.org), we proxy to an internal upstream service on the private network, which exposes an HTTP server on a pre-determined port specific to each web application.

With this, we can limit the number of public IP addresses while still being able to scale by adding internal upstream nodes, without being impacted by the number of web apps we run.
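
As a minimal sketch of that convention, reusing the notes/Hypothesis node IP and port that appear later in this thread (everything else here is illustrative):

# Sketch only; values borrowed from the examples further down.
server {
    listen      443 ssl;
    server_name notes.webplatform.org;   # public name

    # SSL configs...

    location / {
        # one internal upstream node for the "notes" app, on its assigned port
        proxy_pass http://10.10.10.157:8005;
    }
}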

upstream

  1. Run the backend runtime service (e.g. Hypothesis’ gunicorn, php5-fpm FastCGI, etc.), listening on a unix socket or the loopback interface (e.g. 127.0.0.1)
    1. Monit to ensure the backend runtime is up and alive
  2. Minimal NGINX virtual host (called "upstream") listening on the private network, serving on a port defined per web application
    1. Monit to ensure NGINX is up and alive
    2. Expose service status information for both NGINX and the runtime (when applicable) to the private network
    3. Handle aliases, request rewrites, and optionally other optimizations
    4. Serve static assets so that the public frontend can proxy them without having to install the web app just to serve static assets
  3. Internal DNS to list upstreams (a sketch follows this list)
    1. Salt to update internal DNS, CRUD-ing A records so DNS always knows which private IPs can handle a given web application backend; alternatively use it only as a query mechanism to generate config files, or not use it at all. TBD
    2. Assign an internal name per backend (e.g. notes.backend.production.wpdn)
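
If the internal DNS route were taken, NGINX can re-resolve a backend name at request time by combining a resolver with a variable in proxy_pass. A sketch, where the resolver address is hypothetical and the name comes from item 3.2:

# Hypothetical: 10.10.10.2 stands in for the internal DNS server.
resolver 10.10.10.2 valid=30s;

location / {
    # Using a variable forces NGINX to resolve the name per request,
    # so added/removed A records for the backend are picked up.
    set        $notes_backend notes.backend.production.wpdn;
    proxy_pass http://$notes_backend:8005;
}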

public

  4. Listen on the public IP address and proxy requests to the internal upstream web server; e.g. notes.webplatform.org proxies to the notes.backend.production.wpdn internal upstream nodes, which serve on port 8005, the port assigned to our Hypothesis instance
  5. Redirect non-SSL requests to their SSL equivalent
  6. Abstract upstream server load balancing by calling an internal DNS name that gives out the list of available servers; see the NGINX upstream feature
  7. Handle response filtering prior to serving
  8. In case an internal upstream server is broken, serve a human-friendly error page (see the sketch after this list)
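
A sketch of item 8, assuming the friendly error page is a static file shipped on the frontend itself (the /50x.html name and location are assumptions):

location / {
    proxy_pass             http://upstream_hypothesis;
    include                proxy_params;
    # Let NGINX take over when the upstream answers with an error status.
    proxy_intercept_errors on;
    error_page             502 503 504 /50x.html;
}

location = /50x.html {
    root     /var/www/html;   # served locally, no upstream involved
    internal;
}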

Nice to have

  9. Have web apps that can only run under Apache and mpm-prefork (i.e. MediaWiki) proxied the same way as any other backend
  10. Handle asset caching, e.g. keep a local copy of the backend response so that no web application has to be installed locally only for static assets (see the caching sketch after the snippet below)
  11. Gradually switch over all Apache2 configuration
  12. In the case of a static site, strip cookies, Pragma, and similar headers:

    # requires HttpHeadersMore, which is part of nginx-extras
    # ref: http://wiki.nginx.org/NginxHttpHeadersMoreModule#more_clear_headers
    expires    2m;
    add_header Cache-Control public;
    more_clear_headers Set-Cookie Cache-Control Pragma;
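
For item 10, NGINX's built-in proxy cache could keep those local copies on the frontend. A sketch, where the cache path and zone name are assumptions:

# In the http context:
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=assets:10m max_size=1g;

# In the server block for the public name:
location /assets/ {
    proxy_pass        http://upstream_hypothesis;
    include           proxy_params;
    # Serve repeated asset requests from the local cache instead of
    # hitting the internal upstream every time.
    proxy_cache       assets;
    proxy_cache_valid 200 10m;
}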

Life of an HTTP request

This illustrates where an HTTP request goes, and what it passes through, in order to serve a response to the client.

nginx frontend

While Fastly encapsulates caching and load balancing, any application that needs to be served directly by us cannot scale unless we rework the configuration accordingly.


VirtualHosts

Each endpoint will have both an internal and a public virtual host configuration.

Priorities

renoirb commented 9 years ago

Don’t forget to update notes in https://docs.webplatform.org/wiki/WPD:Infrastructure/architecture/Things_to_consider_when_we_expose_service_via_Fastly_and_Varnish

renoirb commented 9 years ago

Ensure tracking.webplatform.org redirects to https://stats.webplatform.org/ (testthewebforward.org uses it)
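
A sketch of that redirect in the same style as the frontend virtual hosts below (whether to preserve $request_uri is an assumption):

server {
    listen      80;
    server_name tracking.webplatform.org;
    # Permanent redirect so existing testthewebforward.org links keep working.
    return      301 https://stats.webplatform.org$request_uri;
}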

renoirb commented 9 years ago

Refactor progress:

renoirb commented 9 years ago

Proposed architecture

Each upstream service (i.e. the basic web server of a web application, as a common denominator) has a set of nodes and is assigned a port number. A web application can run from a VM (as it originally did), but this scheme now also supports another VM running a Docker container, or anything else that exposes an HTTP service.

Having a dedicated port ensures that we can separate which service runs on an internal IP address without relying on the Host: header or changing any HTTP header, and it also removes the need for a DNS server. By doing so, it avoids possible outages due to a reboot or outdated records, and it also speeds up serving the request by eliminating a DNS query.

A simple map stating which IP answers for the desired web application is then used to generate the configuration of the public frontend servers.

# Upstream pillar
upstream:
  notes:
    port: 8005
    nodes:
      - 10.10.10.157
# ...
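
A sketch of how the frontend's upstream block could be generated from that pillar through a Salt-managed Jinja template (the template itself is an assumption; salt['pillar.get'] is the standard pillar lookup). Note that it keys the block on the pillar name (upstream_notes), whereas the hand-written example below says upstream_hypothesis, the same naming inconsistency discussed later in this thread:

# upstreams.conf template (Jinja), rendered by Salt into NGINX config
{% for name, conf in salt['pillar.get']('upstream', {}).items() %}
upstream upstream_{{ name }} {
{%- for node in conf.nodes %}
    server    {{ node }}:{{ conf.port }};
{%- endfor %}
}
{% endfor %}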

A web application then has two virtual hosts: one for the internal network ("upstream") and one for the public ("frontend") server.

Frontend virtual host

# Generated automatically from the Upstream pillar
upstream upstream_hypothesis {
    server    10.10.10.157:8005;
    server    10.10.10.151:8005;
    server    10.10.10.17:8005;
}

server {
    listen      80;
    server_name notes.webplatform.org;
    include     common_params;
    return      301 https://notes.webplatform.org$request_uri;
}

server {
    listen      443 ssl spdy;
    server_name notes.webplatform.org;

    root    /var/www/html;
    include common_params;
    include ssl_params;

    # SSL configs...

    location / {
        proxy_pass http://upstream_hypothesis;
        include proxy_params;
        proxy_intercept_errors on;

        # WebSocket support plz
        proxy_http_version 1.1;
        proxy_set_header   Upgrade    $http_upgrade;
        proxy_set_header   Connection "upgrade";
    }
}
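
One possible refinement to the WebSocket lines above (not part of the original config): the commonly used map block, so the Connection header is only forced to "upgrade" when the client actually asked for one:

# In the http context, before the server blocks:
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

# Then, inside the location block:
#     proxy_set_header Connection $connection_upgrade;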

Upstream virtual host

server {
    listen  8005;

    root    /srv/webplatform/notes-server/notes_server/static;
    include common_params;

    rewrite ^/app/embed.js$ /annotator.js permanent;

    location = /annotator.js {
      rewrite    ^/annotator.js$  /embed.js last;
    }

    rewrite      ^/assets/notes-server/(.*)\? /$1;

    location / {
        proxy_pass   http://127.0.0.1:8001;
        include          proxy_params;

        # Since we are not using SSL internally, we have to force it here.
        # ref: https://docs.djangoproject.com/en/dev/ref/settings/#secure-proxy-ssl-header
        proxy_set_header X-Forwarded-Proto https;

        # These seem to help with Gunicorn "Broken pipe" socket errors.
        # The assumption is that buffering is the cause, because two NGINX servers
        # handle requests to Gunicorn: the internal one (this one) and the public frontend.
        # Quoting the Gunicorn docs: "If you want to be able to handle streaming request/responses or
        # other fancy features like Web sockets, you need to turn off the proxy buffering".
        # ref: http://gunicorn-docs.readthedocs.org/en/latest/deploy.html
        proxy_buffering off;

        # If we were to use a different backend hostname, we’d have to force it
        # like this.
        #proxy_set_header Host notes.webplatform.org;
    }
}

renoirb commented 9 years ago

Currently working on making names consistent. In the Salt states so far, the concepts have changed names many times between local, backend, and upstream, and likewise for how we refer to a site (e.g. notes.webplatform.org) versus the web application software running it (e.g. Hypothesis). Sometimes a virtual host runs more than one web application, which creates confusion in the naming. This has to be handled too.

WebPlatformDocs commented 9 years ago

Everything went well, ready to merge.

renoirb commented 9 years ago

Note to self.

Forget about using an internal name (i.e. referring to notes.webplatform.org as notes.backend.production.wpdn); let’s save on communication round trips and configure NGINX to know exactly which IP addresses can serve a web app. The NGINX upstream configuration will be generated as required by SaltStack.

renoirb commented 9 years ago

Sent in "Work status update" email on devrel list