ironsmile / nedomi

Highly performant HTTP reverse proxy with efficient caching of big files
GNU General Public License v3.0
81 stars 9 forks source link
caching http-server reverse-proxy

nedomi

Build Status Coverage Status GoDoc Go Report Card

HTTP media cache server. Most caching servers do not understand when a media file is being proxied. When storing media files you can gain performance and space when you consider the fact that most of the big meda files are not actually watched from end to end.

nedomi implements caching algorithms which take all this into consideration and deliver better cache performance and throughput.

Contents

Algorithms

nedomi is designed so that we can change the way it works. For every major part of its internals it uses interfaces. This will hopefully make it easier when swapping algorithms.

The most important one is the caching algorithm. At the moment nedomi has only one implemented and it is segmented LRU. It is inspired by Varnish's idea. The big thing that makes it even better for nedomi is that our objects always have exactly the same size. We do not keep whole files in the cache but evenly sized parts of the files. This effectively means that the implementation of the cache evictions and insertions is extremely simple. It will be as easy to deal with storage fragmentation if we ever need to.

We keep track of file chunks separately. This means chunks that are not actually watched are not stored in the cache. Our observations in the real world show that when consuming digital media people more often than not skip parts and jump from place to place. Storing unwatched gigabytes does not make sense. And this is the real benefit of our chunked storage. It stores only the popular parts of the files which leads to better cache performance.

Requirements

Nothing. It is pure Go. If you have some of latest versions of the language you are good to go.

Install

Nothing fancy. Just go get.

go get github.com/ironsmile/nedomi

At the moment this is the only way to install the software. In the future (when it gets more stable) we may start to distribute binary packages.

Configuration

Notice! This section is slightly out of date at this point. It will be cleaned up once the alpha-1 milestone is reached. While mostly true not all of the things written here are correct. You are advised to look at the config.example.json file where everything should be up to date.

We all know that a mark of a good software is its configurability. Not everyone has the same needs. So we have you covered. nedomi lets you choose all the details.

The configuration is stored in a single JSON file. It is basically an object with a key for every single section. In it you find the concepts of cache zone and virtual host.

nedomi supports many virtual hosts and many cache zones. Every virtual host stores its cache in a single cache zone. But there may be many virtual hosts which store their cache in one zone.

The main sections of the config look like this.

{
    "id": "nedomi01",
    "system": {/*...*/},
    "cache_zones": [/*...*/],
    "http": {/*...*/},
    "logger": {/*...*/}
}

Descriptions of all keys and their values types follows.

If you feel want to just skip to the full example - here is the place for your click. For everyone else a description of all configuration options follows.

HTTP Config

Here you can find all the HTTP-related configurations. The basic config looks like this:

{
    "listen": ":8282",
    "max_headers_size": 1231241212,
    "read_timeout": 12312310,
    "write_timeout": 213412314,
    "max_io_transfer_size": "1m",
    "min_io_transfer_size": "128k",
    "virtual_hosts": [/*...*/],
}

Description of all the keys and their meaning:

Cache Zones

Our Cache zones are very similar to the nginx' cache zones in that they represent bounded space on the storage for a cache. If files stored in this space exceeds its limitations the worst (caching-wise) files will be removed to get it back to the desired limits.

Example cache zone:

{
    "id": 2,
    "path": "/home/iron4o/playfield/nedomi/cache2",
    "storage_objects": 4723123,
    "part_size": "4m",
    "cache_algorithm": "lru",
    "skip_cache_key_in_path": true
}

Virtual Hosts

Virtual hosts are something familiar if you are coming form apache. In nginx they are called servers. Basically you can have different behaviours depending on the Host header sent to your server.

Example virtual host:

{
    "name": "proxied.example.com",
    "upstream": "http://example.com",
    "cache_zone": 2,
    "cache_key": "2.1"
}

System

All keys are:

{
    "pidfile": "/tmp/nedomi_pidfile.pid",
    "workdir": "/",
    "user": "www-data"
}

Logging

At the moment nedomi supports a single log file. All errors, notices (and even the access log) go in it. When the -D command line flag is used all output is send to the stdout.

{
    "log_file": "/var/log/nedomi.log",
    "debug": false
}

Full Config Example

See the config.example.json in the repo. It is here. It has all the possible keys and example values.

Status Page

nedomi comes with a HTTP status page. It is a pretty html with information about the internals and performance of the running server. After you've determined the host (let's say 127.0.0.2) you want to use for the status page, add the following record to the virtual_hosts section of the http configuration to enable the status page on that address:

{
    "name": "127.0.0.2",
    "handler": "status"
}

Benchmarks

Measuring performance with benchmarks is a hard job. We've tried to do it as best as possible. We used mainly wrk for our benchmarks. Included in the repo is one of our best scripts and few results form running it at various stages of the development.

This benchmark script tries to behave like a real users watching videos. It seeks from place to place, it likes some videos more than others. Also, it is more likely to watch the beginning of the video.

At the moment our measurements show that nedomi is comparable or slightly better than nginx in the tested work loads. We expect much more performance after code optimization which nedomi haven't had to this moment.

Limitations

Needless to say this is a young project. There are things which we know are required from any good cache server but we simple haven't time to put in.

Listening addresses - At the moment nedomi only listens to one address. All virtual hosts must use it.

Upstreams - Only one upstream is supported for a virtual host.

Cache Loading - For the moment nedomi will not load its cache from the disk after restart.

HTTPS is not supported.

Extending It

nedomi is modular. You can add or remove modules from it as much as you want. We've tried to abstarct away every possible thing into an interface. Extending normally is done via writing a package and adding it into its proper place in the source tree and then running go generate ./.... For specific instructions you should see the particular README for every type of modules which can be written. They are:

Credits

We've used videos from OpenFest lectures for our benchmarks.

There are 2 hard problems in computer science: caching, naming and off-by-1 errors - Tim Bray quoting Phil Karlton and extended by the Internet