
Gorge


Gorge is a service that harvests hydrological data (river discharge and water level) on a schedule. Harvested data is stored in a database and can be queried later.


Why should I use it?

This project is mainly intended for whitewater enthusiasts. Currently, several projects harvest and/or publish hydrological data for kayakers and other river folk. There's a certain level of duplication, because these projects harvest data from the same sources. So, if you have a project and want to add new data source(s) to it, you have 3 choices:

  1. Write parser/harvester yourself and harvest data yourself
  2. Reuse parser/harvester from another project, but harvest data yourself
  3. Cooperate with another project to reduce load on the original data source

So how can gorge/whitewater.guide help you? Currently, you can harvest data from whitewater.guide (which uses gorge internally to publish it). It's available via our GraphQL endpoint. Please respect the original data licenses. This is option 3.

If you prefer option 2, you can run the gorge server in a docker container and use our scripts to harvest data, so you don't have to write them yourself.

Gorge was designed with 2 more features in mind. They are not implemented yet, but they should not take long to implement if someone would like to use them:

Data sources

You can find the list of our data sources and their statuses here

Usage

Gorge is distributed as a ~130 MB docker image with two binary files:

Setting up database

Gorge database schemas for postgres and sqlite can be found here.

In postgres, the measurements table is partitioned. Make sure you have the pg_partman extension installed. Managing partitions is your responsibility. We run partman.run_maintenance_proc with pg_cron (because AWS RDS doesn't support partman_bgw yet). We also use the dump_partitions.py script from partman.
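For example, partition maintenance can be scheduled with pg_cron roughly like this (the job name and hourly schedule are arbitrary choices for this sketch; adjust them to your retention needs):

```sql
-- Run pg_partman maintenance every hour via pg_cron.
-- 'partman-maintenance' is an arbitrary job name used for this example.
SELECT cron.schedule(
  'partman-maintenance',
  '0 * * * *',
  $$CALL partman.run_maintenance_proc()$$
);
```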

Gorge is compatible with the TimescaleDB extension. To use it, run the following query while the measurements table is still empty:

```sql
SELECT create_hypertable('measurements', 'timestamp');
```

Launching

gorge-server accepts configuration via CLI arguments (run gorge-server --help). You can pass them via the docker-compose command field, like this:

```yaml
command:
  [
    "--pg-db",
    "gorge",
    "--debug",
    "--log-format",
    "plain",
    "--db-chunk-size",
    "1000",
  ]
```

Here is the list of available flags:

```
--cache string                   either 'inmemory' or 'redis' (default "redis")
--db string                      either 'inmemory' or 'postgres' (default "postgres")
--db-chunk-size int              measurements will be saved to db in chunks of this size. When set to 0, they will be saved in one chunk, which can cause errors
--debug                          enables debug mode, sets log level to debug
--endpoint string                endpoint path (default "/")
--hooks-health-cron string       cron expression for running health notifier (default "0 0 * * *")
--hooks-health-headers strings   headers to set on request, in 'Header: Value' format, similar to curl (default [])
--hooks-health-threshold int     hours required to pass since last successful execution to consider job unhealthy (default 48)
--hooks-health-url string        external endpoint to call with list of unhealthy jobs
--http-proxy string              HTTP client proxy (for example, you can use mitm for local development)
--http-timeout int               request timeout in seconds (default 60)
--http-user-agent string         user agent for requests sent from scripts. Leave empty to use fake browser agent (default "whitewater.guide robot")
--http-without-tls               disable TLS for some gauges
--log-format string              set this to 'json' to output log in json (default "json")
--log-level string               log level. Leave empty to discard logs (default "info")
--pg-db string                   postgres database (default "postgres")
--pg-host string                 postgres host (default "db")
--pg-password string             postgres password [env POSTGRES_PASSWORD]
--pg-user string                 postgres user (default "postgres")
--port string                    port (default "7080")
--redis-host string              redis host (default "redis")
--redis-port string              redis port (default "6379")
```

Gorge uses a database to store harvested measurements and scheduled jobs. It comes with postgres and sqlite drivers. Gorge will initialize all the required tables. Check out the SQL migration file if you're curious about the db schema.

Gorge uses a cache to store safe-to-lose data: the latest measurement from each gauge and harvest statuses. It comes with redis (recommended) and embedded in-memory drivers.

The gorge server is supposed to run in a private network. It doesn't support HTTPS. If you want to expose it to the public, use a reverse proxy.
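A minimal reverse-proxy sketch for nginx (the server name, upstream host, and certificate paths are placeholders, not part of gorge):

```nginx
server {
    listen 443 ssl;
    server_name gorge.example.com;                   # placeholder

    ssl_certificate     /etc/ssl/certs/gorge.pem;    # placeholder
    ssl_certificate_key /etc/ssl/private/gorge.key;  # placeholder

    location / {
        # gorge-server listens on port 7080 by default
        proxy_pass http://gorge:7080;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```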

Working with API

Below is the list of endpoints exposed by the gorge server. You can use the request.http files in the project root and script directories to play with a running server.

Available scripts

The list of available scripts is here

Health notifications

Gorge can call your webhooks when some of the running scripts haven't harvested any data for a period of time.

To configure the healthcheck, use the --hooks-health-xxx CLI arguments. For example:

```yaml
command:
  [
    '--hooks-health-cron',
    '0 0 * * *', # check health every midnight

    '--hooks-health-threshold',
    '48', # scripts that haven't harvested anything within last 48 hours are considered unhealthy

    '--hooks-health-url',
    'http://host.docker.internal:3333/gorge/health', # so POST request will be made to this endpoint

    '--hooks-health-headers',
    'x-api-key: __test_gorge_health_key__', # multiple headers can be set on this request
  ]
```

An example of this POST request's payload:

```json
[
    {
        "id": "2f915d20-ffe6-11e8-8919-9f370230d1ae",
        "script": "chile",
        "lastRun": "2021-12-13T07:57:59Z"
    },
    {
        "id": "e3c0c89a-7c72-11e9-8abd-cfc3ab2b843d",
        "script": "quebec",
        "lastRun": "2021-12-13T07:57:00Z",
        "lastSuccess": "2021-12-10T09:22:00Z"
    }
]
```

Other

There are TypeScript type definitions for the API available on NPM

Development

Inside container

The preferred way to develop is inside a docker container. I do this in VS Code. The repo already contains a .devcontainer configuration.

If you use docker-compose.yml, you need an .env.development file where you can put env variables with secrets for scripts. The app will work without those variables, but docker-compose requires the .env.development file to be present. If you use VS Code, .devcontainer takes care of this.

Some tests require postgres. You cannot run them inside the docker container (unless you want to mess with docker-in-docker). They're excluded from the main test set; I run them using make test-nodocker from the host machine or a CI environment.

The docker-compose stack comes with mitmproxy. You can monitor your development server's requests at http://localhost:6081 on the host machine.

On host machine

If you want to develop on the host machine, you'll need the following libraries installed (they're installed in the docker image; see the Dockerfile for more info):

You'll also need the following Go tools:

These tools are installed locally (see tools.go), but you should make sure their binaries are in your PATH.

Building and running

Take a look at the Makefile. Here are the highlights:

Writing scripts

Here are some recommendations for writing scripts for new sources.

TODO

License

MIT