waltzofpearls / reckon

Experimental prometheus exporter for time series forecasting and anomaly detection
MIT License
17 stars 3 forks source link
anomaly-detection forecasting go metrics prometheus-exporter time-series

reckon

Build Status MIT licensed

Reckon is a prometheus exporter for time series forecasting and anomaly detection. It scrapes prometheus metrics with a defined time range, train a predictive model with those metrics, and then expose the original and forecasted metrics back through a prometheus HTTP endpoint.

An exmaple of original and forecasted metrics generated from Prophet and Tangram gradient boosting model and then exposed from reckon:

# HELP sensehat_humidity_prophet Prophet forecasted metric value
# TYPE sensehat_humidity_prophet gauge
sensehat_humidity_prophet{column="yhat",instance="sensehat.rpi.topbass.studio:8000",job="sensehat_exporter"} 58.246349893997056
sensehat_humidity_prophet{column="yhat_lower",instance="sensehat.rpi.topbass.studio:8000",job="sensehat_exporter"} 57.4450973695544
sensehat_humidity_prophet{column="yhat_upper",instance="sensehat.rpi.topbass.studio:8000",job="sensehat_exporter"} 59.08650789641951
# HELP sensehat_temperature_prophet Prophet forecasted metric value
# TYPE sensehat_temperature_prophet gauge
sensehat_temperature_prophet{column="yhat",instance="sensehat.rpi.topbass.studio:8000",job="sensehat_exporter"} 22.196715588009724
sensehat_temperature_prophet{column="yhat_lower",instance="sensehat.rpi.topbass.studio:8000",job="sensehat_exporter"} 21.83335916260913
sensehat_temperature_prophet{column="yhat_upper",instance="sensehat.rpi.topbass.studio:8000",job="sensehat_exporter"} 22.551863453282152
# HELP sensehat_humidity_tangram Tangram forecasted metric value
# TYPE sensehat_humidity_tangram gauge
sensehat_humidity_tangram{column="yhat",instance="sensehat.rpi.topbass.studio:8000",job="sensehat_exporter"} 58.43156051635742
sensehat_humidity_tangram{column="yhat_lower",instance="sensehat.rpi.topbass.studio:8000",job="sensehat_exporter"} 54.96922722581767
sensehat_humidity_tangram{column="yhat_upper",instance="sensehat.rpi.topbass.studio:8000",job="sensehat_exporter"} 61.893893806897175
# HELP sensehat_temperature_tangram Tangram forecasted metric value
# TYPE sensehat_temperature_tangram gauge
sensehat_temperature_tangram{column="yhat",instance="sensehat.rpi.topbass.studio:8000",job="sensehat_exporter"} 21.94704246520996
sensehat_temperature_tangram{column="yhat_lower",instance="sensehat.rpi.topbass.studio:8000",job="sensehat_exporter"} 20.179807771933568
sensehat_temperature_tangram{column="yhat_upper",instance="sensehat.rpi.topbass.studio:8000",job="sensehat_exporter"} 23.714277158486354

Each metric configured in reckon will have its forecasted metric generated and exported in 3 dimensions (labels):

reckon+grafana

Try it

Gather the following info before start:

With Docker

This is the simplest method to get reckon running. You only need docker to get started.

PROM_CLIENT_URL={prometheus_server_address} \
WATCH_LIST={inline_yaml_or_comma_separated_list} \
make docker

With Go, Python and Virtualenv

If you prefer running reckon without docker, or you would like to build and run the binary locally, you will need Go, Python and Virtualenv. Make sure you have Go 1.16+ and Python 3.7

PROM_CLIENT_URL={prometheus_server_address} \
WATCH_LIST={inline_yaml_or_comma_separated_list} \
make run

Configure it

Reckon can be configured with the following environment variables:

Environment variable Required? Default value Description
SCHEDULE Yes @every 120m schedule for model training
TIMEZONE Yes America/Vancouver timezone for calculating schedule
WATCH_LIST Yes metrics to scrape, YAML (model names specified on each metric) or comma separated list (needs MODELS)
MODELS Yes Prophet ML models for training and forecasting, required if WATCH_LIST is a comma separated list
PROM_EXPORTER_ADDR Yes :9999 address for reckon to expose forecasted metrics
PROM_CLIENT_URL Yes reckon will scrape metrics from this URL
PROM_CLIENT_TLS_CA No CA cert if PROM_CLIENT_URL is https
PROM_CLIENT_TLS_CERT No TLS cert if PROM_CLIENT_URL is https
PROM_CLIENT_TLS_KEY No TLS key if PROM_CLIENT_URL is https
PROM_CLIENT_INSECURE_SKIP_VERIFY No skip TLS verification on PROM_CLIENT_URL
DEFAULT_CHUNK_SIZE Yes 120m duration of original data to scrape
ROLLING_WINDOW Yes 72h duration of original data to keep for training
GRPC_SERVER_ADDRESS Yes localhost:18443 internal gRPC model server address
GRPC_ROOT_CA Yes gRPC mTLS root CA
GRPC_SERVER_CERT Yes gRPC mTLS server cert
GRPC_SERVER_KEY Yes gRPC mTLS server cert key
GRPC_CLIENT_CERT Yes gRPC mTLS client cert
GRPC_CLIENT_KEY Yes gRPC mTLS client cert key
GPRC_SERVER_PROCESS_NUM Yes num of logical CPUs gRPC server - number of subprocesses
GRPC_SERVER_THREAD_NUM Yes 1 gRPC server - number of threads on each subprocess
GRPC_SERVER_GRACE_PERIOD_IN_SECS Yes 2 gRPC server - in shutdown grace period, server only sends SIGTERM to attempt shutting down subprocesses
GRPC_SERVER_KILL_PERIOD_IN_SECS Yes 5 gRPC server - shutdown kill period, server force kill any subprocesses that are still alive

WATCH_LIST accepts either inline YAML or comma separated list, for example:

Available models:

Use it

Monitor it

In addition to forecasted metrics, reckon also exposes runtime metrics to help monitor reckon itself.

Metric Type Description
reckon_prometheus_client_scrape_time_seconds Gauge timestamp of the last prometheus client scrape
reckon_exporter_scraped_time_seconds Gauge timestamp of the last time reckon exporter scraped by prometheus
reckon_model_train_time_seconds Gauge timestamp of the last reckon model training
reckon_forecast_data_received_time_seconds Gauge timestamp of the last time receiving forecast data
reckon_model_train_total Counter number of times calling model train
reckon_model_train_duration_seconds Gauge time taken in seconds from the last model training
reckon_model_train_errors_total Counter number of model training errors
reckon_prometheus_client_scrape_errors_total Counter prometheus client scraping errors
reckon_exporter_scraped_total Counter number of times reckon exporter scraped by prometheus
reckon_prometheus_client_scrape_total Counter number of prometheus client scrape
reckon_data_scraped_duration_minutes Gauge duration of data scraped from prometheus
reckon_forecast_data_duration_minutes Gauge duration of data being kept in memory
reckon_training_data_duration_minutes Gauge duration of data sent to model for training
reckon_data_scraped_values Gauge number of the last scraped data points
reckon_forecast_data_values Gauge number of the existing data points kept in memory
reckon_training_data_values Gauge number of data points recently sent to model for training

Reckon exposes build info through reckon_build_info

# HELP reckon_build_info Information about reckon build.
# TYPE reckon_build_info gauge
reckon_build_info{commit="c5159d0375d59b0255069eeec3fe3d3ccfe42207",date="2021-11-24T08:05:34Z",go_version="1.16.10",goreleaser_version="0.174.0",python_version="3.8.12",version="0.5.2"} 1