postgres-ai / postgres-checkup

Postgres Health Check and SQL Performance Analysis. THIS IS A MIRROR OF https://gitlab.com/postgres-ai/postgres-checkup
GNU Affero General Public License v3.0

Please support the project by giving it a GitLab star (the button is on the project's main page, in the upper right corner).

Demo: an example of postgres-checkup report (based on CI, multi node).

Disclaimer: Conclusions, Recommendations – work in progress. To treat the data correctly, you need deep Postgres knowledge. Each report consists of 3 sections: Observations, Conclusions, and Recommendations. Observations are filled automatically. As for Conclusions and Recommendations sections, not all reports are auto-generated.

About

Postgres Checkup (postgres-checkup) is a new kind of diagnostics tool for deep analysis of Postgres database health. It detects current and potential issues with database performance, scalability, and security, and produces recommendations on how to resolve or prevent them.

A monitoring system shows only current, urgent problems. Postgres-checkup also reveals deeper problems that are creeping up and may hit you in the future. It helps to solve many known database administration problems and common pitfalls, aiming to detect issues at a very early stage and to suggest the best ways to prevent them. We recommend running it on a regular basis (weekly, monthly, and quarterly), and also right before and after applying any major change to a database server, whether it is a change to the schema, a configuration parameter, or cluster settings.

Why do you need postgres-checkup and why it's safe and easy to use:

Reports Structure

Postgres-checkup produces two kinds of reports for every check: JSON (.json) and Markdown (.md).

Markdown reports can be converted to different formats such as HTML or PDF.

Each report consists of three sections:

  1. "Observations": automatically collected data. This is to be consumed by an expert DBA.
  2. "Conclusions": what we conclude from the Observations, stated in plain English in the form that is convenient for engineers who are not DBA experts.
  3. "Recommendations": action items, what to do to fix the discovered issues.

Both "Conclusions" and "Recommendations" are to be consumed by engineers who will decide what, how, and when to optimize.

Installation and Usage

Requirements

For the operator machine (from where the tool will be executed), the following OS are supported:

There are known cases when postgres-checkup was successfully used on Windows, although with some limitations.

The following programs must be installed on the operator machine:

pandoc and wkhtmltopdf are optional; they are needed for generating HTML and PDF versions of the report (options --html, --pdf).

Nothing special has to be installed on the observed machines. However, they must run Linux (again: modern RHEL/CentOS or Debian/Ubuntu; others should work as well, but are not yet tested).

:warning: Only Postgres versions 9.6 and higher are currently officially supported.
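
A quick pre-flight check can confirm that a server meets this version requirement before running the tool. The sketch below is not part of postgres-checkup, and the function name is_supported_pg_version is hypothetical. It relies on the integer that Postgres reports in its server_version_num setting (for example, 90600 for 9.6.0).

```shell
# Hypothetical helper, not part of postgres-checkup.
# Takes the integer from "SHOW server_version_num" (e.g. 90624 for 9.6.24,
# 140005 for 14.5) and succeeds only if it is 9.6 or newer.
is_supported_pg_version() {
  local version_num="$1"
  # Reject empty or non-numeric input with a distinct status.
  case "$version_num" in
    ''|*[!0-9]*) return 2 ;;
  esac
  [ "$version_num" -ge 90600 ]
}

# Example usage against a live server (assumes psql can connect):
#   ver="$(psql -h db1.vpn.local -U postgres -d postgres -Atc 'SHOW server_version_num')"
#   is_supported_pg_version "$ver" || echo "Postgres is older than 9.6" >&2
```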

How to Install

1. Install required programs

Ubuntu/Debian:

sudo apt-get update -y
sudo apt-get install -y git postgresql coreutils jq golang

# Optional (to generate PDF/HTML reports)
sudo apt-get install -y pandoc
wget https://github.com/wkhtmltopdf/wkhtmltopdf/releases/download/0.12.4/wkhtmltox-0.12.4_linux-generic-amd64.tar.xz
tar xvf wkhtmltox-0.12.4_linux-generic-amd64.tar.xz
sudo mv wkhtmltox/bin/wkhtmlto* /usr/local/bin
sudo apt-get install -y openssl libssl-dev libxrender-dev libx11-dev libxext-dev libfontconfig1-dev libfreetype6-dev fontconfig

CentOS/RHEL:

sudo yum install -y git postgresql coreutils jq golang

# Optional (to generate PDF/HTML reports)
sudo yum install -y pandoc
wget https://github.com/wkhtmltopdf/wkhtmltopdf/releases/download/0.12.4/wkhtmltox-0.12.4_linux-generic-amd64.tar.xz
tar xvf wkhtmltox-0.12.4_linux-generic-amd64.tar.xz
sudo mv wkhtmltox/bin/wkhtmlto* /usr/local/bin
sudo yum install -y libpng libjpeg openssl icu libX11 libXext libXrender xorg-x11-fonts-Type1 xorg-x11-fonts-75dpi

MacOS (assuming that Homebrew is installed):

brew install postgresql coreutils jq golang git

# Optional (to generate PDF/HTML reports)
brew install pandoc Caskroom/cask/wkhtmltopdf

2. Clone this repo

git clone https://gitlab.com/postgres-ai/postgres-checkup.git
# Use --branch to use specific release version. For example, to use version 1.1:
#   git clone --branch 1.1 https://gitlab.com/postgres-ai/postgres-checkup.git
cd postgres-checkup

3. Build pghrep

cd ./pghrep
make install main
cd ..

Example of Use

Let's make a report for a project named prod1. Assume that we have two servers, db1.vpn.local and db2.vpn.local.

Postgres-checkup automatically detects which one is the master:

./checkup -h db1.vpn.local -p 5432 --username postgres --dbname postgres --project prod1 -e 1
./checkup -h db2.vpn.local -p 5432 --username postgres --dbname postgres --project prod1 -e 1

This means: connect to the server with the given credentials and save the data into the prod1 project directory as epoch 1. The epoch is an integer that identifies the current iteration of checks. For example, in half a year we can switch to epoch 2.
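
Because the epoch is just an integer, it can also be derived from the calendar date plus a run counter, which is the convention the Docker examples later in this document use ("$(date +'%Y%m%d')001"). A minimal sketch of that convention follows; the helper name make_epoch is hypothetical and not part of the tool.

```shell
# Hypothetical helper: builds an integer epoch from a date (YYYYMMDD)
# and a run counter that is zero-padded to three digits.
# Pass the counter without leading zeros (e.g. 8, not 08).
make_epoch() {
  local day="${1:-$(date +'%Y%m%d')}"
  local counter="${2:-1}"
  printf '%s%03d\n' "$day" "$counter"
}

# Example usage:
#   ./checkup -h db1.vpn.local -p 5432 --username postgres --dbname postgres \
#     --project prod1 -e "$(make_epoch)"
```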

-h db2.vpn.local means: try to connect to the host via SSH and then use the remote psql command to perform checks. If SSH is not available, the local psql will be used to establish the Postgres connection (non-psql reports will be skipped). To avoid this "guessing", use --ssh-hostname or --pg-hostname.

Also, you can define a specific way to connect: SSH or psql:

--ssh-hostname db2.vpn.local - SSH will be used for the connection. SSH port can be defined as well with option --ssh-port.

--pg-hostname db2.vpn.local - psql will be used for the connection. The port where PostgreSQL accepts connections can be defined with the option --pg-port.

If --pg-port or --ssh-port is not defined but --port is, the value of --port is used in its place, depending on the current connection type.
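
The port fallback rule can be sketched as a small shell function. This is an illustration of the behavior described in the text, not the tool's actual code; the function name resolve_port and its argument order are assumptions made for the example.

```shell
# Sketch of the port fallback rule; not the tool's actual implementation.
# Arguments: connection type ("ssh" or "pg"), then the values of
# --ssh-port, --pg-port and --port (pass "" for an option that is not set).
resolve_port() {
  local conn_type="$1" ssh_port="$2" pg_port="$3" generic_port="$4" specific
  case "$conn_type" in
    ssh) specific="$ssh_port" ;;
    pg)  specific="$pg_port" ;;
    *)   echo "unknown connection type: $conn_type" >&2; return 1 ;;
  esac
  # Prefer the type-specific port; otherwise fall back to --port.
  echo "${specific:-$generic_port}"
}
```

For example, resolve_port ssh "" "" 2222 prints 2222 because --ssh-port is not set, while resolve_port pg "" 5433 5432 prints 5433 because the specific option takes precedence.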

For comprehensive analysis, it is recommended to run the tool on the master and all its replicas – postgres-checkup is able to combine all the information from multiple nodes to a single report.

Some reports (such as K003) require two snapshots, to calculate "deltas" of metrics. So, for better results, use the following example, executing it during peak working hours, with $DISTANCE values from 10 min to a few hours:

DISTANCE="1800" # 30 minutes

# Assuming that db2 is the master, db3 and db4 are its replicas
for host in db2.vpn.local db3.vpn.local db4.vpn.local; do
  ./checkup \
    -h "$host" \
    -p 5432 \
    --username postgres \
    --dbname postgres \
    --project prod1 \
    -e 1 \
    --file resources/checks/K000_query_analysis.sh # the first snapshot is needed only for reports K***
done

sleep "$DISTANCE"

for host in db2.vpn.local db3.vpn.local db4.vpn.local; do
  ./checkup \
    -h "$host" \
    -p 5432 \
    --username postgres \
    --dbname postgres \
    --project prod1 \
    -e 1
done

As a result of execution, two directories containing .json and .md files will be created:

./artifacts/prod1/json_reports/1_2018_12_06T14_12_36_+0300/
./artifacts/prod1/md_reports/1_2018_12_06T14_12_36_+0300/

Each generated file contains information about "what we check" and the collected data for all instances of the Postgres cluster prod1.

A human-readable report can be found at:

./artifacts/prod1/md_reports/1_2018_12_06T14_12_36_+0300/Full_report.md

Open it with your favorite Markdown viewer or upload it to a service such as gist.github.com.
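
Since every run creates a new timestamped directory, it can be handy to locate the most recent full report automatically. The helper below is a hypothetical sketch, not part of the tool. It assumes the directory layout shown above and that directory names sort lexicographically in chronological order, which holds within one epoch and one time zone.

```shell
# Hypothetical helper: prints the path to Full_report.md in the newest
# md_reports directory of a project. Relies on shell globs expanding in
# sorted order, so the last match is taken as the most recent run.
latest_full_report() {
  local artifacts_dir="$1" project="$2" dir latest=""
  for dir in "$artifacts_dir/$project/md_reports"/*/; do
    if [ -d "$dir" ]; then
      latest="${dir%/}"
    fi
  done
  # Fail if the project has no report directories yet.
  [ -n "$latest" ] || return 1
  printf '%s\n' "$latest/Full_report.md"
}

# Example usage:
#   less "$(latest_full_report ./artifacts prod1)"
```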

You can collect and process data separately by specifying the working mode in the CLI option --mode %mode% or by using it as a "command" (checkup %mode%).

Available working modes:

collect - collect data.

process - generate MD (and, optionally, HTML, PDF) reports with conclusions and recommendations.

upload - upload generated reports to the Postgres.ai platform.

run - collect and process data at once. This is the default mode, used when no other mode is specified. Note that upload is not included.
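
To illustrate the "command" form, the following dry-run sketch validates a mode name against the four modes listed here and prints the command line that would be run, without executing anything. The wrapper name checkup_cmd is hypothetical, and the options passed to each mode in the usage lines are assumptions; check the tool's own help output for what each mode actually requires.

```shell
# Hypothetical dry-run wrapper: accepts only the documented mode names
# and prints the resulting command line instead of running it.
checkup_cmd() {
  local mode="$1"
  shift
  case "$mode" in
    collect|process|upload|run) ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  echo "./checkup $mode $*"
}

# Collect now, generate reports later (options shown are assumptions):
checkup_cmd collect -h db1.vpn.local --project prod1 -e 1
checkup_cmd process --project prod1 -e 1
```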

Docker 🐳

postgres-checkup can also be used from a Docker container. The container runs, executes all checks, and stops. The results can be found in the artifacts folder in the current directory (pwd).

Usage with docker run

There is an option to run postgres-checkup in a Docker container:

docker run --rm \
  --name postgres-checkup \
  --env PGPASSWORD="postgres" \
  --volume "$(pwd)/artifacts:/artifacts" \
  postgresai/postgres-checkup:latest \
    ./checkup \
      --hostname hostname \
      --port 5432 \
      --username postgres \
      --dbname postgres \
      --project c \
      --epoch "$(date +'%Y%m%d')001"

In this case some checks (those requiring SSH connection) will be skipped.

If you want to have all supported checks, you have to use SSH access to the target machine with Postgres database.

If SSH connection to the Postgres server is available, it is possible to pass SSH keys to the docker container, so postgres-checkup will switch to working via remote SSH calls, generating all reports (this approach is known to have issues on Windows, but should work well on Linux and MacOS machines):

docker run --rm \
  --name postgres-checkup \
  --volume "$(pwd)/artifacts:/artifacts" \
  --volume "$HOME/.ssh/id_rsa:/root/.ssh/id_rsa:ro" \
  postgresai/postgres-checkup:latest \
  ./checkup \
    --hostname sshusername@hostname \
    --username my_postgres_user \
    --dbname my_postgres_database \
    --project docker_test_with_ssh \
    --epoch "$(date +'%Y%m%d')001"

If you want to check a local Postgres instance on your host from a container, you cannot use localhost in the -h parameter. You have to use the bridge between the host OS and Docker Engine. By default, the host IP is 172.17.0.1 in the docker0 network, but it may vary depending on the configuration.
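
On a Linux host, the bridge address can be looked up rather than hard-coded. The sketch below assumes the iproute2 ip command is available and that the bridge interface is named docker0; the helper name extract_ipv4 is hypothetical. It parses the one-line output format of ip -4 -o addr show.

```shell
# Hypothetical helper: reads the output of `ip -4 -o addr show <iface>`
# on stdin and prints the first IPv4 address without the prefix length.
extract_ipv4() {
  awk '$3 == "inet" { split($4, parts, "/"); print parts[1]; exit }'
}

# Example usage on a Linux host running Docker:
#   host_ip="$(ip -4 -o addr show docker0 | extract_ipv4)"
#   echo "Use --hostname $host_ip instead of localhost"
```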

If you use SSH connection and sudo on the remote server requires a password, you can provide this password using the SSHSUDOPASSWORD environment variable.

Credits

Some reports are based on or inspired by useful queries created and improved by various developers, including but not limited to:

Docker support implemented by Ivan Muratov.

The Full List of Reports

A. General / Infrastructural

B. Backups and DR

C. Replication and HA

D. Monitoring / Troubleshooting

E. WAL, Checkpoints

F. Autovacuum, Bloat

G. Performance / Connections / Memory-related Settings

H. Index Analysis

J. Capacity Planning

K. SQL Query Analysis

L. DB Schema Analysis