The Workload Analyzer collects and analyzes Presto® and Trino workload statistics. The analysis provides improved visibility into your analytical workloads and enables query optimization to enhance cluster performance.
The Presto® Workload Analyzer collects and stores QueryInfo JSONs for queries executed while it is running, as well as any historical queries still held in the Presto® Coordinator's memory.
The collection process has negligible compute cost and does not impact cluster query execution in any way. Ensure that sufficient disk space is available in your working directory; a compressed JSON file is typically 50 KB - 200 KB.
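For example, at roughly 200 KB per query, collecting 100,000 queries would require on the order of 20 GB of free disk space (an illustrative estimate, not a measured figure).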
The Workload Analyzer supports the following versions:
Although the Workload Analyzer may run with newer versions of Presto®, these scenarios have not been tested.
For installation, see here.
First, go to the analyzer/ directory, where the Workload Analyzer Python code can be found.
cd analyzer/
To collect statistics from your cluster, run the following script for a period long enough to provide a representative sample of your workload.
./collect.py -c http://<presto-coordinator>:8080 --username-request-header "X-Trino-User" -o ./JSONs/ --loop
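For example, a minimal sketch of a bounded collection window, assuming GNU coreutils timeout is available and using an illustrative coordinator hostname, stops collection automatically after 24 hours:
# Illustrative: collect continuously, then stop after 24 hours (timeout sends SIGTERM).
timeout 24h ./collect.py -c http://presto-coordinator.example.com:8080 --username-request-header "X-Trino-User" -o ./JSONs/ --loop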
Notes:
To analyze the downloaded JSONs directory (e.g. ./JSONs/) and generate a zipped HTML report, execute the following command:
./extract.py -i ./JSONs/ && ./analyze.py -i ./JSONs/summary.jsonl.gz -o ./output.zip
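Once the command completes, the zipped report can be unpacked and the extracted HTML file opened in any browser; the target directory below is illustrative:
unzip ./output.zip -d ./report/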
To collect statistics from your cluster, run the following script for a period long enough to provide a representative sample of your workload.
$ mkdir JSONs/
$ docker run -v $PWD/JSONs/:/app/JSONs analyzer ./analyzer/collect.py -c http://$PRESTO_COORDINATOR:8080 --username-request-header "X-Trino-User" -o JSONs/ --loop
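For longer collection windows, one possible approach (a sketch, not part of the documented workflow) is to run the collector container detached so it keeps collecting in the background; the container name is illustrative:
$ docker run -d --name analyzer-collect -v $PWD/JSONs/:/app/JSONs analyzer ./analyzer/collect.py -c http://$PRESTO_COORDINATOR:8080 --username-request-header "X-Trino-User" -o JSONs/ --loop
$ docker stop analyzer-collect   # stop collection once a representative sample has been gathered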
To analyze the downloaded JSONs directory (e.g. ./JSONs/) and generate a zipped HTML report, execute the following commands:
$ docker run -v $PWD/JSONs/:/app/JSONs analyzer ./analyzer/extract.py -i JSONs/
$ docker run -v $PWD/JSONs/:/app/JSONs analyzer ./analyzer/analyze.py -i JSONs/summary.jsonl.gz -o JSONs/output.zip
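Because the JSONs/ directory is bind-mounted into the container, the generated report is written to ./JSONs/output.zip on the host, where it can be unpacked and viewed; the target directory below is illustrative:
$ unzip JSONs/output.zip -d report/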
Notes:
See the following screencasts for usage examples:
To meet these requirements, the ./jsonl_process.py script may be executed after the ./extract.py script, but before the ./analyze.py script.
In the example below, only queries from the transactions schema are kept, and the SQL query text is removed from the new summary file:
./jsonl_process.py -i ./JSONs/summary.jsonl.gz -o ./processed_summary.jsonl.gz --filter-schema transactions --remove-query
In the following example, all the schema names are obfuscated:
./jsonl_process.py -i ./JSONs/summary.jsonl.gz -o ./processed_summary.jsonl.gz --rename-schemas
In the following example, all the partition and user names are obfuscated:
./jsonl_process.py -i ./JSONs/summary.jsonl.gz -o ./processed_summary.jsonl.gz --rename-partitions --rename-user
After the ./jsonl_process.py script has been executed, to generate a report based on the new summary file, run:
./analyze.py -i ./processed_summary.jsonl.gz -o ./output.zip
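Putting the steps together, the following sketch extracts the summary, keeps only the transactions schema while obfuscating user names, and generates the report; the particular flag combination is illustrative:
./extract.py -i ./JSONs/ && ./jsonl_process.py -i ./JSONs/summary.jsonl.gz -o ./processed_summary.jsonl.gz --filter-schema transactions --rename-user && ./analyze.py -i ./processed_summary.jsonl.gz -o ./output.zip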
The report can also be generated in high-contrast mode by adding the --high-contrast-mode parameter, for example:
./analyze.py --high-contrast-mode -i ./JSONs/summary.jsonl.gz -o ./output.zip
Presto® is a trademark of The Linux Foundation.