yplusone / ParamTree

The core code for ParamTree, introduced in the paper "Rethinking Learned Cost Models: Why Start from Scratch?"

Rethinking Learned Cost Models: Why Start from Scratch?

Python implementation of "Rethinking Learned Cost Models: Why Start from Scratch?"

Requirements

Main Modules

| Module | Description |
| --- | --- |
| model | How ParamTree splits nodes and how it predicts the execution time of physical plans. |
| feature | How c-params are extracted from queries and databases. Also includes some data used in the rules for calculating cost. |
| experiments | Experiments for passive and active learning. |
| recommendation | Code for Section 4.2, which recommends the c-param for the next split candidate. |
| query_gen | Code for Section 4.3, which generates queries from the workload queries. |

Run

Our method can be trained in two ways: passive learning and active learning. Both require the setup described under Information.

Information

Before running, fill in the database connection details in ./feature/info.py. Because the operating system's cache has to be cleared during runtime, a Linux account with root privileges must also be provided.

Example:

db_info = {
    'server': "127.0.0.1",
    'pg': {
        'username': "postgres",
        'password': "postgres",
        'port': 5434,
        'command_ctrl': "docker exec -it --user postgres database /home/usr/pgsql13.1/bin/pg_ctl -D /home/usr/pgsql13.1_data",
    },
    'ssh': {
        'username': "root",
        'password': "root",
        'port': 22,
    },
}

Create the corresponding database in PostgreSQL and import the data before running. This is required because some information, such as innerbucketsize for Hash Join, still has to be retrieved from the database.

Passive

Train model

python3 main.py --mode NORMAL_TRAIN --train_data ./data/experiment/imdb_synthetic.txt --test_data ./data/experiment/imdb_job-light.txt --db imdb --save_model_name imdb_synthetic_temp --load_model_name imdb_synthetic_temp --leaf_num 20

Test model

python3 main.py --mode TEST --load_model --workload ./data/experiment/imdb_job-light.txt --db imdb --load_model_name imdb_synthetic_temp

python3 main.py --mode TEST --load_model --workload ./data/experiment/imdb_scale.txt --db imdb --load_model_name imdb_synthetic_temp

Active

Train model

python3 main.py --mode AL_TRAIN --workload ./data/experiment/tpcds_test.txt --db tpcds --save_model_name tpcds_actively --qerror_threshold 1.1 --sample_num_per_expansion 80
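The `--qerror_threshold` flag is not explained in this README. Assuming it refers to the q-error metric commonly used to evaluate cost models (the larger of predicted/actual and actual/predicted), a threshold of 1.1 would mean a prediction counts as accurate when it is within a factor of 1.1 of the measured execution time. The sketch below shows that metric only; the function names are hypothetical, and how main.py actually applies the threshold is not confirmed here.

```python
from typing import Iterable, Tuple


def q_error(predicted: float, actual: float) -> float:
    """Return max(predicted/actual, actual/predicted); 1.0 is a perfect prediction."""
    if predicted <= 0 or actual <= 0:
        raise ValueError("q-error is only defined for positive values")
    return max(predicted / actual, actual / predicted)


def share_above_threshold(pairs: Iterable[Tuple[float, float]], threshold: float) -> float:
    """Return the fraction of (predicted, actual) pairs whose q-error exceeds threshold."""
    errors = [q_error(p, a) for p, a in pairs]
    if not errors:
        return 0.0
    return sum(e > threshold for e in errors) / len(errors)


# Made-up numbers, chosen only to exercise the functions.
samples = [(100.0, 100.0), (50.0, 100.0), (105.0, 100.0)]
share = share_above_threshold(samples, threshold=1.1)
```

The metric is symmetric, so underestimating by a factor of two and overestimating by a factor of two both give a q-error of 2.0.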

Test model

python3 main.py --mode TEST --load_model --workload ./data/experiment/tpcds_test.txt --db tpcds --load_model_name tpcds_actively_last