python implementation of "Rethinking Learned Cost Models: Why Start from Scratch?"
Module | Description |
---|---|
model | Includes how ParamTree to split nodes and how to use ParamTree to predict the execution time of physical plans. |
feature | Includes how to extract c-params from queries and databases. Also includes some data used in the rule for calculating cost. |
experiments | The code includes experiments for conducting passive and active learning. |
recommendation | Code for Section4.2, which recommand c-param for the next split candidate |
query_gen | Code for Section4.3, which generate queries from workload queries |
Our method can be trained in two ways:
Before running, please fill in the relevant information of the database for connection in \feature\info.py. As clearing the cache of the operating system is required during runtime, a Linux account with root privileges also needs to be provided.
Example:
db_info = {'server':"127.0.0.1",
'pg':{
'username':"postgres",
'password':"postgres",
'port':5434,
'command_ctrl':"docker exec -it --user postgres database /home/usr/pgsql13.1/bin/pg_ctl -D /home/usr/pgsql13.1_data"},
'ssh' : {
'username':"root",
'password':"root",
'port':22
}
}
It is necessary to create the corresponding database on PostgreSQL and import the data. Some information still needs to be retrieved from the database, such as innerbucketsize in Hash Join.
python3 main.py --mode NORMAL_TRAIN --train_data ./data/experiment/imdb_synthetic.txt --test_data ./data/experiment/imdb_job-light.txt --db imdb --save_model_name imdb_synthetic_temp --load_model_name imdb_synthetic_temp --leaf_num 20
python3 main.py --mode TEST --load_model --workload ./data/experiment/imdb_job-light.txt --db imdb --load_model_name imdb_synthetic_temp
python3 main.py --mode TEST --load_model --workload ./data/experiment/imdb_scale.txt --db imdb --load_model_name imdb_synthetic_temp
python3 main.py --mode AL_TRAIN --workload ./data/experiment/tpcds_test.txt --db tpcds --save_model_name tpcds_actively --qerror_threshold 1.1 --sample_num_per_expansion 80
python3 main.py --mode TEST --load_model --workload ./data/experiment/tpcds_test.txt --db tpcds --load_model_name tpcds_actively_last