zhouqingqing / qpmodel

A Relational Optimizer and Executor
MIT License

Accelerate one-time queries on big data sets #299

Open zhouqingqing opened 3 years ago

zhouqingqing commented 3 years ago

We need foreign table scans to speed up one-off debug queries against big data sets. Currently we have to load the whole data set into memory, collect stats, and only then run the query. This is slow when the data set is big, although the up-front cost pays off when a batch of queries runs against the same data.

To solve this problem, we need the following:

  1. DDL to persist and read back stats: the basic functionality already exists (see statis.cs).
  2. Support for foreign tables, with syntax like this:
    CREATE FOREIGN TABLE A(i int)
        OPTIONS ( filename 'data/data1.csv', format 'csv' );

    Note that PhysicScanFile can already read from CSV.
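As a rough illustration of what the parser has to extract from the DDL in item 2, here is a minimal Python sketch (qpmodel itself is written in C#). `ForeignTableDef` and `parse_foreign_table` are hypothetical names, not qpmodel's API, and the sketch assumes simple column types without parentheses (e.g. `int`, not `decimal(10,2)`).

```python
import re
from dataclasses import dataclass, field


@dataclass
class ForeignTableDef:
    """Hypothetical catalog entry for a foreign table (not a real qpmodel class)."""
    name: str
    columns: list  # list of (column_name, type_name) pairs
    options: dict = field(default_factory=dict)  # e.g. filename, format


# CREATE FOREIGN TABLE <name>(<cols>) OPTIONS (<key> '<value>', ...) [;]
_DDL = re.compile(
    r"CREATE\s+FOREIGN\s+TABLE\s+(\w+)\s*\((.*?)\)\s*OPTIONS\s*\((.*?)\)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)
# One option: a bare key followed by a single-quoted value.
_OPTION = re.compile(r"(\w+)\s+'([^']*)'")


def parse_foreign_table(sql: str) -> ForeignTableDef:
    """Split the statement into table name, column list and OPTIONS map."""
    m = _DDL.match(sql.strip())
    if m is None:
        raise ValueError("not a CREATE FOREIGN TABLE statement")
    name, cols_text, opts_text = m.groups()
    columns = []
    for col in cols_text.split(","):
        # Assumes "<name> <type>" with a single-token type (hypothetical limitation).
        col_name, col_type = col.split()
        columns.append((col_name, col_type.lower()))
    options = {key.lower(): value for key, value in _OPTION.findall(opts_text)}
    return ForeignTableDef(name, columns, options)
```

For the statement above, this yields table `A` with column `("i", "int")` and options `filename = data/data1.csv`, `format = csv`. The `filename` option is what a CSV scan operator would then be pointed at.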

With the above, we can:

  1. Load the data set once, collect stats, and persist them.
  2. Whenever a query uses a foreign table, load the persisted stats and read the CSV directly.
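The two-phase workflow can be sketched in Python as follows. This is an illustration only, not qpmodel's C# code. The function names, the JSON stats file, and the choice of stats (row count and per-column distinct count) are all assumptions; statis.cs may store different statistics in a different format. The sketch also assumes a header-less CSV and keeps values as strings.

```python
import csv
import json
from collections import defaultdict


def foreign_scan(csv_path, columns):
    """Stream rows from the CSV one at a time instead of loading the file into memory."""
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):  # assumes no header row
            yield dict(zip(columns, row))


def collect_and_persist_stats(csv_path, columns, stats_path):
    """Phase 1 (run once): scan the data, compute simple stats, save them to disk."""
    row_count = 0
    distinct = defaultdict(set)  # memory grows with distinct values, acceptable for a one-time pass
    for row in foreign_scan(csv_path, columns):
        row_count += 1
        for name in columns:
            distinct[name].add(row[name])
    stats = {
        "row_count": row_count,
        "n_distinct": {name: len(distinct[name]) for name in columns},
    }
    with open(stats_path, "w") as f:
        json.dump(stats, f)  # hypothetical on-disk format
    return stats


def load_stats(stats_path):
    """Phase 2 (every later query): read persisted stats back instead of recomputing them."""
    with open(stats_path) as f:
        return json.load(f)
```

In a later session, the optimizer would call `load_stats` to cost the plan and execute it over `foreign_scan`, so neither the full data load nor the stats collection is repeated.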