sundy-li / databend

FuseQuery is a Distributed SQL Query Engine at scale

bug: 'create table t as select ....' #87

Open sundy-li opened 2 months ago

sundy-li commented 2 months ago

Version

936ffc9

What's Wrong?

The pipeline is not distributed, and running EXPLAIN on the statement actually executes it: after "explain pipeline create table t4 as select ...", the table t4 exists.

How to Reproduce?

mysql> explain pipeline  create table t4 as select * from target_table;
+--------------------------------------------------------------------------------------------+
| explain                                                                                    |
+--------------------------------------------------------------------------------------------+
| CommitSink × 1 processor                                                                   |
|   MutationAggregator × 1 processor                                                         |
|     TransformSerializeSegment × 1 processor                                                |
|       Merge (TransformSerializeBlock × 10 processors) to (TransformSerializeSegment × 1)   |
|         TransformSerializeBlock × 10 processors                                            |
|           BlockCompactTransform × 10 processors                                            |
|             CompoundBlockOperator(Project) × 10 processors                                 |
|               DeserializeDataTransform × 10 processors                                     |
|                 SyncReadParquetDataSource × 10 processors                                  |
+--------------------------------------------------------------------------------------------+
9 rows in set (0.10 sec)
Read 0 rows, 0.00 B in 0.047 sec., 0 rows/sec., 0.00 B/sec.

mysql> show tables;
+-------------------+
| Tables_in_default |
+-------------------+
| source_table      |
| t3                |
| t4                |
| target_table      |
| tmp               |
+-------------------+
5 rows in set (0.05 sec)
Read 5 rows, 1.00 KiB in 0.035 sec., 142.36 rows/sec., 28.56 KiB/sec.
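
The second symptom above (EXPLAIN creating t4) can be illustrated with a toy model of the expected behavior. This is only a sketch under stated assumptions: Catalog, Statement, describe, and execute are hypothetical stand-ins invented for illustration, not Databend's actual types or functions.

```rust
use std::collections::BTreeSet;

/// Hypothetical in-memory catalog standing in for the real metadata store.
#[derive(Default)]
struct Catalog {
    tables: BTreeSet<String>,
}

/// Hypothetical statement type covering only what this report needs.
enum Statement {
    CreateTableAsSelect { table: String, source: String },
    Explain(Box<Statement>),
}

/// Describe a statement without touching the catalog.
fn describe(stmt: &Statement) -> String {
    match stmt {
        Statement::CreateTableAsSelect { table, source } => {
            format!("CreateTableAsSelect {} <- {}", table, source)
        }
        Statement::Explain(inner) => format!("Explain({})", describe(inner)),
    }
}

/// Execute a statement. Only the non-EXPLAIN branch mutates the catalog.
fn execute(catalog: &mut Catalog, stmt: &Statement) -> String {
    match stmt {
        // EXPLAIN only describes the wrapped statement; it never runs it.
        Statement::Explain(inner) => describe(inner),
        Statement::CreateTableAsSelect { table, .. } => {
            catalog.tables.insert(table.clone());
            format!("created {}", table)
        }
    }
}

fn main() {
    let mut catalog = Catalog::default();
    let ctas = Statement::CreateTableAsSelect {
        table: "t4".to_string(),
        source: "target_table".to_string(),
    };
    let explained = execute(&mut catalog, &Statement::Explain(Box::new(ctas)));
    println!("{}", explained);
    println!("tables after explain: {}", catalog.tables.len());
}
```

In this model the catalog is still empty after the EXPLAIN call, which is the behavior the report expects from the real engine.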

dosubot[bot] commented 2 months ago

The pipeline is not distributed when executing create table t4 as select * from target_table in version 936ffc9 because the create_table_as_select method in CreateTableInterpreter contains no logic to distribute it. The method builds the pipeline through InsertInterpreter, and nothing in that path spreads the work across multiple nodes or workers.

Here is the relevant code snippet from create_table_as_select:

let mut pipeline = InsertInterpreter::try_create(self.ctx.clone(), insert_plan)?
    .execute2()
    .await?;

This indicates that the pipeline is created and executed, but there is no distribution logic applied to it.

Additionally, the PipelineBuilder type has a method, build_distributed_insert_select, that is designed to handle distributed insert-select operations. However, this method is not invoked in the create_table_as_select process. It includes logic to build a distributed pipeline, cast schemas if needed, and append data to the table in a distributed manner:

impl PipelineBuilder {
    pub fn build_distributed_insert_select(
        &mut self,
        insert_select: &DistributedInsertSelect,
    ) -> Result<()> {
        // ... method implementation ...
    }
}

Since the create_table_as_select method does not utilize this distributed pipeline builder, the operation remains non-distributed [1][2].
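
The missing step described above amounts to a dispatch between a local and a distributed plan. A minimal sketch of that idea follows; InsertSelectPlan and plan_ctas_insert are hypothetical names invented for illustration, and the "more than one node" condition is an assumption, not how Databend actually decides.

```rust
/// Hypothetical stand-in for the two ways an insert-select could be planned.
#[derive(Debug, PartialEq)]
enum InsertSelectPlan {
    /// Single-node pipeline, which is what the comment above says CTAS gets.
    Local { source_table: String },
    /// Pipeline spread over several nodes.
    Distributed { source_table: String, nodes: usize },
}

/// Hypothetical planner: take the distributed path only when the
/// cluster has more than one node (an assumed condition).
fn plan_ctas_insert(source_table: &str, cluster_nodes: usize) -> InsertSelectPlan {
    if cluster_nodes > 1 {
        InsertSelectPlan::Distributed {
            source_table: source_table.to_string(),
            nodes: cluster_nodes,
        }
    } else {
        InsertSelectPlan::Local {
            source_table: source_table.to_string(),
        }
    }
}

fn main() {
    // With three nodes the sketch picks the distributed variant.
    let plan = plan_ctas_insert("target_table", 3);
    println!("{:?}", plan);
}
```

In the real code base the distributed branch would presumably end up in build_distributed_insert_select, but how create_table_as_select should reach it is not shown in this thread.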
