risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.

optimizer: Improve column name of the operator #10093

Open st1page opened 1 year ago

st1page commented 1 year ago
dev=> create table t(c STRUCT<x INTEGER, y INTEGER>);
dev=> create materialized view mv as select t1.c from t t1 join t t2 on (t1.c).y = (t2.c).y;
CREATE_MATERIALIZED_VIEW
dev=> show internal tables;
                   Name                   
------------------------------------------
 __internal_mv_3_hashjoindegreeright_1006
 __internal_mv_3_hashjoinright_1005
 __internal_mv_5_chain_1008
 __internal_mv_3_hashjoindegreeleft_1004
 __internal_mv_3_hashjoinleft_1003
 __internal_mv_4_chain_1007
(6 rows)

dev=> select * from __internal_mv_3_hashjoindegreeright_1006;
 $expr2 | t__row_id | _degree 
--------+-----------+---------
(0 rows)

$expr2 is an alias for an expression in the plan, and that expression is visible in the EXPLAIN output. But users cannot resolve the alias when they query the internal table, because only the alias appears as the column name.

dev=> explain create materialized view mv as select t1.c from t t1 join t t2 on (t1.c).y = (t2.c).y;
                                                                                                   QUERY PLAN                                                                                                   
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 StreamMaterialize { columns: [c, t._row_id(hidden), $expr1(hidden), t._row_id#1(hidden)], stream_key: [t._row_id, t._row_id#1, $expr1], pk_columns: [t._row_id, t._row_id#1, $expr1], pk_conflict: "NoCheck" }
 └─StreamHashJoin { type: Inner, predicate: $expr1 = $expr2 }
   ├─StreamExchange { dist: HashShard($expr1) }
   | └─StreamProject { exprs: [t.c, Field(t.c, 1:Int32) as $expr1, t._row_id] }
   |   └─StreamTableScan { table: t, columns: [c, _row_id] }
   └─StreamExchange { dist: HashShard($expr2) }
     └─StreamProject { exprs: [Field(t.c, 1:Int32) as $expr2, t._row_id] }
       └─StreamTableScan { table: t, columns: [c, _row_id] }

I propose changing the column names in schema() back to the expression names, and adding a column_alias() -> Vec<Option<String>> method on PlanNode that carries the aliases.
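
To make the proposal concrete, here is a minimal sketch of the idea, not the actual RisingWave code. `Column`, `ProjectNode`, `schema_names`, `right_side_project`, and `render` are hypothetical names made up for illustration; only `column_alias() -> Vec<Option<String>>` comes from the proposal above. The schema carries the expression name, and the alias lives in a parallel vector:

```rust
/// Hypothetical stand-in for one output column of a plan node.
/// This is NOT RisingWave's real `Field` or `Schema` type.
#[derive(Debug, Clone)]
struct Column {
    /// Expression text used as the schema column name.
    expr_name: String,
    /// Planner-generated alias such as "$expr2", if the column has one.
    alias: Option<String>,
}

/// Hypothetical plan node holding its output columns.
struct ProjectNode {
    columns: Vec<Column>,
}

impl ProjectNode {
    /// Column names as the schema would expose them: the expression names.
    fn schema_names(&self) -> Vec<String> {
        self.columns.iter().map(|c| c.expr_name.clone()).collect()
    }

    /// One entry per column: `Some(alias)` if aliased, `None` otherwise.
    fn column_alias(&self) -> Vec<Option<String>> {
        self.columns.iter().map(|c| c.alias.clone()).collect()
    }
}

/// Builds the right-hand StreamProject from the plan above by hand.
fn right_side_project() -> ProjectNode {
    ProjectNode {
        columns: vec![
            Column {
                expr_name: "Field(t.c, 1:Int32)".to_string(),
                alias: Some("$expr2".to_string()),
            },
            Column {
                expr_name: "t._row_id".to_string(),
                alias: None,
            },
        ],
    }
}

/// Renders columns the way EXPLAIN prints them: "<expr> as <alias>" or "<expr>".
fn render(node: &ProjectNode) -> Vec<String> {
    node.schema_names()
        .into_iter()
        .zip(node.column_alias())
        .map(|(name, alias)| match alias {
            Some(a) => format!("{} as {}", name, a),
            None => name,
        })
        .collect()
}
```

With this split, EXPLAIN could still print the alias form by combining both vectors (as `render` does), while an internal table built from `schema_names()` would show the expression itself rather than a bare `$expr2`.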

st1page commented 1 year ago

cc @chenzl25 @fuyufjh

fuyufjh commented 1 year ago

I acknowledge this inconvenience, but using an expression as the column name is more confusing to me 🥲

fuyufjh commented 1 year ago

Might it be better to provide some way for users to inspect the streaming job (i.e. a running materialized view)?
