risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
https://go.risingwave.com/slack
Apache License 2.0
7.05k stars 580 forks source link

Optimize large case-when in simple form to constant lookup #14317

Open TennyZhuang opened 10 months ago

TennyZhuang commented 10 months ago

PostgreSQL supports a simple-form case-when expression:

create table test (a int);
insert into test values (1),(2),(3),(5);
SELECT a,
       CASE a WHEN 1 THEN 'one'
              WHEN 2 THEN 'two'
              ELSE 'other'
       END
    FROM test;

In such case, if there is a large number of arms (e.g. 500+), it's better to optimize it to a constant lookup.

struct LookupExpression<S: Scalar> {
    arms: HashMap<Option<S>, BoxedExpression>,
    fallback: BoxedExpression,
}

In addition, if users write a case-when expression that can be converted to the simple form, we can also normalize it in optimizer easily:

SELECT a,
       CASE WHEN a=1 THEN 'one'
            WHEN a=2 THEN 'two'
            ELSE 'other'
       END
    FROM test;

Note: the corner cases should be covered:

SELECT 
       CASE a
              WHEN 1 THEN 'one'
              WHEN 1 THEN 'oneone'
              ELSE 'other',
       CASE a::decimal 
              WHEN 1.0 THEN 'one'
              WHEN 1 THEN 'oneone'
              ELSE 'other'
       END
    FROM test;
chenzl25 commented 10 months ago

I suggest performing this optimization in the binder phase because it is more straightforward and easier to implement. https://github.com/risingwavelabs/risingwave/blob/b9b35190a73e0d3e9ad8a99e17d30a163d1036f6/src/frontend/src/binder/expr/mod.rs#L457

github-actions[bot] commented 8 months ago

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

xzhseh commented 8 months ago

Some future to-dos:

  1. refactor the current bind_case, some branches are pretty similar and could be potentially combined together.
  2. regarding corner case, e.g., the one mentioned in above description
  3. CBO for constant lookup, and we can get rid of a preset limit for number of arms (i.e., 30 at present)
github-actions[bot] commented 5 months ago

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.