risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
https://go.risingwave.com/slack
Apache License 2.0
6.76k stars 557 forks source link

feat(batch): expression's error handling #17844

Open st1page opened 1 month ago

st1page commented 1 month ago

Some users complains that they do not want some rows' expression error prevent the whole query's processing and result.

TRY() function

presto and trino do this https://trino.io/docs/current/functions/conditional.html#try https://prestodb.io/docs/current/functions/conditional.html#try

NOTE: after https://github.com/risingwavelabs/risingwave/pull/12461, the rw's implementation of eval_infallible is not equal to the try() function in these systems

implement your own try version of each function, such as TRY_TO_DATE(), TRY_CAST().

duckdb, snowflake and databricks do this https://duckdb.org/docs/sql/expressions/cast.html#try_cast (though it does not have other functions) https://docs.snowflake.com/en/sql-reference/functions/try_to_date https://docs.databricks.com/en/sql/language-manual/functions/try_cast.html

Personally, prefer the first one.

lmatz commented 1 month ago

I guess they chose the second one because their expression framework cannot be easily adapted for the first one 🤔 The first one is composable, I believe strictly better than the second one.

xiangjinwu commented 1 month ago

Previous issue: #4458

st1page commented 1 month ago

Previous issue: #4458

Thanks! I remember we did some discussion somewhere but I did not find it. So I think the previous issue is closed because we solved the streaming part but currently we can think about if we need to do something for the batch query.

BugenZhao commented 1 month ago

Some users complains that they do not want some

...? 👀

st1page commented 1 month ago

Some users complains that they do not want some

...? 👀

added.. Sorry, I lost this part

fuyufjh commented 4 days ago

As a streaming database, I hope to keep the SQL features consistent among streaming (i.e. create materialized view as select) and batch (i.e. select). For this case, I think you don't plan to add TRY() to streaming, right? If so, it breaks the aforementioned consistency.

Alternatively, shall we introduce a session variable to expose the expression's strict_mode to user? This was designed in https://github.com/risingwavelabs/risingwave/issues/4625 (See "Make it configurable") but not implemented yet.

xxchan commented 3 days ago

Alternatively, shall we introduce a session variable to expose the expression's strict_mode to user? This was designed in https://github.com/risingwavelabs/risingwave/issues/4625 (See "Make it configurable") but not implemented yet.

IIUC, currently

Do you mean to support non-strict mode for batch via a session var? We can't support strict mode for streaming easily. (Perhaps better to call it batch_strict_mode)

fuyufjh commented 3 days ago

Do you mean to support non-strict mode for batch via a session var? We can't support strict mode for streaming easily.

Exactly.

It makes sense to use strict mode for batch query, so that unexpected problems can be revealed early to prompt users to modify their query or clean their data. But in some cases, users may expect a failed expression to leave a NULL as result, so they can explicitly set strict_mode = false to work around.

(Perhaps better to call it batch_strict_mode)

Yeah, agree. But I'd like to mention 'expression' as well, how about batch_expr_strict_mode?