risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
https://go.risingwave.com/slack
Apache License 2.0
7.07k stars 582 forks source link

feat(batch): expression's error handling #17844

Open st1page opened 3 months ago

st1page commented 3 months ago

Some users complains that they do not want some rows' expression error prevent the whole query's processing and result.

TRY() function

presto and trino do this https://trino.io/docs/current/functions/conditional.html#try https://prestodb.io/docs/current/functions/conditional.html#try

NOTE: after https://github.com/risingwavelabs/risingwave/pull/12461, the rw's implementation of eval_infallible is not equal to the try() function in these systems

implement your own try version of each function, such as TRY_TO_DATE(), TRY_CAST().

duckdb, snowflake and databricks do this https://duckdb.org/docs/sql/expressions/cast.html#try_cast (though it does not have other functions) https://docs.snowflake.com/en/sql-reference/functions/try_to_date https://docs.databricks.com/en/sql/language-manual/functions/try_cast.html

Personally, prefer the first one.

lmatz commented 3 months ago

I guess they chose the second one because their expression framework cannot be easily adapted for the first one 🤔 The first one is composable, I believe strictly better than the second one.

xiangjinwu commented 3 months ago

Previous issue: #4458

st1page commented 3 months ago

Previous issue: #4458

Thanks! I remember we did some discussion somewhere but I did not find it. So I think the previous issue is closed because we solved the streaming part but currently we can think about if we need to do something for the batch query.

BugenZhao commented 3 months ago

Some users complains that they do not want some

...? 👀

st1page commented 3 months ago

Some users complains that they do not want some

...? 👀

added.. Sorry, I lost this part

fuyufjh commented 2 months ago

As a streaming database, I hope to keep the SQL features consistent among streaming (i.e. create materialized view as select) and batch (i.e. select). For this case, I think you don't plan to add TRY() to streaming, right? If so, it breaks the aforementioned consistency.

Alternatively, shall we introduce a session variable to expose the expression's strict_mode to user? This was designed in https://github.com/risingwavelabs/risingwave/issues/4625 (See "Make it configurable") but not implemented yet.

xxchan commented 2 months ago

Alternatively, shall we introduce a session variable to expose the expression's strict_mode to user? This was designed in https://github.com/risingwavelabs/risingwave/issues/4625 (See "Make it configurable") but not implemented yet.

IIUC, currently

Do you mean to support non-strict mode for batch via a session var? We can't support strict mode for streaming easily. (Perhaps better to call it batch_strict_mode)

fuyufjh commented 2 months ago

Do you mean to support non-strict mode for batch via a session var? We can't support strict mode for streaming easily.

Exactly.

It makes sense to use strict mode for batch query, so that unexpected problems can be revealed early to prompt users to modify their query or clean their data. But in some cases, users may expect a failed expression to leave a NULL as result, so they can explicitly set strict_mode = false to work around.

(Perhaps better to call it batch_strict_mode)

Yeah, agree. But I'd like to mention 'expression' as well, how about batch_expr_strict_mode?

github-actions[bot] commented 3 weeks ago

This issue has been open for 60 days with no activity.

If you think it is still relevant today, and needs to be done in the near future, you can comment to update the status, or just manually remove the no-issue-activity label.

You can also confidently close this issue as not planned to keep our backlog clean. Don't worry if you think the issue is still valuable to continue in the future. It's searchable and can be reopened when it's time. 😄