risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
https://go.risingwave.com/slack
Apache License 2.0
7k stars, 575 forks

Provide java udf server instead of libs. #10322

Closed. liurenjie1024 closed this 5 months ago

liurenjie1024 commented 1 year ago

Currently, users need to set up the UDF server themselves. Another approach is for us to provide a UDF server that loads user-provided jars at startup. This way, users only need to focus on UDF development.
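
A minimal sketch of what "load the user-provided jar at startup" could look like. It assumes a hypothetical `ScalarUdf` interface and discovery through `java.util.ServiceLoader` (the jar would list its implementations under `META-INF/services`). Neither the interface nor the class names below are RisingWave APIs; they only illustrate the idea.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.ServiceLoader;

public class UdfJarLoader {
    /** Hypothetical contract a user jar would implement; not a RisingWave API. */
    public interface ScalarUdf {
        String name();
        Object eval(Object... args);
    }

    /**
     * Loads every ScalarUdf implementation declared by the given jars and
     * returns them keyed by function name.
     */
    public static Map<String, ScalarUdf> loadAll(List<Path> jars) {
        URL[] urls = new URL[jars.size()];
        for (int i = 0; i < jars.size(); i++) {
            try {
                urls[i] = jars.get(i).toUri().toURL();
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Bad jar path: " + jars.get(i), e);
            }
        }
        // The loader is intentionally left open: UDF classes may be resolved
        // lazily for as long as the server keeps serving calls.
        URLClassLoader loader =
                new URLClassLoader(urls, UdfJarLoader.class.getClassLoader());

        Map<String, ScalarUdf> registry = new LinkedHashMap<>();
        for (ScalarUdf udf : ServiceLoader.load(ScalarUdf.class, loader)) {
            // Reject duplicate names instead of silently overriding one UDF.
            if (registry.putIfAbsent(udf.name(), udf) != null) {
                throw new IllegalStateException("Duplicate UDF name: " + udf.name());
            }
        }
        return registry;
    }
}
```

With this shape, the server binary stays the same across users, and only the list of jars passed to `loadAll` changes.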

liurenjie1024 commented 1 year ago

cc @wangrunji0408 Feel free to comment.

xxchan commented 1 year ago

I'm also thinking about this when considering https://github.com/risingwavelabs/risingwave/issues/9002.

This might be a larger problem for cloud. If we allow arbitrary UDF servers, we need extensive defensive checks. If we host the servers and let users register functions, we can at least ensure the protocol is correct... (avoiding problems like https://github.com/risingwavelabs/risingwave/issues/10828, https://github.com/risingwavelabs/risingwave/issues/11022) But of course that might limit flexibility and increase the operational burden. 🤔️

liurenjie1024 commented 1 year ago

I think providing a UDF server rather than only libs has many advantages:

  1. Improved user experience. Users only need to focus on their business logic and upload jars to some file server, then use statements like create udf xxx at s3://xx/bb.jar
  2. Easier management and observability. There are many things to consider when deploying a UDF server, for example auto scaling, observability, failover, etc. These in fact require a managed service.
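
As a rough illustration of the first point, a provided server would need to turn the location in a statement like `create udf xxx at s3://xx/bb.jar` into something it can fetch. The sketch below only splits such a location into scheme, bucket, and object key using `java.net.URI`. The class `UdfJarLocation` and its fields are hypothetical, and the statement syntax itself is just the example from this thread, not an existing RisingWave statement.

```java
import java.net.URI;

public class UdfJarLocation {
    public final String scheme; // e.g. "s3"
    public final String bucket; // authority part of the URI
    public final String key;    // object key without the leading '/'

    private UdfJarLocation(String scheme, String bucket, String key) {
        this.scheme = scheme;
        this.bucket = bucket;
        this.key = key;
    }

    /** Parses a jar location such as s3://xx/bb.jar. */
    public static UdfJarLocation parse(String location) {
        URI uri = URI.create(location);
        if (uri.getScheme() == null || uri.getAuthority() == null) {
            throw new IllegalArgumentException("Expected scheme://bucket/key: " + location);
        }
        String path = uri.getPath();
        if (path == null || !path.endsWith(".jar")) {
            throw new IllegalArgumentException("Expected a .jar object: " + location);
        }
        // A hierarchical path with an authority always starts with '/'.
        return new UdfJarLocation(uri.getScheme(), uri.getAuthority(), path.substring(1));
    }
}
```

Downloading the object and handing the local file to the server is left out here, since that depends on which storage client the deployment uses.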

xxchan commented 1 year ago

Btw, isn’t that Flink’s solution for Python UDF? i.e., the Python runtime process is fully managed by Flink.

I think there are different solutions and each has different advantages:

  • Fully external (current): maximal flexibility. Users can have any dependencies and can run their own middleware/gateway (to achieve scaling/observability/failover). Snowflake and Redshift support this (mainly deployed as Lambda functions, and both use JSON as the protocol).
  • Sidecar runtime process managed by RisingWave: the user submits a jar/py file. I guess dependencies and debugging are not as good.
  • Separate server, manually deployed (like the connector node; could be deployed either by the user or by us?): looks like a tradeoff between the above two. Does any other product support this? 🤔️

xxchan commented 1 year ago

> There are many things to consider when deploying udf server, for example auto scaling, observability, failover, etc. These in fact require managed service.

Agree. But maybe providing a server (like the connector node) for users isn't enough to solve problems like scaling/failover. Maybe the solution should be to allow users to deploy UDFs to Lambda, and/or to have our own managed UDF servers.
