Discussion: removing or combining ports used in the standalone mode

lmatz commented 1 year ago

Just to move some discussion in the Slack to Github

SCR-20230905-maa

Right now, the standalone mode opens dozens of ports. And for cloud deployment, we'd like to reduce the number of ports. cc @arkbriar could you elaborate more on the benefits of having fewer ports in the standalone mode?

For Prometheus, I suppose three ports can be combined into one only.

And for gRPC communication among meta, frontend, and compute, we probably can do the same as they used to be different processes but now they are all in the same process.

How much work does it take? Which set of ports are to be combined/removed most easily first?

arkbriar commented 1 year ago

cc @arkbriar could you elaborate more on the benefits of having fewer ports in the standalone mode?

I just think we don't have to expose such many ports. More or less, it will ease the pain to deploy the standalone mode.

kwannoel commented 1 year ago

Add some comments from @BugenZhao :

@kwannoel:

Not right now for the standalone binary. It is just a wrapper around CN, Meta and Frontend entrypoints. So it will require the ports to be supplied. Eventually they can get removed / combined. Currently I think it still uses gRPC to communicate between the nodes, even though it is a single process. That can be improved. cc @BugenZhao do you have any ideas for it? Did risingwave-playground explore this before?

@BugenZhao:

This is a good point. Generally, we have done a partial of this optimization: For exchanges between actors, we will prefer local channels and directly move the chunks into them if we find two actors on the same worker node. For standalone deployment, all exchanges will naturally take benefit from this optimization. For other gRPC requests, we don’t have this optimization and always communicate on http endpoint. But I guess it’s not a big deal in most cases, as the data amount is small and the performance is not critical. However, it may matter for serving and we may improve this if necessary.

it may matter for serving and we may improve this if necessary. I guess it’s not only simply using a local channel as the underlying medium for tonic gRPCs. We may also consider reducing the encode/decode for each call. This may require us to do some refactor and design a fast path for this kind of deployment.

kwannoel commented 1 year ago

Something else I thought about: Even though meta, CN and frontend share the same process, is it possible that other nodes will need to communicate directly to meta / frontend via gRPC? In that case we need to handle routing logic, and the RPC messages need to be tagged.

kwannoel commented 1 year ago

As for refactoring work, I think the main hurdle is gRPC, I'm not sure how tightly coupled our system is with it, whether it's as simple as not starting the gRPC service. It would require some investigation on my part.

Other refactoring work would be just adding a flag to each rw node, to indicate they are executing in standalone mode, and do not need to start gRPC service.

BugenZhao commented 1 year ago

cc @arkbriar could you elaborate more on the benefits of having fewer ports in the standalone mode?

I just think we don't have to expose such many ports. More or less, it will ease the pain to deploy the standalone mode.

I guess avoiding exposing unnecessary ports should be enough. There's not much benefit for thoroughly removing them and switching to an in-memory channel for most RPC calls, as explained above.

lmatz commented 1 year ago

As the connector node is going to be embedded into CN, will it still rely on gRPC to communicate with CN? Or will it be totally replace with JNI?

kwannoel commented 1 year ago

As the connector node is going to be embedded into CN, will it still rely on gRPC to communicate with CN? Or will it be totally replace with JNI?

Probably replace with JNI. Cc @chenzl25

chenzl25 commented 1 year ago

There would be 2 phases to replace the connector node with the embedded one. In the first phase, only part of RPCs is replaced. But in the end, all of them would be JNI.

github-actions[bot] commented 1 year ago

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

risingwavelabs / risingwave

Discussion: removing or combining ports used in the standalone mode #12086