invesdwin / invesdwin-context-integration

invesdwin-context modules that provide integration features
GNU Lesser General Public License v3.0

implement multiplexing channel #39

Closed: subes closed this 1 year ago

subes commented 1 year ago

reuse buffers in intermediate/transformer/pipe steps. Poll through readers (refactoring needed for sockets: hasNext may only return true once a full message has arrived). Polling through writers will be skipped for now; messages need to be kept small enough so that writes never block (e.g. via a fragmenting writer that keeps messages as small as a UDP packet).
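
A minimal sketch of that hasNext contract, assuming length-prefixed framing over a non-blocking SocketChannel (class name, framing and buffer size are illustrative, not the actual invesdwin reader):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

/**
 * Illustrative poll-through reader: hasNext() only returns true once a complete
 * length-prefixed message has been buffered, so a multiplexer can poll many
 * channels without ever blocking on a partial read. The same ByteBuffer is
 * reused for every read; the real code presumably avoids even the copy in
 * next() by handing out IByteBufferProvider views.
 */
public final class PollingMessageReader {
    private final SocketChannel channel; // assumed to be configured non-blocking
    private final ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024); // reused buffer
    private int expectedLength = -1;

    public PollingMessageReader(final SocketChannel channel) {
        this.channel = channel;
    }

    /** Polls the channel; returns true only when a full message is available. */
    public boolean hasNext() throws IOException {
        channel.read(buffer); // may read 0 bytes; never blocks on a non-blocking channel
        if (expectedLength < 0 && buffer.position() >= Integer.BYTES) {
            expectedLength = buffer.getInt(0); // 4-byte length prefix
        }
        return expectedLength >= 0 && buffer.position() >= Integer.BYTES + expectedLength;
    }

    /** Copies the complete message out and keeps any trailing bytes of a pipelined next message. */
    public byte[] next() {
        final byte[] message = new byte[expectedLength];
        buffer.flip();
        buffer.getInt(); // skip the length prefix
        buffer.get(message);
        buffer.compact(); // shift leftover bytes to the front of the reused buffer
        expectedLength = -1;
        return message;
    }
}
```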

The multiplexers store an ArrayList of transport channel readers/writers. A reader tells what origin a message is from either via a message wrapper or via a getter. I guess a wrapper might be better, but it will not be compatible with other readers that expect an IByteBufferProvider directly. Though we could extend that interface and add the SelectionKey/identifier of the source there.
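
A rough sketch of the wrapper variant; IMessageProvider is only a hypothetical stand-in for IByteBufferProvider, and the source index refers to the originating reader's position in the multiplexer's ArrayList:

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for IByteBufferProvider (the real interface lives in invesdwin-util).
interface IMessageProvider {
    ByteBuffer getMessage();
}

/**
 * Wrapper approach: the multiplexer hands out the payload together with the
 * index of the transport channel it was read from, so downstream steps can
 * route responses back to the right connection.
 */
final class SourcedMessage implements IMessageProvider {
    private final int sourceIndex;
    private final ByteBuffer message;

    SourcedMessage(final int sourceIndex, final ByteBuffer message) {
        this.sourceIndex = sourceIndex;
        this.message = message;
    }

    public int getSourceIndex() {
        return sourceIndex;
    }

    @Override
    public ByteBuffer getMessage() {
        return message;
    }
}
```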

subes commented 1 year ago

Read/write with worker threads seems to be used by Cloudflare as well (still using epoll in the front, though). Though they say batching with io_submit from AIO (blocking) can reduce syscalls and be slightly faster.

https://blog.cloudflare.com/io_submit-the-epoll-alternative-youve-never-heard-about/

Java library for AIO (though it can only handle files, not sockets): https://github.com/zrlio/jaio (see https://github.com/zrlio/jaio/issues/4)

subes commented 1 year ago

implemented this differently in the form of an RMI replacement for rpc.

The server does the multiplexing for service calls using one IO thread (for requests) and multiple worker threads (for responses).

The client side does not use multiplexing; instead a pool of connections is used, and the thread that made the request polls for the response itself. This gives the least latency on the client side because requests do not need to be offloaded to a different thread. On the client side we do not expect thousands of concurrent requests; handling that amount will only occur on the server.
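
A minimal sketch of that client-side idea, assuming a hypothetical IConnection abstraction (the names and the spin-polling are illustrative, not the actual API):

```java
import java.util.Collection;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * No multiplexing on the client: just a pool of connections. The thread that
 * sends the request borrows a connection and polls it for the response itself,
 * so no hand-off to another thread (and no extra latency) is involved.
 */
final class PooledRpcClient {
    // Hypothetical minimal connection abstraction for the sketch.
    interface IConnection {
        void write(byte[] request) throws Exception;

        /** Non-blocking poll; returns null while no response has arrived yet. */
        byte[] poll() throws Exception;
    }

    private final BlockingQueue<IConnection> pool;

    PooledRpcClient(final Collection<IConnection> connections) {
        this.pool = new ArrayBlockingQueue<>(connections.size(), false, connections);
    }

    /** Executed directly on the calling thread: borrow, write, poll, return. */
    byte[] request(final byte[] request) throws Exception {
        final IConnection connection = pool.take(); // borrow a connection from the pool
        try {
            connection.write(request);
            byte[] response;
            while ((response = connection.poll()) == null) {
                Thread.onSpinWait(); // the requesting thread polls for its own response
            }
            return response;
        } finally {
            pool.put(connection); // hand the connection back for other threads
        }
    }
}
```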

The current solution works with arbitrary channel transports. Though throughput on the server could be improved further by using OS-level multiplexing such as epoll via an NIO Selector or jaio. io_uring (through the netty io_uring binding) could also be integrated in such a way, but that is left as an exercise for sometime in the future. The hasNext loop in the server just needs to skip work if the selector does not report work to do for a given client.
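
A sketch of that selector-based improvement (illustrative only, not part of the current solution; the port and handler are placeholders):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

/**
 * OS-level multiplexing via an NIO Selector (epoll on Linux): the request loop
 * only asks a client's reader for work when the selector flags that channel as
 * readable; idle clients are skipped entirely, saving syscalls per iteration.
 */
public final class SelectorRequestLoop {
    public static void main(final String[] args) throws IOException {
        final Selector selector = Selector.open();
        final ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(7777)); // port is arbitrary for the sketch
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select(); // blocks until at least one channel has work to do
            final Iterator<SelectionKey> selectedKeys = selector.selectedKeys().iterator();
            while (selectedKeys.hasNext()) {
                final SelectionKey key = selectedKeys.next();
                selectedKeys.remove();
                if (key.isAcceptable()) {
                    final SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    // only now would the per-client hasNext be polled;
                    // clients not selected in this round are skipped
                    handleReadable((SocketChannel) key.channel());
                }
            }
        }
    }

    private static void handleReadable(final SocketChannel client) {
        // read the request and dispatch it to a worker thread for the response
    }
}
```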

subes commented 1 year ago

still need to implement some unit tests that compare our solution against netty and mina with multiple clients. This test should then also provide a scalability benchmark for x simultaneous clients (ideally all using networking over loopback to have an even ground).
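
A rough sketch of such a loopback scalability benchmark (illustrative only, not the actual invesdwin/netty/mina harness; a plain blocking echo server stands in for the transports under test, and client/request counts are arbitrary):

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CountDownLatch;

/** x clients run request/response round trips over loopback; total throughput is printed. */
public final class LoopbackScalabilityBenchmark {
    private static final int CLIENTS = 16; // "x" simultaneous clients
    private static final int REQUESTS_PER_CLIENT = 10_000;
    private static final int PORT = 7778;

    public static void main(final String[] args) throws Exception {
        final Thread server = new Thread(LoopbackScalabilityBenchmark::runEchoServer);
        server.setDaemon(true);
        server.start();

        final CountDownLatch done = new CountDownLatch(CLIENTS);
        final long start = System.nanoTime();
        for (int i = 0; i < CLIENTS; i++) {
            new Thread(() -> {
                try (Socket socket = new Socket("127.0.0.1", PORT)) {
                    final DataOutputStream out = new DataOutputStream(socket.getOutputStream());
                    final DataInputStream in = new DataInputStream(socket.getInputStream());
                    for (int r = 0; r < REQUESTS_PER_CLIENT; r++) {
                        out.writeInt(r);
                        out.flush();
                        in.readInt(); // wait for the echoed response (one round trip)
                    }
                } catch (final Exception e) {
                    throw new RuntimeException(e);
                } finally {
                    done.countDown();
                }
            }).start();
        }
        done.await();
        final double seconds = (System.nanoTime() - start) / 1_000_000_000d;
        System.out.printf("%d clients, %.0f round trips/s%n", CLIENTS, CLIENTS * REQUESTS_PER_CLIENT / seconds);
    }

    private static void runEchoServer() {
        try (ServerSocket serverSocket = new ServerSocket(PORT)) {
            while (true) {
                final Socket client = serverSocket.accept();
                new Thread(() -> {
                    try (client) {
                        final DataInputStream in = new DataInputStream(client.getInputStream());
                        final DataOutputStream out = new DataOutputStream(client.getOutputStream());
                        while (true) {
                            out.writeInt(in.readInt()); // echo back
                            out.flush();
                        }
                    } catch (final Exception e) {
                        // client disconnected
                    }
                }).start();
            }
        } catch (final Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```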

subes commented 1 year ago

finished, now we have both multiplexing clients and servers that can handle multiple parallel requests over a single connection or a (dynamic) pool of connections. Also, rpc services can define whether requests should be blocking (skipping the worker executor on the client/server side) and they support future return values.
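
An illustrative sketch of the resulting programming model; the @Blocking marker annotation and the service interface are hypothetical names, not the actual invesdwin API:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.concurrent.Future;

// Hypothetical marker: a blocking request skips the worker executor
// on both the client and the server side.
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
@interface Blocking {
}

/** Illustrative rpc service contract for the sketch. */
interface ISampleRpcService {
    /** Executed synchronously, bypassing the worker executor on both sides. */
    @Blocking
    int add(int a, int b);

    /** Asynchronous variant: returns immediately with a Future for the response. */
    Future<String> lookupName(int id);
}
```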