haproxytech / haproxy-spoa-dotnet

HAProxy Stream Processing Offload Agent (SPOA) library for .NET Core.
GNU General Public License v2.0
15 stars 7 forks source link

Small API change proposal to be able to write a Kestrel ConnectionHandler #7

Closed aristidegarces closed 2 years ago

aristidegarces commented 3 years ago

Hello

I'm trying to build a Kestrel SPOA but the signature of the FrameProcessor prevents me from doing so. Would you mind if I send a pull request to allow having a separate input and output stream?

i.e. void HandleStream(Stream inputStream, Stream outputStream, Func<NotifyFrame, IList> notifyHandler);

The issue is that the ConnectionContext I get in the callback to my class gives me an IDuplexPipe with an Input (PipeReader) and an Output (PipeWriter). I can convert them to Streams, but they're not the same Stream.

See https://docs.microsoft.com/en-us/dotnet/api/microsoft.aspnetcore.connections.connectionhandler.onconnectedasync?view=aspnetcore-5.0#Microsoft_AspNetCore_Connections_ConnectionHandler_OnConnectedAsync_Microsoft_AspNetCore_Connections_ConnectionContext_

Let me know what you think

Thanks

Aristides

NickMRamirez commented 3 years ago

Yes, but I would add it to the interface as a second function (an overload) instead of changing the existing function signature. Thank you for your interest in this project!

cortex93 commented 3 years ago

I'm interested by the PR. A year ago, I also tried to use IDuplexPipe as a way to improve performance.

At the time but I think that didn't change that much, spoa-dotnet was synchronous in the way it handles frame and messages. It reads one frame, decodes, processes it. If one or more messages were enqueued, it writes frames in loop (which can be more than one depending the number of message and the size of them produced during processing). Doing so, reading frame is blocked and no parralell processing can be done. For me, this was a major perf issue because my processing step was to call an outside api and I/O path was not optimum.

But using Stream on top of IDuplexPipe, from my perf tests, didn't not bring that much of gain. From my perf test, this was not the main bottleneck. The fact, is that Frames are encoded and decoded using recursive ToArray. Which leads to a lot of object creation and lot of GC presure under heavy load.

I think a more efficient way would be to use Pipes and ReadOnlyBuffers/Spans.

The way Kestrel works with IDuplexPipe for HTTP Request is very threadish. Transposed to SPOA, IDuplexPipe would allow to have a thread handle a connection and reading the input pipe as much as it can (automatic flow control). When a complete frame is readable, it can be processed on the same thread (handshake/disconnect) or the payload can be written to another pipe (notify/unset). This pipe is a buffer for fragmented message (one or many fragment). Another thread is async reading the pipe and is notified only when the message payload is completely written on his pipe. It can decode the list of messages and process them async. Output frame can either be directly written to the output pipe part of the connection, or written by another thead (using Pipe or Message Buffer).

The real benefits of IDuplexPipe is to be able to read directly from the Pipe but it means being able to read and write frame from ReadOnlySequence/ReadOnlySpan to avoid as much as possible object allocation.

cortex93 commented 3 years ago

Forgot to mention I got down the idea because of a bug in HAProxy in how spoa connection was handled. I just give a new try with the latest version (the bug has been fixed meanwhile) and the difference is just awesome.

The message processing is doing basic cookie DataProtection encryption which is rather constant.

Some figures. spoe backend in HAProxy has maxconn 1.

ab -n 4000 -c 100 http://localhost/

It took 27 seconds and 7ms per request (144.81 req/sec) with spoa-dotnet version. By the end, a lot of errors occured due to c# app freeze (GC scavenging I guess, it happens a lot on production under heavy load), connection is closed by HAProxy and reopened.

It took 4 seconds and 1ms per request (896.22 req/sec) with my poc (pipelining enabled) and no error.

I'll try to share my current implementation after some cleanup.

aristidegarces commented 3 years ago

The change was done in a criteo-fork https://github.com/criteo-forks/haproxy-spoa-dotnet/commit/bef526f3209399b57991d222f9d489a2b13bbea4

I'm trying to push a PR to this repo, but I'm getting 403 unauthorised using my PAT :(

cortex93 commented 3 years ago

Hi, just to let you know I took time to put bits altogether at https://github.com/inulogic/HAProxy.StreamProcessingOffload.AgentFramework/commit/b9ab15ba29d79b5be6afb426afa29e9f49ab662c

While benchmarking, I raised an issue in HAProxy which has been patched . https://github.com/haproxy/haproxy/issues/1340

NickMRamirez commented 3 years ago

@cortex93 Thanks for chasing down that issue.

When you say you have the bits, are you saying you will submit a PR?

cortex93 commented 3 years ago

When I start fiddling with IDuplexPipe and span, it was a major change from your implementation. So at the time, I decided to start from scratch, and then drop due lack of time and bugs in haproxy.

I just revive old source code in a public repo.

NickMRamirez commented 2 years ago

This has been implemented with https://github.com/haproxytech/haproxy-spoa-dotnet/pull/11