Running ReactPy in a backhaul thread

Archmonger commented 1 year ago

Description

This PR tests what performance gains can be achieved by running ReactPy in a backhaul thread.

One backhaul thread per Python process.
Backhaul thread can be enabled or disabled via settings.py:REACTPY_BACKHAUL_THREAD
The entirety of the ReactPy rendering stack is encapsulated within a single backhaul thread.
Implementation is semi-threadsafe since all ReactPy code runs on the exact same thread.
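
As a minimal sketch, enabling the feature in a Django project might look like the fragment below. The setting name comes from this PR; treating it as a plain boolean is an assumption.

```python
# settings.py (fragment)
# Assumption: the setting is a simple boolean toggle.
# True  -> ReactPy rendering runs in the backhaul thread
# False -> ReactPy rendering stays on the main ASGI thread
REACTPY_BACKHAUL_THREAD = True
```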

Technical Background

Our existing rendering process does not have the capability to render components while the webserver is doing other things. Below is an oversimplified graphic demonstrating the order of operations we typically go through.

[Image: ReactPy Architecture]

With the implementation above, ReactPy can queue up numerous render and send events that can flood the main Python thread. This issue is compounded by the fact that our component functions are synchronous, which breaks the operational flow of event queues. This PR moves rendering tasks outside of the main Python thread, allowing the webserver to render ReactPy components in parallel with anything else it is doing. Below is an oversimplified graphic demonstrating this parallelism.

[Image: ReactPy Architecture]

The graphic above covers the high-level concept; however, some additional complexity isn't depicted. The biggest omission is the preemptive multitasking between the ASGI loop and the backhaul loop. This allows Python to properly utilize CPU downtime during send events to process other things that hit the webserver.

The level of performance gained from this will depend on the loop implementation. For example, uvloop can gain great efficiency from this.
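
The general pattern described in this section can be sketched with the standard library alone. This is not the code from this PR: `backhaul_loop`, `render_component`, and `handle_connection` are hypothetical names, and the "render" is a stand-in. It only illustrates one dedicated thread running its own event loop, with the main loop handing coroutines to it and awaiting the result.

```python
import asyncio
import threading

# Hypothetical names throughout; a sketch of the pattern, not the PR's implementation.
backhaul_loop = asyncio.new_event_loop()


def _run_backhaul_loop() -> None:
    """Run the backhaul event loop forever on its own thread."""
    asyncio.set_event_loop(backhaul_loop)
    backhaul_loop.run_forever()


# One backhaul thread per Python process; daemon so it never blocks interpreter exit.
backhaul_thread = threading.Thread(
    target=_run_backhaul_loop, name="backhaul", daemon=True
)
backhaul_thread.start()


async def render_component(name: str) -> str:
    """Stand-in for render work; executes on the backhaul thread."""
    await asyncio.sleep(0)
    return f"rendered {name} on {threading.current_thread().name}"


async def handle_connection(name: str) -> str:
    """Runs on the main (ASGI) loop and delegates rendering to the backhaul loop."""
    future = asyncio.run_coroutine_threadsafe(render_component(name), backhaul_loop)
    # wrap_future lets the main loop await the result without blocking its own thread.
    return await asyncio.wrap_future(future)


if __name__ == "__main__":
    print(asyncio.run(handle_connection("hello")))
```

Because every render coroutine is scheduled onto the same loop and thread, render code never runs concurrently with other render code, which is the sense in which a design like this can be called semi-threadsafe.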

Test Configuration

Windows 11 (Build 22621) with asyncio
Linux (WSL Ubuntu 20.04) with uvloop
Single ASGI worker
Data is collected when metrics have stabilized
Components are loaded discretely: One websocket per component.

Results

TLDR: On Linux, threading makes everything faster. On Windows, everything is faster except for renders per second under load.

With uvloop and backhaul threads on Linux, our ~12000 RPS is competitive with roughly equivalent tests on HTTP ASGI frameworks.
Renders per second during low rendering workloads are significantly faster.
Renders per second during heavy rendering workloads are slightly faster on Linux but slightly slower on Windows.
Time to load during heavy rendering workloads is dramatically faster.
Time to load during heavy network IO workloads is significantly faster.
Time to load during heavy mixed rendering and network IO workloads is dramatically faster.
Event-driven renders during heavy rendering workloads are significantly faster.

Time To Load

These tests check how long it takes 250 components to be rendered on the page. One test case needed to be limited to 50 components due to browser limitations.

Simple: These are effectively "Hello World" components with no logic. They only contain body text.
Counter: These components simulate how quickly new components would load when the main ASGI event loop is busy re-rendering other components. Time to load does not include the initial HTTP render.
Net IO: These components render approximately 10MB worth of VDOM data only once.
Mixed: These components continuously re-render approximately 1MB worth of VDOM data. Limited to 50 components and 1MB payload size to avoid hitting the browser tab memory limit. All scenarios below suffered from WS connection timeout instability during long runs, likely because 1MB single-packet payloads are unrealistically large.

Test Name	TTL 250 Simple	TTL 250 Net IO	TTL 50 Mixed	TTL 250 Counter	TTL 250 Counter Linux
`uvicorn`	5,077 ms	16,189 ms	43,689 ms	48,397 ms	29,136 ms
`daphne`	5,221 ms	9,427 ms	16,936 ms	39,635 ms	37,446 ms
`hypercorn`	5,088 ms	16,948 ms	56,366 ms	57,417 ms	34,531 ms
`uvicorn` + backhaul	5,065 ms	11,477 ms	5,115 ms	5,256 ms	6,580 ms
`daphne` + backhaul	5,278 ms	8,621 ms	30,339 ms	33,897 ms	N/A
`hypercorn` + backhaul	5,083 ms	11,546 ms	5,144 ms	5,235 ms	6,818 ms
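
To put the table in relative terms, the sketch below computes the speedup factor for the "TTL 250 Counter" column (the one without the Linux label). The numbers are copied from the table above; the helper name `speedup` is just for illustration.

```python
# Time to load in milliseconds, copied from the "TTL 250 Counter" column above.
ttl_counter_ms = {
    "uvicorn": {"baseline": 48_397, "backhaul": 5_256},
    "daphne": {"baseline": 39_635, "backhaul": 33_897},
    "hypercorn": {"baseline": 57_417, "backhaul": 5_235},
}


def speedup(baseline_ms: float, backhaul_ms: float) -> float:
    """Return how many times faster the backhaul run loaded (values above 1 are faster)."""
    return baseline_ms / backhaul_ms


speedups = {
    server: speedup(times["baseline"], times["backhaul"])
    for server, times in ttl_counter_ms.items()
}

for server, factor in speedups.items():
    print(f"{server:10} {factor:.1f}x")
```

For this column, uvicorn and hypercorn load roughly 9x to 11x faster with the backhaul thread, while daphne improves only modestly.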

Renders Per Second

_These tests use components that continuously re-render themselves by incrementing a number via use_effect. Tests were re-run with various quantities of components._

Test Name	RPS 1	RPS 100	RPS 500	RPS 1 Linux	RPS 500 Linux
`uvicorn`	6601	5943	5036	9399	8314
`daphne`	5373	6214	5676	6499	6425
`hypercorn`	6573	6673	6142	8833	7797
`uvicorn` + backhaul	7866	4222	4098	12202	8768
`daphne` + backhaul	8075	4786	4701	N/A	N/A
`hypercorn` + backhaul	7045	6340	6129	11633	8449

Event Triggered Renders

These tests check the timing of click event → new thing displayed while the webserver is exposed to various conditions.

RPS 250: Event triggered renders per second. This indicates the latency a user should feel if clicking on the page while the webserver is busy with a heavy rendering workload.
RPS 250 Avg RT: Average round trip timing, which demonstrates event latency during heavy workloads. A "Round Trip" involves the following: client click event → server WS receive → server event handler → server set state → server layout render → server data transmit to client → client layout shift.
Event Avg RT: Average time it takes for simple components to perform a round-trip render during light workloads. Render is triggered by a client click event. This shows our minimum event latency.

Test Name	RPS 250	RPS 250 Avg RT	Event Avg RT
`uvicorn`	1501	167 ms	52 ms
`daphne`	1494	167 ms	52 ms
`hypercorn`	1502	166 ms	52 ms
`uvicorn` + backhaul	2132	117 ms	53 ms
`daphne` + backhaul	1502	166 ms	55 ms
`hypercorn` + backhaul	2090	122 ms	52 ms
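
As a rough consistency check on the table above, the two heavy-workload columns can be related arithmetically. The sketch assumes that "250" means 250 components and that each has exactly one round trip in flight at a time, so the average round trip is about 250 / RPS seconds. That interpretation is an assumption, not something the PR states.

```python
# Assumption: 250 components, each with one round trip in flight at a time.
COMPONENTS = 250

# (event-triggered renders per second, reported average round trip in ms),
# copied from the "RPS 250" and "RPS 250 Avg RT" columns above.
results = {
    "uvicorn": (1501, 167),
    "daphne": (1494, 167),
    "hypercorn": (1502, 166),
    "uvicorn + backhaul": (2132, 117),
    "daphne + backhaul": (1502, 166),
    "hypercorn + backhaul": (2090, 122),
}


def estimated_round_trip_ms(rps: float, components: int = COMPONENTS) -> float:
    """Estimate the average round trip under the one-in-flight assumption."""
    return components / rps * 1000


for name, (rps, reported_ms) in results.items():
    estimate = estimated_round_trip_ms(rps)
    print(f"{name:22} estimated {estimate:6.1f} ms, reported {reported_ms} ms")
```

Under that assumption, every estimate lands within a few milliseconds of the reported value, which suggests the two columns are measuring the same underlying throughput.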

Uvicorn Weird Behaviors

uvicorn appears to raise an exception when rendering above ~7500 RPS on Windows. This is either a bug within the uvicorn package or in the Windows ProactorEventLoop implementation. The exception does not appear to impact anything. See exception below:

Exception in callback _ProactorBaseWritePipeTransport._loop_writing(<_OverlappedF...hed result=23>)
handle: <Handle _ProactorBaseWritePipeTransport._loop_writing(<_OverlappedF...hed result=23>)>
Traceback (most recent call last):
  File "C:\Users\Markg\AppData\Local\Programs\Python\Python310\lib\asyncio\events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "C:\Users\Markg\AppData\Local\Programs\Python\Python310\lib\asyncio\proactor_events.py", line 377, in _loop_writing
    assert f is self._write_fut
AssertionError

The backhaul loop seems to stop Ctrl + C from closing out the webserver. This is a Windows-only bug, likely related to how uvicorn handles Windows subprocesses (it does not forward SIGINT to threads).

Daphne Weird Behaviors

Although daphne performance is sometimes fast, it has a lot of performance jitter. During high workloads, ASGI requests get processed in chunks rather than incrementally.
daphne's performance always starts off strong but continuously slows down over time. In comparison, hypercorn and uvicorn have far more consistent performance.
With threaded backhaul on Linux, daphne had frequent WS disconnects (during heavy workloads) and event queue stalling (during light workloads).

Archmonger commented 1 year ago

@rmorshea Is there any way to allow me to modify required tests for this repo?

I just bumped Python versions to 3.9+ as a result of core's requirements and a general need for py3.9+ asyncio.

Archmonger commented 1 year ago

@rmorshea Do you want to review or should I merge?

Archmonger commented 1 year ago

I am going to merge. If you provide a post-merge review I will PR any needed review contents.

reactive-python / reactpy-django