Open JamesHutchison opened 8 months ago
This is definitely way too much within one PR. Can you break this into smaller PRs?
Assuming this headache of mine doesn't turn into something worse, I'll add comments tomorrow. I don't have the capacity to properly break this out, create / update tests, etc for the changes and I'm asking the community for help to land these features. This PR is intended to be a starting point and is not intended to be merged.
Also just to re-iterate what I said over discord, I've only experienced WS disconnections under high load when ReactPy is not using a BACKHAUL_THREAD
.
A potential alternative to client-side state storage is simply migrating that implementation to core.
It's not clear to me how backhaul threads are equivalent to this. Do you have a doc that explains the architecture? My impression from your description was that it helped with stability in django but it wouldn't help if your node count was scaled down and you still had active connections on the terminated node (same with a client's internet disconnecting / reconnecting).
I think having the client manage their own state is a perfect solution. It very much simplifies the infrastructure and reduces costs. There's a slight delay due to the copy of data but since you're reconnecting there's going to be a delay anyways.
My suggestion on approach:
isinstance
checks is only a couple lines of codeExample usage that supports Pydantic / SQLModel objects:
def default_serializer(obj: Any) -> Any:
if isinstance(obj, BaseModel):
return obj.dict()
raise TypeError(f"Object of type {obj.__class__.__name__} is not JSON serializable")
state_recovery_manager = StateRecoveryManager(
serializable_types=[
TemporaryUser,
QAMessage,
],
pepper="pepper",
default_serializer=default_serializer,
)
configure(app, Main, Options(head=head, cors=cors_options), state_recovery_manager)
If you want to see this in action, it is live on heavyresume.com
Inactivity timeout is set to 4 minutes. You can see it happen in the console. If you scroll down and let it time out, you'll notice that it doesn't regenerate the view on reconnect. Moving the mouse or scrolling triggers reconnection.
Thanks!
One thing I'd like to bring attention to - if the layout logic is rewritten, please leave some means to re-render all components of a certain type. Next best thing is some means to trigger a whole page re-render from the client state. The use case would be hot reloading. Right now I can successfully use jurigged to hot reload - however unless the component has a toggle or something to force a re-render, I have to reload the page.
edit: this is no longer true with my PR below. That PR will disconnect and reconnect and force a rerender.
This PR (on my fork) leverages the logic from this PR to add hot reloading. I will not be merging it into my main branch because I don't want to clutter things up further, and I'm fine just checking out that one branch in the mean time.
This linked PR fixes a bug where use_effect(my_func, [])
was rerunning after the next reconnection. The fix from that PR applies to this one.
After using the heavy stack for many months, here's the known issues that are currently unresolved and may be attributed to my changes:
I'm pretty sure the input hell stems from the change that gives state a deterministic target based on the DOM path. This change was needed to support state rebuild.
The rest may be existing ReactPy issues.
By submitting this pull request you agree that all contributions to this project are made under the MIT license.
Issues
While in a discussion on the discord, it became apparent that ReactPy was unusable for any application where dynamic scaling is required. For example, in cases where Azure/GCP will start/stop workers for dynamic load balancing. As a result this, users will be disconnected randomly and then would have the state of the ReactPy components reset.
There's also other issues I encountered while building this. Please see my comments (will be added soon) which walk through this very large PR. The intent is not to have this PR merged in all at once, rather, it is a guide to subsequent smaller PRs that can properly add tests, etc. I do not have the time to properly land this PR, so I am asking for help.
Solution
The solution to the problem was to have the client store the state that's used for rendering components. The theory is that if the client state matches the server state, then things should work just fine.
In reality, there was some challenges to get to that. The first was that IDs of components and events were random instead of deterministic, so this needed to be changed. Now the IDs are based on their patch path.
The next concern is security. While the state isn't obvious to most users, without proper security a client could manipulate variables that an author would have assumed could only be set by the server. An example might be
is_admin
or something, or maybe the user ID of user viewing the page. Another issue is that server secrets might leak to the client if someone isn't careful.For security, all data given a sha-256 hash as a signature. The server will reject a state variable if it doesn't match the hash. Users are given a personal salt, which they must provide to the server upon reconnection, and the server has a secret "pepper" that is added to make things more difficult. I wasn't satisfied with this, so I added OTP codes that will change, by default, every 4 hours. The OTP codes use site-packages directory's contents (specifically, the parent directory of where reactpy is installed) and the create times to generate a random secret that feeds into the SHA-1 function that generates them. The OTP codes (right now, in the past, and in the future) are added to the hash just to make it that much harder.
All state values have a key that is derived from the file and line number. To avoid leaking this information via brute force, the key is a sha-256 that has about half the bits cut off, making it difficult to recreate.
The client behavior has been revamped. It now will always reconnect, has improved reconnect behavior, and also will disconnect on idle activity (default is 4 minutes). When the user moves the mouse or scrolls, it will reconnect. There's also hooks for when it is reconnecting and has successfully reconnected. The default behavior is to gray out the page and display a spinning pipe character. It's not great.
The performance impact appears to be negligible.
Checklist
The checklist will be for each individual feature that should be a separate PR. TBD
Work items remaining (this PR):
Extracted Issue List
hasattr
in the manner identified