y-crdt / ydotnet

.NET bindings for yrs.
MIT License

Exception in Rust code bringing the entire dotnet server down #88

Closed vdurante closed 2 weeks ago

vdurante commented 5 months ago

Hey, sorry to bother you once again, but I was wondering if you have any idea about the issue I am facing.

For some reason, every now and then, I get an "exception" in Rust that completely breaks the application. The server goes down and has to reboot.

I am trying to pinpoint the root cause, but right now I am having a hard time reproducing the issue.

I was wondering if you have any insights on how to prevent the server from going down, for example a way to capture the Rust error, log it, and move on.

Thanks!

SebastianStehle commented 5 months ago

Unfortunately not. It is one of two errors:

  1. A panic in Rust.
  2. A null pointer.

We are working on solving the second problem with smart pointers, but it is not solved yet. Usually you cannot log anything anymore when this happens, which also sucks.

Horusiath commented 5 months ago

We're working on this on the Rust side and should be able to show some improvements regarding invalid pointers soon.

vdurante commented 4 months ago

I might try to fix the Rust panic issue then. I'm not sure it is possible to fix, but maybe we could add some code around the way we call the Rust binaries to catch panics and prevent the pod from going down.

How does ydotnet interact with the Rust binaries? Is there any documentation I could read on the topic?

SebastianStehle commented 4 months ago

I don't think it is possible, because in .NET a panic is basically a process.exit(ERROR). But I would be happy if you could solve that.

There is no documentation, but this is how it works:

  1. y-crdt has a C binding layer, which is what we use: https://github.com/y-crdt/y-crdt/tree/main/yffi
  2. In our Native namespace we have all the code that talks to the channels: https://github.com/y-crdt/ydotnet/blob/main/YDotNet/Native/Document/DocChannel.cs#L15
  3. When we have to follow pointers, we always do that in the Native namespace, e.g. here: https://github.com/y-crdt/ydotnet/blob/main/YDotNet/Native/Document/Events/SubDocsEventNative.cs#L21
  4. We expose CLR types that are converted from the native types. The conversion is done outside of the Native namespace. The goal is for the other namespaces to depend on the Native namespace, but never the other way around, to avoid circular dependency hell. See: https://github.com/y-crdt/ydotnet/blob/main/YDotNet/Document/State/DeleteSet.cs#L10
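
ydotnet itself is C#, so the following is only an analogy: a minimal Rust sketch of the same layering, with modules standing in for namespaces. The native module holds the C-ABI structs and is the only place where raw pointers are followed, including a null check instead of a blind dereference; the document module exposes owned types and depends on native, never the reverse. All names (RangeNative, RangeListNative, read_items, Range, ranges_from_native) are hypothetical and do not mirror the real yffi or ydotnet types.

```rust
// Sketch of the layering described above, with Rust modules standing in for
// C# namespaces. All type and function names here are hypothetical.
mod native {
    /// Hypothetical C-ABI struct, a stand-in for a type returned by the C layer.
    #[repr(C)]
    #[derive(Clone, Copy)]
    pub struct RangeNative {
        pub start: u32,
        pub end: u32,
    }

    /// Hypothetical C-ABI list: a pointer plus a length.
    #[repr(C)]
    pub struct RangeListNative {
        pub items: *const RangeNative,
        pub len: usize,
    }

    /// All pointer following lives in this module. A null pointer yields None
    /// instead of being dereferenced.
    ///
    /// # Safety
    /// If `list.items` is non-null, it must point to `list.len` initialized values.
    pub unsafe fn read_items(list: &RangeListNative) -> Option<Vec<RangeNative>> {
        if list.items.is_null() {
            return None;
        }
        // Copy the items out so the result no longer borrows native memory.
        let items = unsafe { std::slice::from_raw_parts(list.items, list.len) };
        Some(items.to_vec())
    }
}

mod document {
    // The public layer depends on `native`; `native` never refers back to it.
    use super::native;

    /// Owned public type, analogous to a CLR type such as DeleteSet.
    #[derive(Debug, PartialEq)]
    pub struct Range {
        pub start: u32,
        pub end: u32,
    }

    /// Converts native values into owned public values.
    ///
    /// # Safety
    /// Same contract as `native::read_items`.
    pub unsafe fn ranges_from_native(list: &native::RangeListNative) -> Option<Vec<Range>> {
        let items = unsafe { native::read_items(list) }?;
        Some(
            items
                .into_iter()
                .map(|r| Range { start: r.start, end: r.end })
                .collect(),
        )
    }
}
```

Keeping every unsafe operation in one module means a crash caused by a bad pointer can only originate there, which narrows the search when something like the issue above happens.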

vdurante commented 4 months ago

@SebastianStehle I spent some time reading about it, and it seems that it is not possible to handle panics the way YDotNet is implemented. It would require running the native code in a separate process or something similar :(
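
If a panic is to be stopped inside the same process at all, it has to happen in the Rust layer before it reaches the C boundary: in current Rust versions, an unwinding panic that escapes an extern "C" function aborts the process. Below is a minimal sketch, assuming the change would be made on the yffi side rather than in ydotnet, and that the library is built with the default panic = "unwind" strategy (catch_unwind cannot catch anything under panic = "abort"). The function example_divide and the STATUS_* codes are hypothetical, not part of yffi. This does not help with the null-pointer crashes, which are memory faults rather than panics.

```rust
use std::panic;

// Hypothetical status codes for this sketch; yffi does not define these.
pub const STATUS_OK: i32 = 0;
pub const STATUS_PANIC: i32 = -1;
pub const STATUS_NULL_POINTER: i32 = -2;

/// Hypothetical exported function. A real export would also carry a no_mangle
/// attribute so the .NET side could bind to it by name.
///
/// # Safety
/// `out` must be null or a valid, writable pointer to an i32.
pub unsafe extern "C" fn example_divide(a: i32, b: i32, out: *mut i32) -> i32 {
    // Reject a null output pointer up front instead of dereferencing it.
    if out.is_null() {
        return STATUS_NULL_POINTER;
    }
    // Run the fallible Rust code inside catch_unwind. Integer division by zero
    // panics; the default panic hook logs the message to stderr, and the panic
    // becomes an Err here instead of escaping the extern "C" boundary.
    match panic::catch_unwind(|| a / b) {
        Ok(value) => {
            // SAFETY: `out` is non-null and, per the contract above, writable.
            unsafe { *out = value };
            STATUS_OK
        }
        Err(_) => STATUS_PANIC,
    }
}
```

In this sketch every exported function would need the same wrapping, and the .NET caller would check the returned status code instead of the whole process going down.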

Also, the issue I am facing seems quite hard to reproduce. I have only seen it happen once. If I am able to reproduce it consistently, I will investigate further. Thanks for all the help!

SebastianStehle commented 2 weeks ago

I am closing this. It is only a meta issue.