jupyter-server / pycrdt

CRDTs based on Yrs.
https://jupyter-server.github.io/pycrdt
MIT License
41 stars 10 forks source link

Multithreaded Environment Support #110

Closed dhan2325 closed 2 months ago

dhan2325 commented 4 months ago

Problem

My team and I were hoping to use py-crdt in a server-side context with multiple workers. We ran into the issue of workers switching context and the rust binary panicking when resources were dropped by the original threads, complaining that the structs are "unsendable".

I skimmed through the source code and found the Rust definitions for the pycrdt structs that were causing these issues, and noticed that they were simple wrappers around yrs structs which do implement Send and Sync, but that the wrappers around them are explicitly marked as unsendable, which I thought might be related.

I did also see in related threads in other repositories that an async API system was being considered, and wanted to ask if there were any updates on that front, and if there was some other considerations I am unaware of that required structs to be unsendable? And if there was any proposed solution to this issue that is in the works?

Any information anyone could provide would be very helpful, thanks!

welcome[bot] commented 4 months ago

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other other community members to contribute more effectively. welcome You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

davidbrochart commented 4 months ago

There is a comment pointing to what should be done to allow multithreading, but I haven't gone any further. I'm still interested though, and it would be great if you wanted to open a PR.

davidbrochart commented 4 months ago

I opened #114.

dhan2325 commented 4 months ago

Thank you for the prompt response! I took a look at the PR, and tried cloning + building but taking a closer look it looks as though some of the underlying yrs structs don't implement send and at some point we may need some Mutexing to make this thread-safe?

Performance issues aside, would wrapping each of the underlying yrs structs in the Python interface layer in an Arc<Mutex<T>> solve the issue? If so, is this something that would be helpful to implement and then provide some library interface that allows user to decide if they'd like to use a mutexed version of the library?

davidbrochart commented 4 months ago

Good questions, I honestly don't know. Should we start by just having it work, using mutexes?

dhan2325 commented 4 months ago

I think a Mutex implementation would be very nice to have, so long as it's kept as an optional configuration - applications that are single-threaded shouldn't incur the overhead of locking, etc.

davidbrochart commented 4 months ago

Would this configuration consist of having two separate builds, one with mutexes and one without, and a runtime parameter for choosing one?

davidbrochart commented 4 months ago

Note that even when not sending objects across threads, just creating an object in another thread is problematic since it can be garbage collected from a different thread, which will panic and lead to memory leaks. We can see that in pytest for instance, see here. So it might be safer to always use mutexes.