zshipko / ocaml-rs

OCaml extensions in Rust
https://docs.rs/ocaml
ISC License
259 stars 30 forks source link

Comments on thread safety and Sync and Send? #133

Open emchristiansen opened 1 year ago

emchristiansen commented 1 year ago

Context: Calling OCaml from Rust

Now that wrapping Lwt.t is mostly usable, I'm trying to use ocaml-rs in async settings, and I'm running into the problem that none of the ocaml-rs types are labeled Sync or Send. Is this deliberate? If not, can they be safely labeled as such? (I'm guessing nearly everything is just a pointer and thus safe to send between threads.)

Also, is calling to OCaml from Rust thread safe? I'm guessing that if you wanted to enforce serial execution you could do that by just serializing access to the OCaml runtime object in Rust, but I didn't see that in the code (didn't look very hard).

zshipko commented 1 year ago

I haven't experimented much with using ocaml-rs with multiple threads. Like you mentioned ocaml::Value is a pointer so in theory it should be alright as long as access to the runtime is synchronized.

If this is something you're interested in looking into more I would be happy to accept a PR based on your findings!

emchristiansen commented 1 year ago

Are you currently doing any synchronization to serialize access to the OCaml runtime?

FYI I've been getting segfaults when passing ocaml::Value between threads, so there's clearly something dangerous here, but I haven't figured it out yet. Alas I know next to nothing about how the FFI actually works.

Lupus commented 1 year ago

Not sure about OCaml 5, but with prior versions one was expected to register thread to OCaml runtime before executing any OCaml code on that thread. See this discuss topic.

Lupus commented 1 year ago

@emchristiansen what architecture did you end up with? I'm also playing around with attempts to integrate Lwt and Rust futures, and so far without much luck, I've created an executor that runs inside Lwt event loop, but so far it does not give me any advantages, Rust still condiders that I'm trying to move ocaml::Value and ocaml::Runtime across thread boundaries, although my executor runs sequentially in main OCaml thread...

Lupus commented 1 year ago

Updated: I actually ditched my in-house Executor implementation (that is 90% the one I got from async book anyways 😂 ) and used async_executor crate instead, and it worked! I passed asynchronous OCaml function to Rust, where that function moved into async task, that was scheduled by LocalExecutor and it was able to .await on some special promise object, that OCaml side eventually resolved.

FYI I've been getting segfaults when passing ocaml::Value between threads

I'm also running into segfaults but when I pass value to async that runs on a local executor (value is properly rooted so that OCaml does not GC it, see https://github.com/zshipko/ocaml-rs/issues/134#issuecomment-1572125850), so even without any threads.

I'm nearly sure that it's caused by OCaml GC moving the memory behind ocaml::Pointer, and Rust does not really know anything about this happening. I pass my executor as ocaml::Pointer (and it's wrapped with ocaml::custom!), and this reference is captured by the async task that is being spawned. When async OCaml callback is being invoked by Rust async task, and Gc.full_major () is called in that OCaml callback, executor struct gets random stuff inside. Removing the Gc.full_major () postpones the problem till more iterations of the test complete. Valgrind tells nothing about memory issues other that the final attempt to read some random memory, which also indicates that all of that memory is GC managed and valgrind can't actually see the issue there as the memory is "valid" from libc allocator standpoint...

@zshipko maybe some additional Rust magic is required to tell Rust that ocaml::custom! values that it received from OCaml are not safe to use after the function has finished, because OCaml GC can actually move that memory? I'm no way expert in Rust, but the fact that it allows me to reference my executor in async task that definitely outlives the outer "extern C" function seems wrong and hints that maybe some lifetime constraints are missing.

zshipko commented 1 year ago

Thanks for the report - this sounds like great progress! I will take a look at adding some lifetime constraints to custom types.

zshipko commented 1 year ago

After a quick look the lifetime approach will require some more investigation, but if the value is being moved then perhaps rooting the value until it's dropped on the Rust side might be another solution?

Lupus commented 1 year ago

How can I do that? I'm quite new to Rust. I've tried to mem::forget the original value, while wrapping it into Rc and cloning, Rust seems to not be tricked by this, and just optimizes the access to be direct access without extra indirections. Zero cost abstractions, as it is...

So far I can confirm that after I moved my LocalExecutor into thread local static variable, I no longer see the segfault. No other changes to the setup that I currently have, so I'm convinced that the issue was due to relocation of the ocaml::Pointer baking storage by OCaml.

Lupus commented 1 year ago

Ah, you mean rooting via OCaml API? I believe rooting does not stop the GC from moving it around as it sees fit. Root is just an integer, so as long as OCaml is able to find the pointer, it can actually move it. Does that sound right?

zshipko commented 1 year ago

I wonder if you tried using Pointer.0.register_global_root() to root the pointer and then Pointer.0.remove_global_root when youre done if that would signal to the OCaml runtime not to move that value?

I can add some wrapper functions if that does work.

zshipko commented 1 year ago

Ah okay, I suppose I need to look into the guarantees that rooting provides a little deeper!

Lupus commented 1 year ago

image

I can confirm that adding these lines (denoted by green line at the left) does not change the outcome, it still runs into segmentation fault. runtime_test is called only once and launches async Rust task that calls ocaml::Value inside the loop.

Lupus commented 1 year ago

Maybe ocaml::Pointer should always store just pointers in OCaml-managed memory, this way we're 100% safe if OCaml is willing to move them around. Currently any use of value inside ocaml::Pointer outside of the function where it originated is effectively UB, and no safeguards are in place.

zshipko commented 1 year ago

Thanks for checking! Using pointers only sounds like it would work, I will try it out later today when I get a moment. Is your code available online somewhere?

Lupus commented 1 year ago

This SO answer also confirms that custom blocks are subject to move like any other OCaml heap objects.

I will try it out later today when I get a moment. Is your code available online somewhere?

It's a part of a larger codebase right now, and it's quite messy, I wanted to publish it somewhere, but it's nowhere near ready 😂 Just ping me here with some branch, I'll try it out, repro is stable and easy to get. Will that work for you?

zshipko commented 1 year ago

No problem, that works for me!

tizoc commented 1 year ago

@Lupus only store pointers instead of the full value, and also make sure to de-reference them from the Rust side when accessing them (don't keep references to the location, because that part of the heap is owned by OCaml)

You probably want to pin them too.

https://doc.rust-lang.org/std/pin/

Edit: dereferencing example

zshipko commented 1 year ago

Thanks @tizoc for the examples, I started to look at how to port that over to ocaml-rs but unfortunately wasn't able to with the time I had.

@Lupus if you wanted to experiment with converting Pointer to only store pointers or Pin<Box<T>> that would be great! Otherwise I can try to pick this back up later.

Lupus commented 1 year ago

Thanks @tizoc for valuable pointers!

@zshipko I'm not sure why it's necessary to port DynBox magic into ocaml-rs if it could just be used from ocaml_interop package possibly via some decoration layer to make API match the style of ocaml-rs?

So far I've tried to use OCaml<DynBox<T>> directly in my ocaml-rs-based codebase, as there are the required conversion traits in place and it's seamless to do so (which is awesome!). I have two variations of the same function which looks very similar, but one of them causes segmentation fault:

/* #1 */
#[ocaml::func]
#[ocaml::sig("runtime -> unit")]
pub fn runtime_run_pending(rt: OCaml<DynBox<Runtime>>) {
    while Borrow::<Runtime>::borrow(&rt).executor.try_tick() {}
}

/* #2 */
#[ocaml::func]
#[ocaml::sig("runtime -> unit")]
pub fn runtime_run_pending(rt: OCaml<DynBox<Runtime>>) {
    let rt: &Runtime = rt.borrow();
    while rt.executor.try_tick() {}
}

First variant of this function is borrowing reference to runtime multiple times in the loop, executor is running it's tasks inside try_tick until one of those tasks do not call back into OCaml, causing GC to move stuff around and next borrow takes some nonsense values from the OCaml heap, causing segfaults.

Second variant reads pointer in OCaml heap only once and consecutive relocation of that pointer does not wreak havoc any more.

@tizoc is that what you mean by "make sure to de-reference them from the Rust side"? Can Rust type system be leveraged to stop me from dereferencing OCaml<DynBox<T>> multiple times somehow? 🤔

tizoc commented 1 year ago

@Lupus it is very hard to follow what is going on from just code snippets and without being able to expand the macro code or what happens inside try_lock() (for example does it "enter" the OCaml runtime again? or is it 100% on the Rust-side?). If you could extract this into a small, reproducible example that can also run with a debugger it would make things much much easier.

At first glance the first example should work, but I don't know if that value is rooted or not (if it is not, then whenever the GC runs it can become invalid). For a rooted value you would de-reference it every time, and then de-reference the contents too, ensuring that every time you get the correct pointer even if the OCaml runtime moved the memory around.

zshipko commented 1 year ago

I realized ocaml-rs also allows parameters of type &T (or &mut T) when T: Custom - I think using this API could potentially limit the lifetime of that value? But I guess this may not improve the situation if the value is being moved on the OCaml side.

Lupus commented 1 year ago

Okay, I found some time to brush up the code that I had and actually published it on Github, meet ocaml-lwt-interop! I described how things work in README and even tried to illustrate the execution flow for a complex interop scenario with a sequence diagram.

I ended up with some implementation of smart pointer on top of OCaml<DynBox<T>>, seems to work well. It basically roots the dynbox and (unsafely) converts the dynbox borrowed reference to const raw pointer. Deref trait returns this pointer as immutable reference for convenient use. Lifetime parameter ties together returned reference and the smart pointer itself, based on my tests Rust refuses to move that reference into async {} that outlives the current function for example. @zshipko @tizoc would be awesome if you could take a look at this and comment about safety of this approach!

@Lupus it is very hard to follow what is going on from just code snippets

@tizoc Sure, makes sense! I've reconstructed the repro with segfault on top of the version of the code that I published, you can find it [on a separate branch](https://github.com/Lupus/ocaml-lwt-interop/blob/ocaml-dynbox-segfault-repro/src/stubs.rs#L54]. Based on my understanding this segfault is somewhat expected, as try_tick() will call back into OCaml, and probably the value in question is not rooted (ocaml-rs is not rooting function arguments by default IIRC).

zshipko commented 1 year ago

Awesome, I will take a look!

ocaml-rs is not rooting function arguments by default IIRC

I just pushed a commit to the ocaml5 branch to root arguments by default - I wonder if that will have any effect on the segfault?

Lupus commented 1 year ago

I keep hitting the same kind of segfaul once again it seems. Easy to work-around, but I thought maybe there's some nice way to avoid it altogether...

I've started a topic on the Rust forum to get wider discussion.

The repro code is here, in a separate branch of ocaml-lwt-interop repo. Citing here the offending function:

#[ocaml::func]
#[ocaml::sig("executor -> (int -> promise) -> unit")]
pub fn lwti_executor_test(executor: CamlRef<Executor>, _f: ocaml::Value) {
    eprintln!("executor at #1: {:?}", executor);
    let task = executor.spawn(async {
        eprintln!("executor at #2: {:?}", executor);
        executor.spawn(async {}).detach();
    });
    task.detach();
}

Executor pointer inside the task somehow has different values inside the CamlRef, I assume it's some garbage, as executor.spawn segfaults with access to some random memory.

kolkhovskiy@kolkhovskiy-dev:~/git/ocaml/ocaml-lwt-interop$ dune exec test/test.exe rust
creating runtime
running Lwt+Rust test
executor at #1: CamlRef { dynbox_root: Root(0x7f847a568028), ptr: 0x5573dd2fc600, marker: PhantomData<&ocaml_lwt_interop::local_executor::LocalExecutor> }
lwt sleeping
executor at #2: CamlRef { dynbox_root: Root(0x7f847ad513c0), ptr: 0x5573dbacf682, marker: PhantomData<&ocaml_lwt_interop::local_executor::LocalExecutor> }
Segmentation fault

Basically there is OCaml runtime doing arbitraty things after lwti_executor_test has returned, and OCaml runtime invokes the executor, which is running pending tasks. The below diagram shows the flow of execution:

sequenceDiagram
participant lwt_loop as Lwt Event Loop (OCaml)
participant lwt_main as Lwt Main (OCaml)
participant rust_test as Executor test func (Rust)
participant runtime as Executor (Rust)
participant task as Task (Rust)

activate lwt_main
note right of lwt_main: creates Rust runtime
lwt_main ->> rust_test: Run test function
activate rust_test
rust_test ->> runtime: Spawns async task
activate runtime
runtime -->> lwt_loop: Trigger Lwt_unix notification
note right of runtime: Task is stored in runtime state
runtime ->> rust_test: Returns
deactivate runtime
rust_test ->> lwt_main: Returns
deactivate rust_test
note right of lwt_main: Sleeps...
deactivate lwt_main

note right of lwt_loop: Process Lwt_unix notification
lwt_loop ->> runtime: Run pending tasks
activate lwt_loop
activate runtime
runtime ->> task: Run
activate task
note left of task: Tries to access executor and crashes
deactivate task

I'm really lost on why CamlRef is changing. Valgrind reports Conditional jump or move depends right in the middle of printing of second CamlRef, which means that CamlRef pointer is clearly garbage. 🤷

==392509== Memcheck, a memory error detector
==392509== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==392509== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==392509== Command: _build/default/test/test.exe rust
==392509== 

creating runtime
running Lwt+Rust test
executor at #1: CamlRef { dynbox_root: Root(0x53b0028), ptr: 0x4fac450, marker: PhantomData<&ocaml_lwt_interop::local_executor::LocalExecutor> }
lwt sleeping
executor at #2: CamlRef { dynbox_root: Root(==392509== Conditional jump or move depends on uninitialised value(s)
==392509==    at 0x32C748: fmt_int<core::fmt::num::LowerHex, usize> (num.rs:83)
==392509==    by 0x32C748: fmt (num.rs:155)
==392509==    by 0x32C748: core::fmt::pointer_fmt_inner (mod.rs:2527)
==392509==    by 0x326FEA: {closure#0} (builders.rs:322)
==392509==    by 0x326FEA: and_then<(), core::fmt::Error, (), core::fmt::builders::{impl#4}::field::{closure_env#0}> (result.rs:1341)
==392509==    by 0x326FEA: core::fmt::builders::DebugTuple::field (builders.rs:309)
==392509==    by 0x32B778: core::fmt::Formatter::debug_tuple_field1_finish (mod.rs:2163)
==392509==    by 0x28F8F6: <ocaml::root::Root as core::fmt::Debug>::fmt (in /home/kolkhovskiy/git/ocaml/ocaml-lwt-interop/_build/default/test/test.exe)
==392509==    by 0x326D6D: {closure#0} (builders.rs:141)
==392509==    by 0x326D6D: and_then<(), core::fmt::Error, (), core::fmt::builders::{impl#3}::field::{closure_env#0}> (result.rs:1341)
==392509==    by 0x326D6D: core::fmt::builders::DebugStruct::field (builders.rs:124)
==392509==    by 0x32B300: core::fmt::Formatter::debug_struct_field3_finish (mod.rs:2051)
==392509==    by 0x28981A: <ocaml_lwt_interop::ptr::CamlRef<T> as core::fmt::Debug>::fmt (in /home/kolkhovskiy/git/ocaml/ocaml-lwt-interop/_build/default/test/test.exe)
==392509==    by 0x32A18D: core::fmt::write (mod.rs:1232)
==392509==    by 0x2BBE83: write_fmt<std::io::stdio::StderrLock> (mod.rs:1684)
==392509==    by 0x2BBE83: <&std::io::stdio::Stderr as std::io::Write>::write_fmt (stdio.rs:934)
==392509==    by 0x2BC9DA: write_fmt (stdio.rs:908)
==392509==    by 0x2BC9DA: print_to<std::io::stdio::Stderr> (stdio.rs:1007)
==392509==    by 0x2BC9DA: std::io::stdio::_eprint (stdio.rs:1085)
==392509==    by 0x349A16: _ZN10async_task3raw28RawTask$LT$F$C$T$C$S$C$M$GT$3run17h8591c13b20af56c6E.llvm.12487100329734870308 (in /home/kolkhovskiy/git/ocaml/ocaml-lwt-interop/_build/default/test/test.exe)
==392509==    by 0x28A723: ocaml_lwt_interop::local_executor::LocalExecutor::try_tick (in /home/kolkhovskiy/git/ocaml/ocaml-lwt-interop/_build/default/test/test.exe)
==392509== 
0x==392509== Syscall param write(buf) points to uninitialised byte(s)
==392509==    at 0x4860FB3: write (write.c:26)
==392509==    by 0x2BC2BE: write (fd.rs:247)
==392509==    by 0x2BC2BE: write (stdio.rs:66)
==392509==    by 0x2BC2BE: write_all<std::sys::unix::stdio::Stderr> (mod.rs:1544)
==392509==    by 0x2BC2BE: write_all (stdio.rs:170)
==392509==    by 0x2BC2BE: <std::io::stdio::StderrLock as std::io::Write>::write_all (stdio.rs:954)
==392509==    by 0x2BE025: <std::io::Write::write_fmt::Adapter<T> as core::fmt::Write>::write_str (mod.rs:1673)
==392509==    by 0x32C779: fmt_int<core::fmt::num::LowerHex, usize> (num.rs:110)
==392509==    by 0x32C779: fmt (num.rs:155)
==392509==    by 0x32C779: core::fmt::pointer_fmt_inner (mod.rs:2527)
==392509==    by 0x326FEA: {closure#0} (builders.rs:322)
==392509==    by 0x326FEA: and_then<(), core::fmt::Error, (), core::fmt::builders::{impl#4}::field::{closure_env#0}> (result.rs:1341)
==392509==    by 0x326FEA: core::fmt::builders::DebugTuple::field (builders.rs:309)
==392509==    by 0x32B778: core::fmt::Formatter::debug_tuple_field1_finish (mod.rs:2163)
==392509==    by 0x28F8F6: <ocaml::root::Root as core::fmt::Debug>::fmt (in /home/kolkhovskiy/git/ocaml/ocaml-lwt-interop/_build/default/test/test.exe)
==392509==    by 0x326D6D: {closure#0} (builders.rs:141)
==392509==    by 0x326D6D: and_then<(), core::fmt::Error, (), core::fmt::builders::{impl#3}::field::{closure_env#0}> (result.rs:1341)
==392509==    by 0x326D6D: core::fmt::builders::DebugStruct::field (builders.rs:124)
==392509==    by 0x32B300: core::fmt::Formatter::debug_struct_field3_finish (mod.rs:2051)
==392509==    by 0x28981A: <ocaml_lwt_interop::ptr::CamlRef<T> as core::fmt::Debug>::fmt (in /home/kolkhovskiy/git/ocaml/ocaml-lwt-interop/_build/default/test/test.exe)
==392509==    by 0x32A18D: core::fmt::write (mod.rs:1232)
==392509==    by 0x2BBE83: write_fmt<std::io::stdio::StderrLock> (mod.rs:1684)
==392509==    by 0x2BBE83: <&std::io::stdio::Stderr as std::io::Write>::write_fmt (stdio.rs:934)
==392509==  Address 0x1ffefff3d1 is on thread 1's stack
==392509==  in frame #3, created by core::fmt::pointer_fmt_inner (num.rs:2510)
==392509== 
4dbd3c0), ptr: 0x22c682, marker: PhantomData<&ocaml_lwt_interop::local_executor::LocalExecutor> }
==392509== Invalid read of size 8
==392509==    at 0x28A792: ocaml_lwt_interop::local_executor::LocalExecutor::spawn (in /home/kolkhovskiy/git/ocaml/ocaml-lwt-interop/_build/default/test/test.exe)
==392509==    by 0x349A26: _ZN10async_task3raw28RawTask$LT$F$C$T$C$S$C$M$GT$3run17h8591c13b20af56c6E.llvm.12487100329734870308 (in /home/kolkhovskiy/git/ocaml/ocaml-lwt-interop/_build/default/test/test.exe)
==392509==    by 0x28A723: ocaml_lwt_interop::local_executor::LocalExecutor::try_tick (in /home/kolkhovskiy/git/ocaml/ocaml-lwt-interop/_build/default/test/test.exe)
==392509==    by 0x288145: lwti_executor_run_pending (in /home/kolkhovskiy/git/ocaml/ocaml-lwt-interop/_build/default/test/test.exe)
==392509==    by 0x21BE73: camlRust_async__fun_729 (Rust_async.ml:27)
==392509==    by 0x2545D4: camlStdlib__Array__iter_330 (array.ml:95)
==392509==    by 0x22C681: camlLwt_sequence__loop_344 (lwt_sequence.ml:132)
==392509==    by 0x24A04F: camlStdlib__List__iter_507 (list.ml:110)
==392509==    by 0x21D48F: camlLwt_engine__fun_2681 (lwt_engine.ml:359)
==392509==    by 0x21FC99: camlLwt_main__run_loop_436 (lwt_main.ml:41)
==392509==    by 0x21FF2B: camlLwt_main__run_494 (lwt_main.ml:118)
==392509==    by 0x21BC54: camlDune__exe__Test__entry (test.ml:78)
==392509==  Address 0x8408b4824048b58 is not stack'd, malloc'd or (recently) free'd
==392509== 
==392509== 
==392509== Process terminating with default action of signal 11 (SIGSEGV)
==392509==  General Protection Fault
==392509==    at 0x28A792: ocaml_lwt_interop::local_executor::LocalExecutor::spawn (in /home/kolkhovskiy/git/ocaml/ocaml-lwt-interop/_build/default/test/test.exe)
==392509==    by 0x349A26: _ZN10async_task3raw28RawTask$LT$F$C$T$C$S$C$M$GT$3run17h8591c13b20af56c6E.llvm.12487100329734870308 (in /home/kolkhovskiy/git/ocaml/ocaml-lwt-interop/_build/default/test/test.exe)
==392509==    by 0x28A723: ocaml_lwt_interop::local_executor::LocalExecutor::try_tick (in /home/kolkhovskiy/git/ocaml/ocaml-lwt-interop/_build/default/test/test.exe)
==392509==    by 0x288145: lwti_executor_run_pending (in /home/kolkhovskiy/git/ocaml/ocaml-lwt-interop/_build/default/test/test.exe)
==392509==    by 0x21BE73: camlRust_async__fun_729 (Rust_async.ml:27)
==392509==    by 0x2545D4: camlStdlib__Array__iter_330 (array.ml:95)
==392509==    by 0x22C681: camlLwt_sequence__loop_344 (lwt_sequence.ml:132)
==392509==    by 0x24A04F: camlStdlib__List__iter_507 (list.ml:110)
==392509==    by 0x21D48F: camlLwt_engine__fun_2681 (lwt_engine.ml:359)
==392509==    by 0x21FC99: camlLwt_main__run_loop_436 (lwt_main.ml:41)
==392509==    by 0x21FF2B: camlLwt_main__run_494 (lwt_main.ml:118)
==392509==    by 0x21BC54: camlDune__exe__Test__entry (test.ml:78)
==392509== 
==392509== HEAP SUMMARY:
==392509==     in use at exit: 8,803,628 bytes in 114 blocks
==392509==   total heap usage: 119 allocs, 5 frees, 8,803,684 bytes allocated
==392509== 
==392509== LEAK SUMMARY:
==392509==    definitely lost: 0 bytes in 0 blocks
==392509==    indirectly lost: 0 bytes in 0 blocks
==392509==      possibly lost: 1,028,152 bytes in 2 blocks
==392509==    still reachable: 7,775,476 bytes in 112 blocks
==392509==         suppressed: 0 bytes in 0 blocks
==392509== Rerun with --leak-check=full to see details of leaked memory
==392509== 
==392509== Use --track-origins=yes to see where uninitialised values come from
==392509== For lists of detected and suppressed errors, rerun with: -s
==392509== ERROR SUMMARY: 10 errors from 3 contexts (suppressed: 0 from 0)
Segmentation fault
Lupus commented 1 year ago

I just pushed a commit to the ocaml5 branch to root arguments by default - I wonder if that will have any effect on the segfault?

@zshipko I'm pretty sure that it's not GC collecting the executor value in my last example. It's referenced in OCaml side in Lwt unix notification callback, so it can't just go away. To double-check I've added explicit rooting of values in the repro case I described above and that did not change anything.

Lupus commented 1 year ago

Turns out that my segfaults were caused by me ignoring some safety constraints for spawning tasks, see this reply to my topic on rust forum.

@zshipko have you had a chance to look at CamlRef? It seems to work just fine and is quite convenient to use in bindings. Maybe it's something worth including into ocaml-rs? It wraps OCaml<DynBox<T>> behind the scenes, guarantees rooting, acts as smart pointer, so no need to explicitly call .borrow(), and impements required traits so that you can plug any Rust type inside without some ocaml::custom! macros.