zshipko / ocaml-rs

OCaml extensions in Rust
https://docs.rs/ocaml
ISC License

Distribution of reusable bindings as separate libraries #139

Open Lupus opened 1 year ago

Lupus commented 1 year ago

I'm looking for a sound approach to distributing bindings for ocaml-rs as separate libraries.

I have a project, ocaml-lwt-interop, which provides some OCaml C stubs wrapped by an OCaml library, and also provides some library types and functions for Rust. Some other project might want to depend on ocaml-lwt-interop and provide more OCaml and Rust primitives.

What is the best way to distribute it?

One idea I currently have is to split the Rust stubs and the Rust library into separate crates. The crate with stubs would be "embedded" into the OCaml library that wraps those stubs with a higher-level API (while still exposing the raw Rust opaque types and getters for them, so that they can be used in other binding libraries). The Rust library part would be a normal Rust crate.

Let's say someone wants to write bindings to hyper, this might look like this (Rust libs in red, OCaml libs in yellow):

flowchart
olwti["ocaml-lwt-interop (rust crate)"]
style olwti fill:#e38d8d
olwti_stubs["ocaml-lwt-interop-stubs (rust private crate)"]
style olwti_stubs fill:#e38d8d,stroke:#e3dc8d,stroke-width:3px
rust_async["rust-async (dune library)"]
style rust_async fill:#e3dc8d
ocaml_hyper["ocaml-hyper (rust crate)"]
style ocaml_hyper fill:#e38d8d
ocaml_hyper_stubs["ocaml-hyper-stubs (rust private crate)"]
style ocaml_hyper_stubs fill:#e38d8d,stroke:#e3dc8d,stroke-width:3px
rust_hyper["rust-hyper (dune library)"]
style rust_hyper fill:#e3dc8d
olwti_stubs --> olwti
rust_async --> olwti_stubs
rust_hyper --> rust_async
ocaml_hyper --> olwti
ocaml_hyper_stubs --> ocaml_hyper
ocaml_hyper_stubs --> olwti
rust_hyper --> ocaml_hyper_stubs

But how to solve the version constraints? If final OCaml application installs OCaml library rust-hyper, that vendors all its Rust dependencies, and rust-hyper pulls rust-async, which also vendors its Rust dependencies, won't we end up with conflicting versions of ocaml-lwt-interop being used at the same time?

Probably such binding libraries should not vendor Rust deps and should not be published to opam, but should all be vendored into the final OCaml application, where the Rust dependencies are vendored as well? 🤯

Lupus commented 1 year ago

I've come up with an idea that seems to work and solves this issue. Basically it boils down to piggybacking on cargo's vendoring and versioning capabilities to distribute OCaml code. I'll explain the idea using ocaml-lwt-interop as an example.

Project structure overview

├── src                    /* Rust sources */
│   ├── lib.rs
│   ├── local_executor.rs
│   ├── notification.rs
│   ├── promise.rs
│   ├── ptr.rs
│   ├── stubs.rs           /* All of [ocaml::sig] and [ocaml::func] */
│   └── util.rs
├── test                   /* Tests */
│   ├── dune
│   └── test.ml
├── vendor                 /* Cargo vendor dir */
│   └──  .............
├── build.rs               /* Build script to parse [ocaml::sig] */
├── Cargo.lock             /* Lock file for local development/CI */
├── Cargo.toml             /* Rust crate definition */
├── dune                   /* OCaml bindings library is defined here */
├── dune-project           /* OCaml dune project definition */
├── LICENSE
├── NOTICE
├── README.md
├── Rust_async.ml          /* Hand-written higher level wrapping over stubs */
├── Rust_async.mli
├── rust-async.opam
├── stubs.ml               /* Generated from stubs.rs */
└── stubs.mli

Cargo.toml

Cargo.toml defines a library of type ["staticlib", "cdylib", "lib"], so that it can be used both from Rust and from OCaml. Otherwise Cargo.toml does not have anything noteworthy.

dune

dune defines a rule to build rust code, and defines a library:

; Include .cargo as dune ignores .* by default
(dirs :standard .cargo)

(rule
 (targets libocaml_lwt_interop.a dllocaml_lwt_interop.so)
 (deps
  (glob_files_rec *.toml)
  (glob_files_rec *.rs)
  .cargo/config
  (source_tree vendor))
 (locks cargo-build)
 (action
  (no-infer
   (progn
    (chdir
     %{workspace_root}
     (run cargo build --target-dir %{workspace_root}/_rust --release --offline --package ocaml-lwt-interop))
    (copy %{workspace_root}/_rust/release/libocaml_lwt_interop.a libocaml_lwt_interop.a)
    (copy %{workspace_root}/_rust/release/libocaml_lwt_interop.so dllocaml_lwt_interop.so)))))

(library
 (name rust_async)
 (public_name rust-async)
 (libraries lwt.unix)
 (foreign_archives ocaml_lwt_interop)
 (c_library_flags
  (-lpthread -lc -lm))
 (preprocess
  (pps lwt_ppx)))

cargo is executed from %{workspace_root} (see (chdir ...)), which is basically _build/default in your project source tree. This rule depends on all *.rs and *.toml files, .cargo/config, and all of the vendor source tree to properly rebuild Rust code on any changes.

So while in %{workspace_root}, cargo is asked to perform an offline release build of only our package, using %{workspace_root}/_rust as the target directory. After that's done, the rule copies the Rust artifacts into the current folder (chdir is scoped only around cargo build). The no-infer stanza tells dune not to try to figure out where the command arguments come from, which allows using copy instead of resorting to a shell and mv.

(locks cargo-build) is a small optimization to avoid having multiple cargo processes running at the same time: each one spawns a lot of parallel jobs, assuming it has all the resources available exclusively, so running them concurrently makes the total build time a bit slower.
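The mapping between the cargo package name and the file names used in the rule can be sketched in Rust as follows. This is only an illustration: `Artifacts` and `artifacts_for` are hypothetical names, and the sketch assumes Linux file naming and that the library name is not overridden in the `[lib]` section of Cargo.toml.

```rust
/// File names involved when copying cargo artifacts for a dune
/// `(foreign_archives <name>)` field. Hypothetical helper type.
#[derive(Debug, PartialEq)]
pub struct Artifacts {
    /// Static library produced by cargo for the `staticlib` crate type.
    pub cargo_static: String,
    /// Shared library produced by cargo for the `cdylib` crate type (Linux naming).
    pub cargo_shared: String,
    /// Name under which the dune rule stores the shared library.
    pub dune_shared: String,
}

/// Cargo replaces `-` with `_` when naming library files, so the package
/// `ocaml-lwt-interop` ends up as `libocaml_lwt_interop.a`.
pub fn artifacts_for(package: &str) -> Artifacts {
    let lib = package.replace('-', "_");
    Artifacts {
        cargo_static: format!("lib{}.a", lib),
        cargo_shared: format!("lib{}.so", lib),
        dune_shared: format!("dll{}.so", lib),
    }
}
```

Calling `artifacts_for("ocaml-lwt-interop")` yields the three names that appear in the `targets` and `copy` lines of the rule above.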

Rest of dune file is pretty straightforward.

Cargo build script

Build script looks like this:

use std::env;

fn main() -> std::io::Result<()> {
    println!("cargo:rerun-if-changed=build.rs");
    println!("cargo:rerun-if-changed=src/stubs.rs");
    let current_dir = env::current_dir()?;
    let current_dir = current_dir.to_str().unwrap();
    let out_filename = "stubs.ml";
    if current_dir.ends_with("/_build/default") {
        println!(
            "cargo:warning=[ocaml-lwt-interop/build.rs] Not generating {} as launched from dune build dir",
            out_filename
        );
        return Ok(());
    }
    ocaml_build::Sigs::new(out_filename).generate()
}

The generated stubs.ml(i) are committed to the repository, so we add an ugly hack to avoid overwriting of source files in dune build dir, as they are write-protected and build script fails to overwrite them, failing the whole build as a result. One is expected to run cargo build directly to regenerate them (or wait till your IDE does that behind your back!). See use cases below for the explanation of this choice.
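A possible variation of the directory check, sketched under the assumption that any path component named `_build` means the script was launched from inside a dune build directory. `in_dune_build_dir` is a hypothetical helper, not part of ocaml-build, and it would give a false positive for a project that itself lives under a directory called `_build`.

```rust
use std::path::{Component, Path};

/// Returns true if `dir` is, or lies inside, a directory named `_build`.
/// Comparing whole path components avoids matching names such as
/// `my_build` and does not depend on the path separator.
pub fn in_dune_build_dir(dir: &Path) -> bool {
    dir.components()
        .any(|c| matches!(c, Component::Normal(name) if name == "_build"))
}
```

In the build script this could replace the `ends_with("/_build/default")` test, for example as `if in_dune_build_dir(&env::current_dir()?) { ... }`. Unlike the original check, it also matches when the current directory is nested deeper inside the build directory, such as a vendored package directory.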

Use cases

Local builds

When building the project locally, cargo is being launched from _build/default, and is requested to build a specific package, that is locally available in current directory. Cargo picks up the config, vendor dir, Cargo.toml, and builds the package as expected.

Vendoring to other project

Vendoring is the only supported mode to distribute OCaml/Rust bindings for consumption of other libraries or applications. We rely on the fact that cargo brings all the assets in the original repository along with Rust sources - in our case the most valuable assets are OCaml sources and Dune files.

[dependencies]
ocaml-lwt-interop = { git = "https://github.com/Lupus/ocaml-lwt-interop.git", branch = "build-from-vendor" }

When cargo vendors a package, it removes its Cargo.lock and vendor dir, which solves the problem of having multiple Rust package versions in one project, and that also naturally solves the same problem for OCaml part of the bindings.

When Dune builds the project, it finds the vendored Dune projects inside the cargo vendor dir. This allows the current project to depend on those libraries, and those libraries can also depend on each other. As long as those dependencies are reflected in the corresponding Cargo.toml files, we essentially use cargo to distribute our OCaml code!

More importantly, we have a guarantee that the OCaml code will consistently use the same version of the Rust libraries. Getting an opaque pointer to a Rust type from one OCaml binding library and passing it to another binding library results in the same memory representation of that type and the same functions working with it, even though the actual binary code is duplicated in the .so/.a files linked by OCaml.

Building a vendored OCaml/Rust library out of the cargo vendor dir works because of (chdir %{workspace_root} ....) around the corresponding cargo invocation, and the specification of the concrete package that we want to build. If we were to build from the same directory as the vendored Cargo.toml, cargo would complain that we are trying to build a package that is supposed to be in a workspace but is not included in it (YMMV, my top-level project uses a cargo workspace to define several packages in one folder).

I'm not sure about cargo vendoring and build scripts. It seems that it should invoke them when vendoring the code, but I wasn't able to get the build script to run in the vendored library, so I just committed the artifacts. Probably this aspect can be improved. Distributing the [ocaml::sig] generator as a separate binary crate and just calling the binary from dune would simplify a lot of things in this setup.

Having all Rust builds run against a single %{workspace_root}/_rust target folder ensures that cargo does not build anything multiple times, which might happen when using per-project Rust build sandboxes. %{workspace_root} is stable even when Dune libraries are vendored arbitrarily in the project tree: it will be the same for all those libraries.
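To check which vendored crates Dune would actually pick up as projects, one could scan the vendor dir for `dune-project` files. The sketch below is a hypothetical helper (`vendored_dune_projects` is not an existing tool) and assumes each vendored crate sits in an immediate subdirectory of the vendor dir, with `dune-project` at its root.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Lists the immediate subdirectories of `vendor_dir` that contain a
/// `dune-project` file, sorted by path.
pub fn vendored_dune_projects(vendor_dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(vendor_dir)? {
        let path = entry?.path();
        // Only directories that carry a dune-project file count as Dune projects.
        if path.is_dir() && path.join("dune-project").is_file() {
            found.push(path);
        }
    }
    found.sort();
    Ok(found)
}
```

Pure Rust crates in the vendor dir are skipped, so the result is the list of OCaml/Rust binding libraries that arrived via cargo.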

Evaluation

So far I have created an experimental branch that follows the design described above. I'm able to build the project from its home directory, depend on it via a git URL in another project's Cargo.toml, cargo vendor it, and build the other project using both cargo build and dune build.

I have not yet tried this approach with some library A that depends on some library B, both being OCaml/Rust binding libraries, but I believe it should work as expected: cargo will vendor both A and B only once in the vendor/ dir, and Dune will spot them and resolve the dependencies at the OCaml level accordingly.

zshipko commented 1 year ago

Wow, great work! I will find some time to pull down the example code and try building it in a few different scenarios.

If this proves to be a relatively generic pattern for interacting with cargo from dune then it might be worth getting in touch with the dune team to see if there is any interest in adding additional fields to simplify this process.

Lupus commented 1 year ago

Actually a lot of complexity comes from an attempt to play nicely with Dune and build everything in its sandbox. This ends with requirement to track all Cargo dependencies in Dune, specifying that we need vendor dir, and if there are a bunch of Cargo packages in one workspace, all operated by Dune, it gets pretty messy - I have to track all Cargo-level dependencies of those packages by hand in dune files, as if I forget to add (deps (glob_files_rec ../otherlib/*.rs)), those files won't appear in build sandbox.

Cargo is smart enough to scan the file hierarchy up until it finds the stuff it needs, making things hard to debug when only part of the content required for build is available in build sandbox, and the other part is found by Cargo in the source tree.

Cargo itself arguably performs clean out-of-source builds. The only questionable part is build scripts: they seem to be run in-source, which kind of violates the model imposed by Dune.

Actually, if signature generation could be moved to some stand-alone tool that Dune could just invoke via its rules where required, we would probably not need any build scripts in OCaml/Rust binding libraries.

If we disregard the build scripts part (probably cargo vendor should run them when vendoring stuff, otherwise I do not quite understand how vendoring is supposed to work with build scripts?), we can assume that it's safe to call cargo from source tree, and let it work with its default target dir. Under this assumption, Dune only needs to support foreign stubs stanza indicating that stubs are coming from particular Rust crate (either local to workspace or vendored - does not matter). If such stanza exists - run cargo to build that package, and get artifacts from target dir before OCaml library is linked.

I can see how more heavy-weight Dune integration with sandboxing support could be implemented by parsing output of cargo manifest to figure out the dependencies between crates and where they are located in the source tree, to ensure that everything is copied over to build sandbox, but will it be practical to go that far and place a burden of cargo manifest parsing on Dune?

The more I think about it, the more I like the idea to build in source tree actually. If cargo build works for your project, dune build will be able to build your Rust dependencies as well without any tedious debugging of yet another sandbox-related issue (double the fun when it happens in your CI environment!).

Lupus commented 1 year ago

One more concern that arises when thinking about many OCaml/Rust libraries in the wild is linking. I've added two Rust-backed OCaml libraries to a single executable this morning and it exploded during the linking phase with the following synopsis (library names obfuscated):

/usr/bin/ld: /home/kolkhovskiy/git/ocaml/my-lib/_opam/lib/rust-lib-one/libocaml_one.a(ocaml_one.ocaml_one.e6e672d9-cgu.3.rcgu.o): in function `ocaml_interop_setup':
ocaml_one.e6e672d9-cgu.3:(.text.ocaml_interop_setup+0x0): multiple definition of `ocaml_interop_setup'; /home/kolkhovskiy/git/ocaml/my-lib/_opam/lib/rust-lib-two/libocaml_two.a(ocaml_interop-49c17f0c8172b8fb.ocaml_interop.9b82f3e6-cgu.10.rcgu.o):ocaml_interop.9b82f3e6-cgu.10:(.text.ocaml_interop_setup+0x0): first defined here
/usr/bin/ld: /home/kolkhovskiy/git/ocaml/my-lib/_opam/lib/rust-lib-one/libocaml_one.a(ocaml_one.ocaml_one.e6e672d9-cgu.3.rcgu.o): in function `ocaml_interop_teardown':
ocaml_one.e6e672d9-cgu.3:(.text.ocaml_interop_teardown+0x0): multiple definition of `ocaml_interop_teardown'; /home/kolkhovskiy/git/ocaml/my-lib/_opam/lib/rust-lib-two/libocaml_two.a(ocaml_interop-49c17f0c8172b8fb.ocaml_interop.9b82f3e6-cgu.10.rcgu.o):ocaml_interop.9b82f3e6-cgu.10:(.text.ocaml_interop_teardown+0x0): first defined here
/usr/bin/ld: /home/kolkhovskiy/git/ocaml/my-lib/_opam/lib/rust-lib-one/libocaml_one.a(ocaml_one.ocaml_one.e6e672d9-cgu.3.rcgu.o): in function `rust_eh_personality':
/rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/personality/gcc.rs:245: multiple definition of `rust_eh_personality'; /home/kolkhovskiy/git/ocaml/my-lib/_opam/lib/rust-lib-two/libocaml_two.a(std-89bc084783fdc439.std.5f6d52e5-cgu.0.rcgu.o):/rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/personality/gcc.rs:245: first defined here
/usr/bin/ld: /home/kolkhovskiy/git/ocaml/my-lib/_opam/lib/rust-lib-one/libocaml_one.a(ocaml_one.ocaml_one.e6e672d9-cgu.3.rcgu.o):(.init_array.00099+0x0): multiple definition of `std::sys::unix::args::imp::ARGV_INIT_ARRAY'; /home/kolkhovskiy/git/ocaml/my-lib/_opam/lib/rust-lib-two/libocaml_two.a(std-89bc084783fdc439.std.5f6d52e5-cgu.0.rcgu.o):(.init_array.00099+0x0): first defined here
collect2: error: ld returned 1 exit status
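The distinct conflicting symbols can be pulled out of a log like the one above with a small parser. This is a rough sketch: `duplicate_symbols` is a hypothetical helper, and it assumes the GNU ld message format shown here, where the symbol is quoted between a backtick and an apostrophe.

```rust
use std::collections::BTreeSet;

/// Collects the distinct symbol names reported by GNU ld as
/// "multiple definition of `symbol'", in sorted order.
pub fn duplicate_symbols(ld_output: &str) -> BTreeSet<String> {
    const MARKER: &str = "multiple definition of `";
    let mut symbols = BTreeSet::new();
    for line in ld_output.lines() {
        if let Some(start) = line.find(MARKER) {
            // Everything after the marker, up to the closing apostrophe.
            let rest = &line[start + MARKER.len()..];
            if let Some(end) = rest.find('\'') {
                symbols.insert(rest[..end].to_string());
            }
        }
    }
    symbols
}
```

For the log above this would report four symbols: `ocaml_interop_setup`, `ocaml_interop_teardown`, `rust_eh_personality` and `std::sys::unix::args::imp::ARGV_INIT_ARRAY`.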

According to https://github.com/rust-lang/rust/issues/44322, this happens due to LTO being enabled, which I did as it decreased artifact sizes twofold.

To make things even more interesting, the linking failure can be avoided if another order is used for the archives (according to this blog post).

There is some workaround, suggested in the Rust issue I mentioned above, namely to add the following to linker flags:

LDFLAGS="-Wl,--allow-multiple-definition"

This seems rather ugly and unsafe.

For rust_eh_personality and std::sys::unix::args::imp::ARGV_INIT_ARRAY it's probably somewhat safe as long as you guarantee that all of your opam switch is built with a single version of Rust, which is realistic as OCaml is not distributing binary artifacts and everything is typically built from scratch in-place (although an upgrade of Rust after the switch has been built might cause some segfaults until the relevant Rust-dependent packages are rebuilt).

But for ocaml_interop_setup and ocaml_interop_teardown, versions of ocaml_interop used in different libraries may actually diverge, and linker choosing specific versions at random does not sound great.

Distributing OCaml libraries strictly as parts of Cargo packages does not solve this problem entirely, as Cargo seems to allow vendoring multiple versions of a crate.
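Whether a given project has ended up with several versions of a crate can be checked from its Cargo.lock. The sketch below is a deliberately naive line-based reader rather than a real TOML parser; `duplicated_crates` and `value_of` are hypothetical helpers, and the sketch assumes the usual lock file layout where each `[[package]]` entry lists `name` before `version`.

```rust
use std::collections::BTreeMap;

/// Returns the crates that appear in `lockfile` with more than one version,
/// mapped to the list of versions found.
pub fn duplicated_crates(lockfile: &str) -> BTreeMap<String, Vec<String>> {
    let mut versions: BTreeMap<String, Vec<String>> = BTreeMap::new();
    let mut in_package = false;
    let mut name: Option<String> = None;
    for line in lockfile.lines() {
        let line = line.trim();
        if line.starts_with('[') {
            // A new table starts; only [[package]] entries are of interest.
            in_package = line == "[[package]]";
            name = None;
        } else if in_package {
            if let Some(v) = value_of(line, "name") {
                name = Some(v);
            } else if let (Some(v), Some(n)) = (value_of(line, "version"), name.as_ref()) {
                let entry = versions.entry(n.clone()).or_default();
                if !entry.contains(&v) {
                    entry.push(v);
                }
            }
        }
    }
    versions.retain(|_, v| v.len() > 1);
    versions
}

/// Parses a line of the form `key = "value"` and returns the value.
fn value_of(line: &str, key: &str) -> Option<String> {
    let rest = line.strip_prefix(key)?.trim_start();
    let rest = rest.strip_prefix('=')?.trim();
    let rest = rest.strip_prefix('"')?.strip_suffix('"')?;
    Some(rest.to_string())
}
```

An empty result for the top-level application's Cargo.lock would mean every Rust crate, including ocaml_interop, is present in exactly one version; a non-empty one points at the crates whose symbols may diverge between binding libraries.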

I'm a bit lost on how to solve this to be honest.

tizoc commented 1 year ago

I faced this same issue when trying to link multiple separate Rust libraries with OCaml bindings into an OCaml project. It is still pending a solution because I haven't found a quick one and haven't had enough time to dig deeper.

But for ocaml_interop_setup and ocaml_interop_teardown, versions of ocaml_interop used in different libraries may actually diverge, and linker choosing specific versions at random does not sound great.

I think you should be safe with these, because they are likely to go away eventually, or at least, made optional through a feature flag (both are related to the setup and cleanup of boxroots, with the setup part not required anymore with the latest version of boxroot, and teardown only required to make valgrind and similar tools happy).

But you are likely going to have issues with the boxroot symbols too, and for that I think the solution (when building OCaml programs that link in multiple Rust libraries) is probably to package boxroots with opam/dune and skip the inclusion of that code when compiling the Rust code.

Lupus commented 1 year ago

I faced this same issue when trying to link multiple separate Rust libraries with OCaml bindings into an OCaml project. It is still pending a solution because I haven't found a quick one and haven't had enough time to dig deeper.

So you just don't split your bindings into separate crates/dune libraries? That sounds quite painful. Did you at least have some ideas how to approach this?

But you are likely going to have issues with the boxroot symbols too

So far I got complaints from ld about 4 symbols. Not sure if it stopped due to number of errors above certain threshold or because there are no more errors 🤷

So for a project that uses multiple Rust bindings, the best course of action is to statically link all stub libs into one monolithic .a, that OCaml should just link into the final executable? 🤔

tizoc commented 1 year ago

So you just don't split your bindings into separate crates/dune libraries? That sounds quite painful. Did you at least have some ideas how to approach this?

Not yet, it is an issue a coworker was facing but in the end we just found a way to temporarily bypass the issue (but it was possible because for this specific case it turned out that not everything needed to be linked all the time, so we didn't really solve anything for the general case).

But you are likely going to have issues with the boxroot symbols too

So far I got complaints from ld about 4 symbols. Not sure if it stopped due to number of errors above certain threshold or because there are no more errors 🤷

I don't remember exactly now, maybe boxroot symbols were not an issue (could be that I had that already separated), but when solving the conflict for the setup/teardown symbols I remember we had issues with some Rust-specific symbols (__rdl_alloc and other allocator related symbols IIRC), and I could not get past that. Here is an issue that seems related btw: https://github.com/rust-lang/rust/issues/73632

So for a project that uses multiple Rust bindings, the best course of action is to statically link all stub libs into one monolithic .a, that OCaml should just link into the final executable? 🤔

Currently I think that is the easiest solution, because based on my limited research (related to those Rust symbols I mentioned above), my conclusion is that separately linking multiple independent static libraries built with Rust is not very well supported, but I may have misunderstood things (and hopefully that is the case!).

Lupus commented 1 year ago

Here is https://github.com/rust-lang/rust/issues/73632 that seems related btw.

Thanks for the pointer! The below comment seems to clearly illustrate the issue:

https://github.com/rust-lang/rust/issues/73632#issuecomment-1083703808