[Discuss] Could bindings generators live in separate crates? (mozilla/application-services #4271)

data-sync-user commented 3 years ago

At our current stage of development, I like the way we bundle the bindings generators for all languages as part of the main uniffi_bindgen crate. It simplifies testing, and it (mostly) ensures that we don't leave any languages behind. But I wonder if it's the right thing long term:

As we've seen with the Gecko backend, some integrations are very hard to test in CI. If the Gecko backend could live in mozilla-central and be tested as part of Firefox CI, I expect that would be simpler.
We currently include a Python backend, but it's falling behind, and I kind of get the impression it's hard to prioritize reviews for that code because it's not on the critical path for our consumers. We might be better off if I moved that into a separate repo and maintained it as a "passion project".
If we get more contributors, they might be coming from a different language and have their own set of needs, and they might find it hard to add an additional backend and keep it up to date, similar to the Python one above.

Other projects in the Rust ecosystem appear to have had some success in externalizing specific backends into separate crates, e.g. serde. What would it look like for us to take a similar approach, and when (if ever) would it be worth the costs of doing so?

Issue is synchronized with this Jira Task

data-sync-user commented 3 years ago

➤ jhugman commented:

I agree with all of your above; especially around Gecko, and backends from contributors.

It might be worth writing down the necessary pre-conditions before we'd consider doing this:

finalize the intermediate representation between parser and bindings that we can support in a backwards-compatible manner, i.e. ComponentInterface and friends.
finalize the shape of the generated Rust scaffolding that new language backends will need to call, e.g. errors, strings, destructors, handles etc.
document the binary protocol for packing and unpacking of records, sequences, optionals, etc.
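As an illustration of what that third precondition would need to pin down, here is a minimal Rust sketch of one possible packing scheme. Every rule in it is an assumption made for the example (big-endian integers, an i32 length prefix for strings and sequences, a one-byte tag for optionals, record fields in declaration order); it is not a description of UniFFI's actual protocol, and the `Point` record and the `write_*`/`pack_points` helpers are hypothetical.

```rust
// Hypothetical packing rules, for illustration only (not UniFFI's real format).

/// A sample record: fields are packed in declaration order.
struct Point {
    x: i32,
    label: Option<String>,
}

/// Assumed: integers are written big-endian.
fn write_i32(buf: &mut Vec<u8>, v: i32) {
    buf.extend_from_slice(&v.to_be_bytes());
}

/// Assumed: strings are an i32 byte length followed by the UTF-8 bytes.
fn write_string(buf: &mut Vec<u8>, s: &str) {
    write_i32(buf, s.len() as i32);
    buf.extend_from_slice(s.as_bytes());
}

/// Assumed: optionals are a one-byte tag (0 = None, 1 = Some), then the value if present.
fn write_optional<T>(buf: &mut Vec<u8>, v: &Option<T>, write: impl Fn(&mut Vec<u8>, &T)) {
    match v {
        None => buf.push(0),
        Some(inner) => {
            buf.push(1);
            write(buf, inner);
        }
    }
}

/// Assumed: sequences are an i32 element count followed by each element.
fn write_sequence<T>(buf: &mut Vec<u8>, items: &[T], write: impl Fn(&mut Vec<u8>, &T)) {
    write_i32(buf, items.len() as i32);
    for item in items {
        write(buf, item);
    }
}

/// Records: each field in order, with no names, tags or padding between them.
fn write_point(buf: &mut Vec<u8>, p: &Point) {
    write_i32(buf, p.x);
    write_optional(buf, &p.label, |b, s| write_string(b, s));
}

/// Pack a sequence of records into a fresh buffer.
fn pack_points(points: &[Point]) -> Vec<u8> {
    let mut buf = Vec::new();
    write_sequence(&mut buf, points, write_point);
    buf
}

fn main() {
    let points = vec![
        Point { x: 1, label: None },
        Point { x: -2, label: Some("hi".to_string()) },
    ];
    println!("{:?}", pack_points(&points));
}
```

A foreign-language backend would implement the mirror-image readers; writing the rules down at this level of detail is what would let it do so without reading the Rust implementation.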

FWIW: I think that as we get closer to maturity for the shape of how backends should be written, the Python binding will be easier to review, test and keep up to date. I'm optimistic that, now test_rondpoint.py is running, adding to the third backend will be considerably faster than it was for the first and second.

data-sync-user commented 3 years ago

➤ Ryan Kelly commented:

I've been thinking a bit more about this in the context of #435 and the resulting Ruby backend. I think I was actually over-complicating it in my head a bit, and we more-or-less have all the pieces that we'd need to start restructuring things in this way.

Currently the uniffi-bindgen executable has three main jobs:

When run as uniffi-bindgen scaffolding, it reads the .udl file and generates the corresponding Rust scaffolding. This currently does not depend on any details of the intended foreign-language consumer(s).
When run as uniffi-bindgen generate -l $LANG, it reads the .udl file and generates the corresponding bindings for the given foreign language. Importantly, this is based only on the contents of the .udl file and language-specific config.
When run as uniffi-bindgen test, it figures out how to run a test script for a specific foreign-language binding. This is mostly intended for use by automated tests produced by macros from the uniffi::testing support crate and not for calling by hand.

We can leave the first of those jobs intact in the main uniffi_bindgen crate, and move the other two into language-specific crates that take a cargo dependency on uniffi_bindgen. It might be organized something like the following, although of course this is just a rough sketch.

The Consumer View

For running on the command line, users would need to install both the uniffi_bindgen crate and, e.g., uniffi_bindgen_kotlin and uniffi_bindgen_swift. These would install corresponding binaries named uniffi-bindgen, uniffi-bindgen-kotlin and uniffi-bindgen-swift, by analogy to how cargo sub-commands work.

When invoking uniffi-bindgen generate -l $LANG, the main executable would forward its command-line arguments to uniffi-bindgen-$LANG, perhaps lightly normalized for simplicity. It's just a direct pass-through execution and otherwise works exactly like the current all-in-one executable. We could add a special command-line flag to assert compatibility of uniffi versions, so that a call like this:

    uniffi-bindgen generate --language kotlin ./path/to/my.udl

would in turn shell out to the Kotlin-specific backend like this:

    uniffi-bindgen-kotlin generate --uniffi-version="v0.11.0" --out-dir=resolved/output/dir --config-path=resolved/config/path ./path/to/my.udl

A similar setup could work for test scripts, although we'd have to think a bit about the command-line API surface for that one. When it sees the --uniffi-version flag, the language-specific bindgen program would be expected to check for compatibility with that version of the crate and error out if there's a mismatch, similar to what the Rust scaffolding already does ( https://github.com/mozilla/uniffi-rs/blob/041660d62b16b22a130361347432fc9612a26a51/uniffi/src/lib.rs#L69 ).
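A minimal sketch of both halves of that handshake, assuming the argument spellings shown above: the main executable building the pass-through command for uniffi-bindgen-$LANG, and the backend checking the --uniffi-version it was handed. The helper names `backend_command` and `check_uniffi_version` are hypothetical, and the compatibility rule (major.minor must match, in the spirit of semver for 0.x releases) is an assumption for illustration, not necessarily the rule the linked scaffolding check uses.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical: build the pass-through command for a language backend, using the
/// cargo-style convention that language `kotlin` lives in `uniffi-bindgen-kotlin`.
fn backend_command(
    language: &str,
    uniffi_version: &str,
    out_dir: &Path,
    config_path: &Path,
    udl_file: &Path,
) -> Command {
    let mut cmd = Command::new(format!("uniffi-bindgen-{}", language));
    cmd.arg("generate")
        .arg(format!("--uniffi-version={}", uniffi_version))
        .arg(format!("--out-dir={}", out_dir.display()))
        .arg(format!("--config-path={}", config_path.display()))
        .arg(udl_file);
    cmd
}

/// Hypothetical compatibility rule on the backend side: accept the request only if
/// its major.minor matches the uniffi_bindgen version the backend was built against.
fn check_uniffi_version(requested: &str, built_against: &str) -> Result<(), String> {
    let major_minor = |v: &str| -> Option<(u64, u64)> {
        let mut parts = v.trim_start_matches('v').split('.');
        let major: u64 = parts.next()?.parse().ok()?;
        let minor: u64 = parts.next()?.parse().ok()?;
        Some((major, minor))
    };
    match (major_minor(requested), major_minor(built_against)) {
        (Some(a), Some(b)) if a == b => Ok(()),
        _ => Err(format!(
            "uniffi version mismatch: invoked for {}, but this backend was built against {}",
            requested, built_against
        )),
    }
}

fn main() {
    let cmd = backend_command(
        "kotlin",
        "v0.11.0",
        Path::new("resolved/output/dir"),
        Path::new("resolved/config/path"),
        Path::new("./path/to/my.udl"),
    );
    // A real driver would call cmd.status() and propagate a non-zero exit code.
    println!("{:?}", cmd);
    // Inside the backend, the flag would be checked before doing any other work.
    if let Err(e) = check_uniffi_version("v0.11.0", "0.11.3") {
        eprintln!("{}", e);
    }
}
```

Keeping the check inside each backend means the main executable never needs to know which versions a given backend supports; it only has to pass its own version along.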

The user doesn't have to know about any of that, however - they just have to know to install the backends they want.

For users who want to integrate into a broader build system rather than installing the tools at the system level, we could support an extension of the pattern used in application-services ( https://github.com/mozilla/application-services/tree/main/tools/embedded-uniffi-bindgen ). The consumer would make a wrapper crate that depends on all three of uniffi_bindgen, uniffi_bindgen_kotlin and uniffi_bindgen_swift, and stitches together their exposed public APIs to make a combined binary. Its main.rs might look something like:

    fn main() -> anyhow::Result<()> {
        // Proposed API: run the uniffi-bindgen CLI with only these backends built in.
        uniffi_bindgen::run_main((
            uniffi_bindgen_kotlin::KotlinBackend,
            uniffi_bindgen_swift::SwiftBackend,
        ))
    }

They would then execute that via cargo run as part of their build. It would behave just like running uniffi-bindgen on the command line, but limited to the specified backends.
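The tuple passed to run_main suggests one way a combined binary could choose a backend at runtime: implement a small helper trait for tuples of backends and look the requested language up by name. This is a hypothetical sketch; `Backend`, `BackendSet`, `run_with` and the placeholder `KotlinBackend`/`SwiftBackend` types are illustrative stand-ins, not existing uniffi APIs.

```rust
/// Hypothetical minimal interface a backend crate might export.
trait Backend {
    fn language(&self) -> &'static str;
    fn generate(&self, udl: &str) -> String;
}

/// Hypothetical helper that lets a tuple of backends act as one registry.
trait BackendSet {
    fn generate_for(&self, language: &str, udl: &str) -> Option<String>;
    fn languages(&self) -> Vec<&'static str>;
}

// Only the two-element case is written out; a macro could cover other arities.
impl<A: Backend, B: Backend> BackendSet for (A, B) {
    fn generate_for(&self, language: &str, udl: &str) -> Option<String> {
        if self.0.language() == language {
            Some(self.0.generate(udl))
        } else if self.1.language() == language {
            Some(self.1.generate(udl))
        } else {
            None
        }
    }

    fn languages(&self) -> Vec<&'static str> {
        vec![self.0.language(), self.1.language()]
    }
}

/// Placeholder backends standing in for the real crates' types.
struct KotlinBackend;
struct SwiftBackend;

impl Backend for KotlinBackend {
    fn language(&self) -> &'static str {
        "kotlin"
    }
    fn generate(&self, udl: &str) -> String {
        format!("// kotlin bindings for: {}", udl)
    }
}

impl Backend for SwiftBackend {
    fn language(&self) -> &'static str {
        "swift"
    }
    fn generate(&self, udl: &str) -> String {
        format!("// swift bindings for: {}", udl)
    }
}

/// Rough analogue of the proposed run_main: pick the requested backend, or report
/// which ones this binary was built with.
fn run_with(backends: &impl BackendSet, language: &str, udl: &str) -> Result<String, String> {
    backends.generate_for(language, udl).ok_or_else(|| {
        format!(
            "no backend for {:?}; this binary includes {:?}",
            language,
            backends.languages()
        )
    })
}

fn main() {
    let backends = (KotlinBackend, SwiftBackend);
    match run_with(&backends, "swift", "namespace demo {};") {
        Ok(code) => println!("{}", code),
        Err(e) => eprintln!("{}", e),
    }
}
```

Because the tuple fixes the set of backends at compile time, any backend the consumer doesn't list is never compiled into their binary at all.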

This could actually be a small concrete win for the application-services build setup, because we wouldn't need to compile the Python or Ruby or other backends as part of the application-services build.

Notably absent here is the need for any sort of serialized internal representation to be passed between the executables - everything they need to know is already in the .udl file, and the --uniffi-version flag would ensure that they interpret it in the same way.

The Developer View

To make writing language backends as simple as possible, we would try to keep as much infrastructure as possible in the uniffi_bindgen crate, and have each uniffi_bindgen_$LANG crate depend on it for core functionality. Perhaps something like:

The main uniffi_bindgen crate provides:
- The ComponentInterface data structures as officially supported public API.
- The implementation of the scaffolding command and its related templates.
- The main.rs implementing the uniffi-bindgen command as described above.
- A public trait ForeignLanguageBackend designed to be implemented by language-specific crates. This would broadly mirror the shape of the top-level modules in the current uniffi_bindgen::bindings module, with methods like:
  - ::new(config) for creating an instance of the backend from config data
  - ::generate(&self, ci: &ComponentInterface) -> Result for generating the bindings from a parsed component interface into some in-memory format
  - ::run_script(&self, current_dir: &Path, script_file: &Path) -> Result<()> for running a test script
- A generic function, parameterized over the backend type, that does all the plumbing to execute the command-line tool for a specific language backend.
- Some thorough docs on how you're supposed to translate a ComponentInterface into code.

Each uniffi_bindgen_$LANG crate provides:
- A concrete implementation of ForeignLanguageBackend targeting that language.
- All the templates etc. necessary to implement it.
- All the logic for running test scripts etc. in that language.
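A minimal sketch of how the proposed ForeignLanguageBackend trait and the generic plumbing function might fit together. The three method names follow the list above; everything else is assumed for illustration: the stand-in ComponentInterface has only a namespace and a list of function names, config is a plain string, errors are Strings, the in-memory output is a list of (file name, contents) pairs, and `ToyKotlinBackend` and `generate_with` are hypothetical.

```rust
use std::path::Path;

/// Stand-in for uniffi_bindgen's ComponentInterface; the real type is far richer.
struct ComponentInterface {
    namespace: String,
    functions: Vec<String>,
}

/// Assumed in-memory output of a backend: (file name, contents) pairs.
type GeneratedFiles = Vec<(String, String)>;

/// Sketch of the proposed trait; the exact argument and error types are assumptions.
trait ForeignLanguageBackend: Sized {
    fn new(config: &str) -> Result<Self, String>;
    fn generate(&self, ci: &ComponentInterface) -> Result<GeneratedFiles, String>;
    fn run_script(&self, current_dir: &Path, script_file: &Path) -> Result<(), String>;
}

/// Hypothetical generic plumbing shared by every backend binary.
fn generate_with<B: ForeignLanguageBackend>(
    config: &str,
    ci: &ComponentInterface,
) -> Result<GeneratedFiles, String> {
    let backend = B::new(config)?;
    backend.generate(ci)
}

/// Toy backend: emits one empty Kotlin function per interface function.
struct ToyKotlinBackend {
    package: String,
}

impl ForeignLanguageBackend for ToyKotlinBackend {
    fn new(config: &str) -> Result<Self, String> {
        // Assumed config format: just a package name.
        if config.is_empty() {
            return Err("missing package name".to_string());
        }
        Ok(ToyKotlinBackend { package: config.to_string() })
    }

    fn generate(&self, ci: &ComponentInterface) -> Result<GeneratedFiles, String> {
        let mut src = format!("package {}\n", self.package);
        for f in &ci.functions {
            src.push_str(&format!("fun {}() {{}}\n", f));
        }
        Ok(vec![(format!("{}.kt", ci.namespace), src)])
    }

    fn run_script(&self, _current_dir: &Path, script_file: &Path) -> Result<(), String> {
        // A real backend would invoke the Kotlin toolchain here; the toy one refuses.
        Err(format!("toy backend cannot run {}", script_file.display()))
    }
}

fn main() {
    let ci = ComponentInterface {
        namespace: "demo".to_string(),
        functions: vec!["hello".to_string()],
    };
    match generate_with::<ToyKotlinBackend>("org.example", &ci) {
        Ok(files) => {
            for (name, contents) in files {
                println!("{}:\n{}", name, contents);
            }
        }
        Err(e) => eprintln!("error: {}", e),
    }
}
```

The generic function only ever talks to the trait, so the main crate can own the command-line plumbing while each language crate supplies nothing but its implementation.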

Perhaps we could also publish some of our testing crates so that the individual language backends could import and use them, moving e.g. test_coveralls.kts to live with the Kotlin backend, test_coveralls.swift to live with the Swift backend, etc.

One risk here is that we amplify the cost of breaking changes, particularly breaking changes in the "ABI" of how data gets lifted and lowered. My personal sense is that this cost will be worth paying for the reduction in overall system complexity that we get from splitting things into more separable components.

data-sync-user commented 3 years ago

➤ Mark Hammond commented:

How would that plan fit with #416?

data-sync-user commented 3 years ago

➤ Ryan Kelly commented:

> How would that plan fit with #416?

I think it would be OK, because the code for each backend doesn't need to actually parse the .udl file, it just needs to operate on a ComponentInterface. If we change uniffi_bindgen to be able to magic up a ComponentInterface directly from Rust code somehow, then each backend can get that for free by updating its dependency to the new version. That's a lot of hand-waving of course, but basically, I don't think this split would make #416 any harder than it already would be - uniffi-bindgen generate still needs to be able to slurp in a ComponentInterface definition from Rust code in either scenario.
