How should lint crates be loaded?

xFrednet commented 2 years ago

Hey :wave: I'm currently investigating ways to load one or several lint crates from the lint driver. The current prototype loads lint crates as dynamic libraries and simply passes Rust types to them. This is problematic, since Rust doesn't have a stable ABI. In fact, reading more about it, I'm surprised that it has worked flawlessly so far. I want to find a stable way to load them, as that might affect how data is modeled and represented.

This also plays into this suggestion/discussion that lints should ideally be sandboxed.

This detailed blog about Plugins in Rust lists a few ways how this can be accomplished:

Dynamic Libraries: The lint crates are compiled to dynamic libraries and then loaded. This requires a stable ABI for types and functions, which can be achieved but takes some extra work. This implementation is also fast, since the lint instructions are executed natively. But it doesn't allow for sandboxing.
WebAssembly (WASM): Lint crates would be compiled to WebAssembly, then loaded and run by a runtime. Every lint would be sandboxed by default. The main problem is that the stable WASM interface only allows for the exchange of integers, floats, structs and enums. Pointers are not included, which makes sense, since the sandbox is going to have its own memory addresses. This is problematic for an AST, which uses numerous references. Serializing and then deserializing them would create major overhead.
Scripting Language: Lints could be implemented in a scripting language like Lua. While this is possible, I would like to implement the linting logic in Rust, and I guess most users would also prefer that.
Compile the driver with lints: Another option would be to compile the driver on demand with the lint crates as dependencies. Lint crates would then be linked statically. This solution requires some code generation to bind the lint crates to the driver. All libraries required for the driver compilation (like nightly rustc) would have to be available, and the additional compile time could also be noticed by the user.
rlib files with Miri: Rust's static libraries can apparently be executed by Miri, which would allow dynamic loading. However, Miri is unstable, and this is more of a theory without a proof of concept.
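To make the "stable ABI for types and functions" requirement of the dynamic library option more concrete, here is a minimal sketch of what a C-compatible boundary could look like. All names (`FfiStr`, `FfiSpan`, `check_ident`) are hypothetical and not taken from the prototype; the point is only that `&str` and ordinary Rust structs are replaced with `#[repr(C)]` types and `extern "C"` functions.

```rust
/// Hypothetical C-compatible span. `#[repr(C)]` fixes the field layout,
/// which the default Rust representation does not guarantee.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct FfiSpan {
    pub start: u32,
    pub end: u32,
}

/// Hypothetical borrowed string passed as pointer + length, since the
/// layout of `&str` is not part of any stable ABI.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct FfiStr {
    ptr: *const u8,
    len: usize,
}

impl FfiStr {
    pub fn new(s: &str) -> Self {
        FfiStr { ptr: s.as_ptr(), len: s.len() }
    }

    /// Safety: `ptr`/`len` must describe valid UTF-8 that is alive for `'a`.
    pub unsafe fn as_str<'a>(&self) -> &'a str {
        unsafe { std::str::from_utf8_unchecked(std::slice::from_raw_parts(self.ptr, self.len)) }
    }
}

/// What a lint crate might export. In a real dynamic library this symbol
/// would additionally need an unmangled name so the driver can look it up.
pub extern "C" fn check_ident(name: FfiStr, span: FfiSpan) -> bool {
    // The caller in this sketch always passes a string that is still alive.
    let name = unsafe { name.as_str() };
    // Toy rule: flag non-empty identifiers that start with an underscore.
    name.starts_with('_') && span.end > span.start
}
```

A driver would call `check_ident(FfiStr::new("_unused"), FfiSpan { start: 0, end: 7 })` through the loaded symbol. The extra work mentioned above is mostly writing and maintaining such wrapper types for every value that crosses the boundary.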

Currently, there appears to be no ideal way to add plugin support to Rust projects. Every solution I found can be derived from the first four listed ideas. I like the idea of using WASM and sandboxing everything by default, like dtolnay/watt does for proc macros. However, this collides with a tree representation that uses many references.

Dynamic libraries seem to be ideal, with the exception that they can't be sandboxed and restrict the implementation to a stable ABI (meaning no dyn pointers).
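The "no dyn pointers" restriction exists because the layout of a trait object is not specified. One common workaround is a hand-written table of `extern "C"` function pointers next to an opaque state pointer. The sketch below illustrates that idea under assumed names (`LintPassVTable`, `FfiLintPass`, `LongNameLint`); it is not how the prototype is implemented.

```rust
use std::ffi::c_void;

/// Hypothetical C-compatible stand-in for the vtable of `dyn LintPass`.
#[repr(C)]
pub struct LintPassVTable {
    pub check_fn_name: extern "C" fn(state: *const c_void, name_len: usize) -> bool,
    pub drop: extern "C" fn(state: *mut c_void),
}

/// Hypothetical stand-in for `Box<dyn LintPass>`: opaque state + vtable.
#[repr(C)]
pub struct FfiLintPass {
    state: *mut c_void,
    vtable: &'static LintPassVTable,
}

// ---- lint crate side ----
struct LongNameLint {
    max_len: usize,
}

extern "C" fn long_name_check(state: *const c_void, name_len: usize) -> bool {
    // The state pointer was created from a `Box<LongNameLint>` below.
    let lint = unsafe { &*(state as *const LongNameLint) };
    name_len > lint.max_len
}

extern "C" fn long_name_drop(state: *mut c_void) {
    // Reclaim the box so the lint state is freed by the crate that allocated it.
    drop(unsafe { Box::from_raw(state as *mut LongNameLint) });
}

static LONG_NAME_VTABLE: LintPassVTable = LintPassVTable {
    check_fn_name: long_name_check,
    drop: long_name_drop,
};

pub fn new_long_name_lint(max_len: usize) -> FfiLintPass {
    let state = Box::into_raw(Box::new(LongNameLint { max_len })) as *mut c_void;
    FfiLintPass { state, vtable: &LONG_NAME_VTABLE }
}

// ---- driver side ----
impl FfiLintPass {
    pub fn check_fn_name(&self, name: &str) -> bool {
        (self.vtable.check_fn_name)(self.state, name.len())
    }
}

impl Drop for FfiLintPass {
    fn drop(&mut self) {
        (self.vtable.drop)(self.state)
    }
}
```

The driver only ever sees `FfiLintPass`, so it can hold lints from several crates in one `Vec` without depending on the trait object layout of the compiler that built them.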

cc: https://github.com/rust-linting/rust-linting/issues/8

jhpratt commented 2 years ago

This was previously discussed in an extremely long issue last year. I believe consensus was to start with dylibs and keep wasm open as a possibility. This was scaled back from my original goal of always using wasm. Consensus also existed for keeping the door open to sandboxing, even if it didn't originally exist. This would simply be done by a notice that non-sandboxed behavior (like a network call) isn't guaranteed.

Somewhere in that issue I likely referenced wasm_plugin_host and wasm_plugin_guest, which should make things a bit easier. The serialization and deserialization overhead is a valid point, but I don't think the speed difference will be too noticeable. If something takes 200ms instead of 100ms, it's still fast enough that we shouldn't care. Wasm is also advancing quite a bit, so capabilities aren't fixed by any means.

xFrednet commented 2 years ago

I had the feeling that there was no real consensus, but I might be mistaken. Using dynamic libs is fine for me. That means less work for me, but might limit us in the future. In that case, we can keep this open as a general discussion place in case someone has a related idea or suggestion.

As a side note, I believe the difference would be more than 2x. Not having pointers means that every access has to go through an id -> node map, which would slow down every access. Alternatively, we could optimize the layout for this, like THIR does AFAIK. But as you say, we can proceed with dynamic loading for now.
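For illustration, a pointer-free layout along those lines could store all nodes in one contiguous vector and let nodes refer to each other by index, so an access is a bounds-checked array lookup rather than a hash map lookup. This is a minimal sketch with made-up names (`ExprId`, `ExprArena`), not the actual THIR or Marker data model.

```rust
/// Index into the arena. Plain integers like this can cross a WASM or
/// FFI boundary, unlike references.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ExprId(u32);

/// Toy expression tree where children are ids instead of `&Expr`.
#[derive(Debug)]
pub enum Expr {
    Lit(i64),
    Add(ExprId, ExprId),
    Neg(ExprId),
}

/// All nodes live in one contiguous vector.
#[derive(Default)]
pub struct ExprArena {
    nodes: Vec<Expr>,
}

impl ExprArena {
    /// Stores a node and returns the id that other nodes can refer to.
    pub fn alloc(&mut self, expr: Expr) -> ExprId {
        let id = ExprId(self.nodes.len() as u32);
        self.nodes.push(expr);
        id
    }

    /// Resolving an id is a single indexed access into the vector.
    pub fn get(&self, id: ExprId) -> &Expr {
        &self.nodes[id.0 as usize]
    }

    /// Example traversal: every child access goes through `get`.
    pub fn eval(&self, id: ExprId) -> i64 {
        match self.get(id) {
            Expr::Lit(v) => *v,
            Expr::Add(l, r) => self.eval(*l) + self.eval(*r),
            Expr::Neg(e) => -self.eval(*e),
        }
    }
}
```

Every traversal step still pays for one extra indirection compared to following a reference, which is the overhead discussed above, but the whole vector can be copied or serialized as one block.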

xFrednet commented 1 year ago

In https://github.com/rust-marker/marker/issues/177 it was suggested to maybe not have lint crates, but #[test]-like functions instead:

Regarding the static linking approach I think it's just a simpler way for people to use that. This is basically reminiscent of the approach that Rust's test and benchmark frameworks use: a proc macro like #[test] or #[bench] collects the code that you want to run, and a test/benchmarking harness that you need to manually invoke in your main() provides you with the CLI over the "tests/benches" you wrote.

The comment also includes some ideas on how this could be implemented.
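As a rough illustration of the harness idea (not the implementation proposed in that comment), the sketch below uses a hand-written static table in place of what an attribute macro would collect. The names `LintEntry`, `LINTS`, and `run_lints`, as well as both toy lints, are hypothetical.

```rust
/// One registered lint. A proc macro could generate these entries;
/// here they are written by hand.
pub struct LintEntry {
    pub name: &'static str,
    pub check: fn(item_name: &str) -> Option<String>,
}

fn no_leading_underscore(item_name: &str) -> Option<String> {
    item_name
        .starts_with('_')
        .then(|| format!("`{item_name}` starts with an underscore"))
}

fn short_names(item_name: &str) -> Option<String> {
    (item_name.len() > 20).then(|| format!("`{item_name}` is longer than 20 characters"))
}

/// The statically linked "registry" the harness iterates over.
pub static LINTS: &[LintEntry] = &[
    LintEntry { name: "no_leading_underscore", check: no_leading_underscore },
    LintEntry { name: "short_names", check: short_names },
];

/// Harness entry point that a user would call from their own `main()`.
/// `filter` mimics the name filter of a test runner's CLI.
pub fn run_lints(filter: Option<&str>, items: &[&str]) -> Vec<String> {
    let mut messages = Vec::new();
    for lint in LINTS.iter().filter(|l| filter.map_or(true, |f| l.name.contains(f))) {
        for item in items {
            if let Some(msg) = (lint.check)(item) {
                messages.push(format!("{}: {}", lint.name, msg));
            }
        }
    }
    messages
}
```

Because everything is linked statically, no ABI boundary is involved, at the cost of recompiling the harness whenever the set of lints changes.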

I think this is something worth considering before v1.0.0.

rust-marker / design

How should lint crates be loaded? #26