rust-hosted-langs / runtimes-WG

Working group focused on language runtimes - implementing GC and concurrency in safe Rust APIs

Survey of Rust prior art #1

Open pliniker opened 6 years ago

pliniker commented 6 years ago

Languages, runtimes, libraries

These Q&As need to be distilled into something more focused, refined, and useful.

Garbage collection posts and papers to review

pliniker commented 6 years ago

Please suggest more Rust-hosted langs, runtimes, VMs, GCs to add to the list!

wdv4758h commented 6 years ago

I've been looking at PyPy (a Python implementation with a JIT and a faster GC) for a while; it's an interesting project. Its key point is the RPython toolchain underneath (RPython being Python with some of its most dynamic features restricted). It's like a framework for implementing dynamic-language VMs: you implement your language's interpreter in RPython, and the toolchain analyses the code and generates the JIT and GC for you. The output depends on the toolchain backend: mostly C code, but it can also generate other targets such as JavaScript.

To me the idea is very interesting and the results are impressive, but the limitations of RPython are not very clear, the toolchain is hard to debug, and it takes a long time to generate a result. I'm wondering if this could be done better with Rust: adopt the idea of generating the JIT and GC from the interpreter code, but leverage Rust's capabilities to build a clearer, faster, friendlier framework.

I've seen the HolyJit project; it seems to be a similar idea done the Rust way. It might be interesting to have more discussion on this topic.

BTW, Truffle might be another interesting reference too; it helps people implement their languages on top of the JVM.

yorickpeterse commented 6 years ago

For Inko:

Does this implement a custom allocator or rely on native allocation?

Both. That is, it uses Rust's built-in allocator, but managed objects (i.e. those allocated by the language) use a custom allocator based on Immix. The code for this resides in https://gitlab.com/yorickpeterse/inko/tree/master/vm/src/immix.

what allocation strategy?

It bump-allocates into 32 KB blocks, which are basically arenas.
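Bump allocation into a fixed-size block can be sketched as follows. This is a minimal illustration in the spirit of Immix's 32 KB blocks; the names and layout are hypothetical, not taken from Inko's source.

```rust
// Illustrative bump allocator over one fixed-size block.
// A real Immix-style heap manages many such blocks plus line metadata.
const BLOCK_SIZE: usize = 32 * 1024;

struct Block {
    bytes: Box<[u8; BLOCK_SIZE]>,
    cursor: usize, // offset of the next free byte
}

impl Block {
    fn new() -> Self {
        Block { bytes: Box::new([0; BLOCK_SIZE]), cursor: 0 }
    }

    /// Bump-allocate `size` bytes; returns None when the block is full,
    /// at which point the caller would grab a fresh block.
    fn alloc(&mut self, size: usize) -> Option<*mut u8> {
        // Round the cursor up to 8-byte alignment.
        let start = (self.cursor + 7) & !7;
        if start + size > BLOCK_SIZE {
            return None;
        }
        self.cursor = start + size;
        Some(unsafe { self.bytes.as_mut_ptr().add(start) })
    }
}
```

Allocation is just a pointer bump and a bounds check, which is why this strategy is so fast compared to free-list allocators.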

what do internal data structures look like? (tagged pointers? object headers?)

The allocator hands out pointers to structures on the heap. These pointers may be tagged in certain cases. Objects don't have headers, but blocks do (which are used for embedding some other data).
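Pointer tagging relies on alignment: with 8-byte-aligned allocations the low bits of a real pointer are always zero, so one of them can flag an immediate value instead. The scheme below is a generic illustration, not Inko's actual encoding.

```rust
// Hypothetical one-bit tagging scheme: bit 0 set means the word is an
// immediate integer shifted left by one, not a heap pointer.
const TAG_INT: usize = 0x1;

fn tag_int(value: isize) -> usize {
    ((value as usize) << 1) | TAG_INT
}

fn is_tagged_int(word: usize) -> bool {
    word & TAG_INT == TAG_INT
}

fn untag_int(word: usize) -> isize {
    // Arithmetic shift restores the sign of negative integers.
    (word as isize) >> 1
}
```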

does the allocator, or a layer above it, prevent pointers outliving the lifetime of the allocator? how?

No.

what collection strategy is implemented?

Inko uses a parallel, generational garbage collector. Each lightweight process is collected separately, thus there is no global "stop the world" phase. A process that is collected is suspended for the duration of the collection.

For some Rust structures (e.g. a Rust String or Vec) we rely on Rust's Drop trait, and the collector will manually drop these objects when they go out of scope. Strings are stored as Arc<String> so they can be cheaply shared between processes.
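Relying on `Drop` for this can be sketched as below: wrapping Rust values in an enum means that dropping the enum runs the normal `Drop` implementations (a `Vec` frees its buffer, an `Arc` decrements its reference count). The type and function names here are illustrative, not Inko's.

```rust
use std::sync::Arc;

// Hypothetical wrapper for Rust values owned by a managed object.
enum Value {
    None,
    String(Arc<String>), // cheaply shared between processes
    Array(Vec<usize>),
}

/// What a finaliser thread might do: replacing the value drops the
/// old contents in place, running Vec's/Arc's Drop impls.
fn finalize(value: &mut Value) {
    *value = Value::None;
}
```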

Does the memory management allow for concurrent allocation and gc?

Other processes can keep running while one is being collected, but the process being collected cannot. Technically Immix can support concurrent operations, though it gets rather tricky since it's a moving collector. I decided not to support this in favour of faster GC throughput.

green threads? OS threads?

Both. The VM has two thread pools for running processes (green threads pretty much): one for regular processes, and one for slow/blocking ones (e.g. processes performing IO operations). The language allows you to easily move a process from one pool to another (and back).

what mechanisms ensure heap consistency?

There's a global heap that uses synchronisation for writes, but reads are not synchronised due to the significant overhead this would incur. This is OK in Inko since code won't (or at least shouldn't) write to the global heap from any process other than the first one (and Inko will soon assert that this is actually the case). Process-local heaps have no form of synchronisation since they are never accessed concurrently. Inter-process communication happens by message passing. The way this works is basically as follows:

  1. Process 1 deep-copies object A into a special "mailbox heap" belonging to process 2, producing object B.
  2. When process 2 receives a message, it deep-copies any messages stored in the mailbox heap into its process-local heap.

Access to the mailbox heap is synchronised. Using the above approach we need to copy data around a few times, but it ensures that the stack/heap of a process always refers to either a process-local or permanent object; never to a mailbox object. Both the mailbox and process-local heaps are collected separately and use different collection thresholds.
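The two-step copy described above can be sketched like this. It is a simplified model under assumed names (`Process`, `Message`, `send`, `receive` are made up for illustration): only the mailbox is locked, while the local heap needs no synchronisation because only its own process touches it.

```rust
use std::sync::Mutex;

#[derive(Clone)]
struct Message(String);

struct Process {
    mailbox: Mutex<Vec<Message>>, // mailbox heap: synchronised
    local_heap: Vec<Message>,     // process-local heap: no locking
}

impl Process {
    /// Step 1: a sender deep-copies the message into this
    /// process's mailbox heap.
    fn send(&self, msg: &Message) {
        self.mailbox.lock().unwrap().push(msg.clone());
    }

    /// Step 2: on receive, copy the message out of the mailbox
    /// into the process-local heap.
    fn receive(&mut self) -> Option<Message> {
        let msg = self.mailbox.lock().unwrap().pop()?;
        self.local_heap.push(msg.clone());
        self.local_heap.last().cloned()
    }
}
```

The extra copies buy the invariant described above: nothing in a process's local heap ever points into the mailbox heap, so the two can be collected independently.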

What is elegant and what feels awkward in Rust?

Off the top of my head:

how do data structures in the hosted language interact with the Rust language?

One of the goals for Inko was to write as much as possible in Inko itself. This means that the only Rust structures we expose are simple ones such as String and Vec, and all of these are exposed in such a way that as few Rust internals as possible leak into Inko. More complex structures such as HashMap are implemented in Inko itself and typically use a few primitive instructions (e.g. there's an instruction for hashing integers/floats/etc.).

Rust structures are wrapped using a VM structure called Object. The Object structure has 3 fields (with a total size of exactly 32 bytes):

  1. A pointer to the prototype
  2. A pointer to a Rust HashMap used for storing attributes
  3. An enum containing the value (a String, Vec, etc)

When an object goes out of scope the corresponding enum (called an "object value" in the VM) is also dropped by a finaliser thread.
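The 32-byte layout described above can be approximated in plain Rust. The field and type names below are illustrative, not Inko's exact definitions; on a 64-bit target the two pointers take 8 bytes each and the value enum takes 16 (an 8-byte payload plus discriminant and padding).

```rust
use std::collections::HashMap;

type ObjectPointer = *const Object;

// Hypothetical "object value" enum: boxing the larger payloads keeps
// every variant to one word, so the whole enum fits in 16 bytes.
enum ObjectValue {
    None,
    Float(f64),
    String(Box<String>),
    Array(Box<Vec<ObjectPointer>>),
}

struct Object {
    prototype: ObjectPointer,                                 // 8 bytes
    attributes: Option<Box<HashMap<String, ObjectPointer>>>,  // 8 bytes (niche-optimised)
    value: ObjectValue,                                       // 16 bytes
}
```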

is mutable aliasing prevented in any way? compile-time or run-time?

No. At the Rust level there's nothing stopping two threads from obtaining a mutable reference to the same object. In practice this won't/can't happen because memory is isolated per process.

Garbage collection posts and papers to review

I would definitely add Immix to this list since it's a very good garbage collection/allocator system.

wdv4758h commented 6 years ago

I would definitely add Immix to this list since it's a very good garbage collection/allocator system.

Totally agree, and there is also RCImmix to look at. Here is another Immix-related Rust repo.

brendanzab commented 6 years ago

Oh, I just remembered that Eve was working on a Rust runtime for their Datalog-style language before the project was discontinued. Might be interesting to check out: https://github.com/witheve/eve-native

brendanzab commented 6 years ago

I posted a link to this repo and thread on the Eve Slack. But not sure if anyone is still looking at it. cc. @cmontella @joshuafcole - not sure if you have anything to add on your experiences using Rust to host the Eve language?

cmontella commented 6 years ago

Hi Brendan. Yeah, I enjoyed writing Eve in Rust, more so than the other host languages we used: JS, TS, Go, Lua. I've decided to work on a new language project now that Eve is done. It's another Datalog variant, and the backend will be in Rust again. https://gitlab.com/cmontella/mech

What I like most about Rust is that there are no segfaults -- I'm used to programming in C and C++, and those languages are a minefield for me. So with Rust I get safety and a low-level interface.

I haven't had time to read through this thread, but I'll take some time later and try to be more specific. Let me know if you have any questions in the meantime.

brendanzab commented 6 years ago

Cool, thanks for your feedback! Might be nice to have you around then! There's a post on the internals forum, and a Gitter channel if you want to stay in touch!

jolby commented 6 years ago

The folks who implemented immix in Rust also are working on a rust-vm (Zebu):

http://microvm.github.io/

Zebu or mu-impl-fast project page:

https://gitlab.anu.edu.au/mu/mu-impl-fast

mamcx commented 4 years ago

Under the https://nanopass.org idea, the path from AST -> execute can be changed at will. So, instead of a coupling like:

parsing -> AST -> execute
// Change our minds, rewrite a lot:
parsing -> AST -> gen bytecode -> parse bytecode -> execute

everything is "just" chaining another pass:

steps = [parsing, AST, execute]
steps.iter....
// Change our minds, just add more specialization
steps = [parsing, AST, [gen bytecode, parse bytecode], execute]

How feasible would a "serde-like" framework be, where it's possible to chain steps at will, and anybody who wants to add or remove steps can? Could this make things like supporting IDEs, debuggers, transpilers, etc. easier, by making the steps modular?
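One way to sketch such chainable steps, under assumed names (`Pass`, `then` are made up for illustration): each pass is just a function from one representation to the next, and a pipeline is their composition, so inserting or removing a stage is adding or dropping one link.

```rust
// Hypothetical composable compiler pass: a boxed function from
// one intermediate representation to the next.
struct Pass<A, B>(Box<dyn Fn(A) -> B>);

impl<A: 'static, B: 'static> Pass<A, B> {
    fn new(f: impl Fn(A) -> B + 'static) -> Self {
        Pass(Box::new(f))
    }

    fn run(&self, input: A) -> B {
        (self.0)(input)
    }

    /// Chain another pass after this one, nanopass-style.
    fn then<C: 'static>(self, next: Pass<B, C>) -> Pass<A, C> {
        Pass::new(move |a| next.run(self.run(a)))
    }
}
```

With concrete IR types in place of `A`/`B`/`C`, the type checker enforces that adjacent passes agree on their intermediate representation, much like serde's `From`/`Into` chains.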

brendanzab commented 4 years ago

@mamcx are you asking if something like Nanopass would be possible to be implemented as a library for Rust? Because I would really love to see something like that. Making multi-pass compilers is a bit of a chore in Rust at the moment!

mamcx commented 4 years ago

Yes, the idea is to think about whether something like that can be done. I see how successful serde and the Into/From traits are (my favourite thing in Rust) and wonder how close we can get to that for compilers.

It would be wonderful to get "blocks" of compilers/interpreters:

trait FromPass<Input = ()> {
}

impl FromPass for Lexer {}
impl FromPass<Lexer> for LossLessAST {}
impl FromPass<LossLessAST> for SugaredAST {}

I remember there's some newer backend like LLVM but much more pluggable (I forget the name) in C++, and it felt like a great idea there (I think the main goal was easy auto-vectorization). There's also https://scala-lms.github.io, which at least for DSLs feels very easy to use.

Still, the point is that there are so many things that people want. For example, I would happily ignore all the complexity of bytecode if I could get an AST interpreter to perform well enough. So a bunch of composable crates would be wonderful.

P.S.: Now that I think about it a little more, we already have something close: middleware in web frameworks like actix. Request/Response is a model that is close to the basics of Unix pipelines. Thread a global "Lang context" through, allow attaching extras like "code generators", a "JIT engine", etc., and this way everyone can mix and match at will while still providing common middleware like logging, pretty printing, language servers, etc.

pliniker commented 4 years ago

@mamcx 🤔 I've had similar thoughts about serde and compiler passes recently, mostly thinking about them in the LSP context or when experimenting with multiple syntaxes. I wasn't aware of Nanopass, but I think this is a _very_ interesting architectural idea.

mamcx commented 4 years ago

Today on hn I found this link about chez scheme:

https://news.ycombinator.com/item?id=15156027

So at least there's a validated implementation of the idea.

BTW: would it be better to move this to another issue?

pliniker commented 4 years ago

@mamcx 👍 sounds good, feel free to open it and write up the description