As a not-so-helpful but reasonably true answer to the title question: the AM ends wherever you define it to end. The Implementation defines some manner in which the Concrete Machine is used in order to emulate the AM. For each step of the AM, the Implementation maps that AM step into whatever CM operations need to happen, and then maps whatever CM state changes back into AM state changes as established by the manner in which the CM emulates the AM.
When communicating with some process on the CM which is not known to this AM, all state sharing is done at the level of the shared CM semantics. (For LTO, this would be LLVM-IR, and not the target physical machine!) The manner in which CM effects from another process are translated into AM effects is again defined by the Implementation's manner of lowering the AM to the CM semantics.
Things of course get more interesting when concurrency is involved, since operations which are atomic on the AM may not be atomic on the CM (e.g. atomic access lowering to nonatomic access plus appropriate fencing). But this basic framework is the shape by which all effects that don't originate within the AM get translated into the proper AM effects; it's the Implementation which chooses and defines what means what.
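A minimal sketch of that AM/CM split (the FLAG static and publish function are just illustrative names, not anything from the discussion): the AM-level operation below is a single atomic release store, and how an Implementation lowers it to CM operations is its own choice, as long as the observable AM semantics are preserved.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

static FLAG: AtomicU32 = AtomicU32::new(0);

fn publish() {
    // On the AM: one atomic store with release ordering.
    // On the CM: whatever the Implementation picked for this target,
    // e.g. a single hardware instruction, or a plain store preceded
    // by an appropriate fence.
    FLAG.store(1, Ordering::Release);
}
```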
Of course we do want to do better than just saying "it's implementation-defined behavior" where we can do so, so I think this is still a valid question to ask.
At a minimum, Rust code using the same instance of std needs to be in the same AM world, and code using different instances of std exists in different AM worlds. (Rule of thumb: are std's global places shared or distinct?) Whether this would be consistently initialized correctly in the different proposed scenarios is a different question.
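As a sketch of that rule of thumb (the COUNTER static and bump function are hypothetical, standing in for std's real internal globals such as allocator state or stdio locks):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-in for one of std's global places.
static COUNTER: AtomicUsize = AtomicUsize::new(0);

fn bump() -> usize {
    // Two bodies of Rust code that observe the same counter sequence share
    // this global place and, by the rule of thumb above, live in the same
    // AM world; two copies of the static mean two worlds.
    COUNTER.fetch_add(1, Ordering::Relaxed) + 1
}
```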
I think it might be... well, "simpler" than all those scenarios above. To me, the key element is the Ralf quote that was linked: A given AM needs to have a single address space. So, any time there's more than one address space, for any reason, that's more than one AM. This seems like a helpful point to focus on, because then we don't have to ask about two user processes, or a user process and the kernel, or two Rust processes sharing memory, or ten other edge cases like that. If it's separate address spaces, it's separate abstract machines.
It might also be the case that two things sharing the same address space are still separate Rust AM instances somehow, but at least it narrows how many potential cases we need to investigate.
At a minimum, Rust code using the same instance of std needs to be in the same AM world, and code using different instances of std exists in different AM worlds. (Rule of thumb: are std's global places shared or distinct?) Whether this would be consistently initialized correctly in the different proposed scenarios is a different question.
That makes sense, but is not as helpful as it could be for no-std code (which is one of the things I'm really interested in).
A given AM needs to have a single address space. So, any time there's more than one address space, for any reason, that's more than one AM.
This view also makes sense, but there are some systems where this creates issues:
This shows that there is a wide grey area between "share everything" and "share nothing" out there in the wild. We could draw the line at any point in between, and the two lines you proposed both seem reasonable. But what are the side effects / fallout of picking any one of those lines?
In particular, if they become separate AMs as soon as there is any memory that isn't shared between the threads of execution, what happens if you still share std? Does that become UB?
That would render some of those embedded systems, such as the RP2040, problematic, as you cannot change the memory map: you get what you get. It is a no-std platform though, so maybe that is its saving grace.
It's not about sharing std specifically, it's about the assumptions that Rust is built on at the language level. We kinda assume, for example, that shared references can be sent to another thread, and that your threads can be executed on any core. As soon as cores become limited in what they can do, things get dicey. Not that it can't work, but you've gotta be very careful and mindful of what you're doing. This isn't entirely unknown to the Rust ecosystem, because some OS APIs are restricted in what they can do on which thread, but it does mean a lot of unsafe code is needed if a safe API can't be wrapped around the situation.
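A small sketch of the assumption being described (the sum_elsewhere function is just illustrative; nothing here is platform-specific): the type system lets a shared reference cross into another thread, and the AM assumes whatever core runs that thread can see the same memory.

```rust
// &[u64] is Send + Sync, so handing it to another thread is fine as far as
// the AM is concerned; an Implementation on hardware where not every core
// can reach this memory has to make this still hold (or cannot offer such
// threads at all).
fn sum_elsewhere(data: &[u64]) -> u64 {
    std::thread::scope(|s| s.spawn(|| data.iter().sum()).join().unwrap())
}
```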
Historically there have been many systems with asymmetric multiprocessing, where not all cores have access to all memory. The most well known example is probably the Cell processor in the PlayStation 3, where the SPE vector cores had some of their own local fast RAM.
The SPE cores run a different instruction set from the main PPE core. As such, the best way to model this is probably different processes (and thus different AMs) on different devices which happen to share a part of their address space, the same way a CPU and GPU share memory. Modeling it as threads would imply that you can safely migrate threads between SPEs as well as between an SPE and the PPE, which is not the case due to some memory being private to an SPE and due to the different instruction sets. The different instruction sets also imply that code for the SPE and PPE has to be compiled separately with different copies of the standard library, while everything in a single AM can be compiled together and has to share a single copy of the standard library.
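As a hedged sketch of what crossing such a boundary tends to look like (SHARED_BASE and read_mailbox are made up for illustration; a real address and access protocol come from the platform): the shared region is treated as foreign memory and accessed through raw pointers with volatile or atomic operations, rather than through ordinary references owned by either AM.

```rust
// Hypothetical base address of a region that another AM instance
// (a different core, possibly a different ISA) also reads and writes.
const SHARED_BASE: usize = 0x2000_0000;

/// Read a 32-bit "mailbox" word from the shared region.
///
/// # Safety
/// The caller must ensure the region is actually mapped at this address and
/// that the platform's rules for concurrent access to it are followed.
unsafe fn read_mailbox() -> u32 {
    core::ptr::read_volatile(SHARED_BASE as *const u32)
}
```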
If you have two separate Rust programs that are both running under an OS, it seems pretty clear they are two separate instances of the Rust AM (Abstract Machine).
If you have two threads in the same program, my understanding is that there is a single instance of the AM.
However, what about two separate programs, but with shared memory between them (e.g. using mmap for IPC)? It has to be two separate instances to not be UB, according to this reply by Ralf Jung on IRLO. The shared memory will usually not be at the same address in both processes. Since we presumably want shared memory to NOT be UB, they must be separate instances of the AM. So the "border" is somewhere between "threads sharing all of memory" and "memory mapped between processes". But where exactly?
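A minimal sketch of the mmap-for-IPC setup being described, assuming a POSIX system and the libc crate, with error handling reduced to an assert (map_shared is a made-up helper name): the OS picks the address, so each process generally sees the region at a different location, and only offsets into it are meaningful to share.

```rust
use std::os::fd::RawFd;

/// Map `len` bytes of an already-created shared memory object (e.g. from
/// shm_open) into this process. The returned address is generally different
/// in every process that maps the same object.
unsafe fn map_shared(fd: RawFd, len: usize) -> *mut u8 {
    let p = libc::mmap(
        std::ptr::null_mut(), // let the OS choose the address
        len,
        libc::PROT_READ | libc::PROT_WRITE,
        libc::MAP_SHARED,
        fd,
        0,
    );
    assert_ne!(p, libc::MAP_FAILED);
    p.cast::<u8>()
}
```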
Let's consider some hypothetical scenarios to see where they would land:
There are many other edge cases that you could come up with. I would ideally like a clear definition of what exactly constitutes an instance of the Rust Abstract Machine vs several, as this heavily impacts what sort of virtual memory tricks are valid in Rust, in particular around mmap-accelerated ring buffers, embedded systems, and kernels. Or is this still an open question, and if so what parts are decided?
I looked through the Rust Reference but wasn't able to find any definition.