Closed cramertj closed 1 year ago
If an lvalue is accessed through only volatile loads, LLVM will not add accesses that aren't volatile.
@ubsan Awesome! Is that documented somewhere?
I've not seen this guarantee stated anywhere in LLVM docs. It seems reasonable on its own, but the `dereferenceable` attribute may very well throw a wrench in it. Without that attribute, LLVM may not assume it can insert loads from a pointer, so it won't insert any unless it sees a pre-existing load, which can be extended (with a bit of care) to take the volatile-ness of the existing loads into account and not take pre-existing volatile loads as permission to insert new loads. On the other hand, by definition `dereferenceable` means it's OK to insert any loads from the address anywhere.
While one may be tempted to say "OK but don't do that if there are volatile accesses", that's not really possible. The transformation that wants to insert a load may not be able to see the volatile accesses (e.g., because they are in another function), or there may not even be any accesses at all to the location (e.g., if the source program creates a `&VolatileCell` but doesn't use it).
This is not an issue for Clang compiling C code handling pointers-to-volatile, because C pointers are not annotated `dereferenceable`. When compiling C++ code that handles references-to-volatile, Clang does insert `dereferenceable` as it does for all other references (https://godbolt.org/z/z6X8Y6), but `dereferenceable`-on-references is also unsound in other ways (see e.g. https://lists.llvm.org/pipermail/llvm-dev/2018-July/124555.html), so that doesn't count for much.
@cramertj shoot! I may absolutely be wrong. I am pretty sure I saw that before, but I may be imagining it; or I may have seen it in a talk. I'll see if I can find it -.-
For reference, this is similar to the old, long, thread https://internals.rust-lang.org/t/volatile-and-sensitive-memory/3188/46.
My conclusion from the previous thread is that Rust should add a new type modifier (maybe two, one for reads and one for writes) similar to `volatile` that guarantees the exactly-once behavior that isn't conflated with `unsafe`. That is, the language should ensure one shouldn't need to use `unsafe` at all, even in the implementation of `VolatileCell`, to get the exactly-once semantics.
@briansmith Thanks for the link! Yeah, that is similar. The bit I'm especially interested in is @nikomatsakis's conclusion here:
> The key point is that the compiler will not randomly introduce reads of an `&T` – it will only introduce reads that it can prove may happen at some point in the future. This is (I believe) true even with the `dereferenceable` attribute.
I've had several experts tell me this isn't true, and that speculative reads may be introduced even when no normal reads could possibly occur. In fact, there are several passes that almost definitely do today -- ASAN and structure splitting, for example. However, this would mean that the guarantee that you and I are asking for doesn't exist even in C/C++ today, and that all non-ASM MMIO code is busted in this way. If that's true, it might be necessary to introduce something new into LLVM to mark pointers with an "exactly once" property.
> However, this would mean that the guarantee that you and I are asking for doesn't exist even in C/C++ today
C11 defines "access" as "execution-time action to read or modify the value of an object" and says:

> An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3. Furthermore, at every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine, except as modified by the unknown factors mentioned previously. What constitutes an access to an object that has volatile-qualified type is implementation-defined.
>
> A volatile declaration may be used to describe an object corresponding to a memory-mapped input/output port or an object accessed by an asynchronously interrupting function. Actions on objects so declared shall not be ‘‘optimized out’’ by an implementation or reordered except as permitted by the rules for evaluating expressions.
So, I think that C does provide quite a strong guarantee as long as you only refer to this memory using `volatile`-qualified types.
My understanding is that LLVM has a pass that erases the `volatile` modifier from objects and attempts to replace all accesses to them with "volatile accesses". After that pass, LLVM can optimize all non-volatile accesses as it pleases, but it must never introduce any new non-volatile accesses, in order for this implementation strategy for C `volatile` semantics to be correct. In other words, in later passes of the compiler, unless/until an object has been accessed using a non-volatile access, the pass must assume the object is `volatile`. Obviously, this restriction only applies to passes that are invoked by clang/clang++ when compiling C/C++ code, though; passes that aren't used by the C/C++ compilers may not preserve those semantics.
As far as Rust is concerned, there's no `volatile` qualifier, so currently the only workable semantics is to assume that every object is `volatile` and every reference/pointer is `volatile` as defined by C11 unless/until a non-volatile access/assignment is made; i.e. act like those latter passes of LLVM are supposed to act. (Whether they do or do not behave that way, I don't know.) This seems way too pessimistic to me; it seems workable now because the only Rust compiler publicly available is based on LLVM, which already does that, but I'd hate to restrict future Rust compilers to LLVM's implementation strategy. Instead, I'd prefer to introduce `volatile` or similar ASAP into Rust's type system.
> This is not an issue for Clang compiling C code handling pointers-to-volatile because C pointers are not annotated `dereferenceable`
Are Rust pointers annotated `dereferenceable`? I don't think that one can use Rust references to interact with MMIO [0], but using pointers with volatile reads/writes should be OK if these Rust pointers are not marked `dereferenceable`.
[0] MMIO is always mutably aliased, so if you create a `&` or a `&mut` to it, you are essentially violating the invariants of Rust references.
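The raw-pointer approach described here can be sketched with `core::ptr::{read_volatile, write_volatile}`. This is a minimal illustration, not MMIO-ready code: an ordinary stack variable stands in for a device register so the sketch can actually run (a real driver would use a fixed device address from the memory map instead).

```rust
use core::ptr;

fn main() {
    // Stand-in for an MMIO register; a real driver would use a fixed
    // device address (e.g. `0x4000_0000 as *mut u32`) instead.
    let mut backing: u32 = 0;
    let reg: *mut u32 = &mut backing;

    unsafe {
        // Each call is exactly one access: the compiler may not merge,
        // duplicate, or elide volatile reads/writes.
        ptr::write_volatile(reg, 0xDEAD_BEEF);
        let v = ptr::read_volatile(reg);
        assert_eq!(v, 0xDEAD_BEEF);
    }
}
```

Because `reg` is a raw pointer, it never carries `dereferenceable`, which is the property this comment is relying on.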
@gnzlbg
> [0] MMIO is always mutably aliased, so if you create a `&` or a `&mut` to it, you are essentially violating the invariants of Rust references.
The `VolatileCell` wrapper uses `UnsafeCell`. Rust shouldn't ever assume that mutation isn't occurring behind a reference to an `UnsafeCell`. Rust references are annotated `dereferenceable`, but pointers are not.
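For readers unfamiliar with the type being discussed, a `VolatileCell`-style wrapper can be sketched as follows. The names mirror the discussion but are illustrative, not the actual `vcell` crate API; the `UnsafeCell` tells the compiler the contents may be mutated through a shared reference, and the volatile intrinsics make each access exactly-once.

```rust
use core::cell::UnsafeCell;
use core::ptr;

// Sketch of a VolatileCell-like wrapper (illustrative, not the vcell API).
#[repr(transparent)]
struct VolatileCell<T> {
    value: UnsafeCell<T>,
}

impl<T: Copy> VolatileCell<T> {
    const fn new(value: T) -> Self {
        VolatileCell { value: UnsafeCell::new(value) }
    }
    // Every read goes through read_volatile: exactly one load per call.
    fn get(&self) -> T {
        unsafe { ptr::read_volatile(self.value.get()) }
    }
    // Every write goes through write_volatile: exactly one store per call.
    fn set(&self, value: T) {
        unsafe { ptr::write_volatile(self.value.get(), value) }
    }
}

fn main() {
    let c = VolatileCell::new(1u8);
    c.set(2);
    assert_eq!(c.get(), 2);
}
```

Note that this sketch is exactly the pattern the thread is questioning: `&VolatileCell<T>` is still a reference, so it still carries `dereferenceable`.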
@briansmith
> A volatile declaration may be used to describe an object corresponding to a memory-mapped input/output port or an object accessed by an asynchronously interrupting function. Actions on objects so declared shall not be ‘‘optimized out’’ by an implementation or reordered except as permitted by the rules for evaluating expressions.
I don't see where in here it makes any guarantees about not inserting extra reads or writes. It says they won't be optimized out, but not that new accesses won't be inserted.
> I don't see where in here it makes any guarantees about not inserting extra reads or writes. It says they won't be optimized out, but not that new accesses won't be inserted.
I think it's implied by "Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3." In particular, IIRC (I'm not going to re-read it now), section 5.1.2.3 at least implies that each expression is evaluated exactly once.
@briansmith Perhaps? That wasn't my reading of it, but if you can find more support for that, I'd be glad to hear that was the case. That doesn't appear to be the behavior implemented by LLVM -- @hfinkel on the llvm-dev list said that "I don't believe that the LangRef provides that guarantee" and pointed out that `llvm::FindAvailablePtrLoadStore` and `llvm::isSafeToLoadUnconditionally` don't appear to check for volatility.
They also said in the same email that LLVM probably should provide such a guarantee (citing that volatile accesses usually lower to the same instruction as normal ones, so adding non-volatile accesses is in many ways like adding another volatile access). FWIW, I agree. Let's try to get LLVM to guarantee that.
Dereferenceable is still incompatible with MMIO, though, for reasons I outlined earlier and as also confirmed on llvm-dev. That indicates we either need to find alternatives to `&VolatileCell<T>` (e.g. `VolatilePtr<T>` wrapping a raw pointer) and try to mitigate the ergonomics problems that has, or special-case `VolatileCell` or a similar lower-level primitive (which would be to `VolatileCell` as `UnsafeCell` is to `Cell`).
> Dereferenceable is still incompatible with MMIO, though, for reasons I outlined earlier and as also confirmed on llvm-dev.
That's still not clear to me -- that point was made based on the idea that MMIO didn't count as allocated memory, which was contradicted by the second point. I'd imagine that the points that check and care about "dereferenceable" would be the same paths that would want to check for !volatile (namely, `isSafeToLoadUnconditionally`).
OK, fair, Eli Friedman's argument is not the same as mine and doesn't really convince me either (I'm with Hal, MMIO memory is allocated, just not by LLVM). However, re: this:
> I'd imagine that the points that check and care about "dereferenceable" would be the same paths that would want to check for !volatile (namely, `isSafeToLoadUnconditionally`).
Consider for example the function
```rust
fn maybe_use_volatile(x: &VolatileCell<u8>, flag: bool) {
    if flag {
        use_volatile(x);
    }
}
```
This maps to LLVM IR which receives a `dereferenceable` pointer argument and conditionally passes it to some other function. There is no volatile access in sight, and the attribute says introducing (non-volatile) loads is fine. So it should clearly be allowed. Yet, if `x` refers to an MMIO address, clearly we don't want extra memory accesses to be introduced!

To resolve this contradiction, one could weaken the meaning of `dereferenceable` to only allow extra loads if it is proven that there are non-volatile accesses (or no accesses at all). But that makes the attribute basically useless for its intended purpose! The only reasonable solution to this contradiction is to not add `dereferenceable` to pointers where you don't want non-volatile loads to be introduced.
> They also said in the same email that LLVM probably should provide such a guarantee (citing that volatile accesses usually lower to the same instruction as normal ones, so adding non-volatile accesses is in many ways like adding another volatile access). FWIW, I agree. Let's try to get LLVM to guarantee that.
Ack. FWIW, I think MMIO in C is just as broken without such a guarantee. And while they are at it, let's also extend the "number of memory accesses doesn't change" to "number and size doesn't change" -- @cramertj suggested in a meeting earlier that there are cases where this actually happens? Or did I misunderstand?
Basically, I think the way to think about volatile accesses is as follows: A `*x = val` really is a call to `write_volatile(x, $SIZE, val)`, and the compiler doesn't exactly know what that function does. It can make some assumptions ("doesn't mutate any memory I know about that is not aliased with `x..x+size`"), but no more than that. It's otherwise an unknown function call. Of course you cannot just duplicate or remove that, and of course this doesn't let you make any assumption about `x` being `dereferenceable`. For all you know, `write_volatile` might send your data over the network and not touch memory at all, using `x` as just some kind of tag or so (or maybe a memory location on a different machine?).
This also clearly shows, in my view, that `volatile` is a property of the access, not the memory location or variable or any such thing. Rust got that right, IMO.
Now, to the unfortunate side of things: `dereferenceable`. I agree that every reasonable interpretation of that attribute is incompatible with MMIO, as it allows adding spurious loads. That's its entire point. And because `&UnsafeCell` is still `dereferenceable`, you cannot use that for MMIO; you have to use raw pointers.

However, in a recent discussion on internals it was discovered that the way we use `&AtomicUsize` in `Arc` is already incompatible with `dereferenceable`. `Arc` doesn't actually ensure that the memory lives for as long as the shared reference exists. Hence I proposed that `UnsafeCell` should not just remove `noalias`, it should also remove `dereferenceable`. Now with MMIO we actually have a second good argument for doing that. So maybe it is worth shaping this into an RFC or so, such that we can collect arguments for why one might not want to do this?
This thread doesn't actually look so bleak to me, @cramertj -- why did you say "MMIO requires inline assembly"?^^
Losing `dereferenceable` on `&Cell<T>` and the like would be quite unfortunate. They already lose `noalias`, but that is inherent for the common use cases of `Cell`, while `dereferenceable` is completely justified for `Cell`. But perhaps that can be solved with an opt-in attribute or marker trait or something like that, and with that in mind "`UnsafeCell` removes `dereferenceable`" seems reasonable to me.
@RalfJung
> This thread doesn't actually look so bleak to me, @cramertj -- why did you say "MMIO requires inline assembly"?^^
Because, as you say,
> I think MMIO in C is just as broken without such a guarantee.
There's no indication of a correct and spec-compliant way to get exactly-once access in either C, C++, or Rust, so it seems developers needing this behavior today are stuck using inline assembly. If the LLVM devs and the broader C community decide that they want "volatile" to mean exactly-once access (which I think is the only reasonable behavior for it to have), then GCC, Clang, and the C and C++ references all need to be updated. I think this is what should happen, but I don't presume to know how this will go over in those communities.
> and while they are at it, let's also extend the "number of memory accesses doesn't change" to "number and size doesn't change" -- @cramertj suggested in a meeting earlier that there are cases where this actually happens? Or did I misunderstand?
Nah, sorry for the confusion -- this is an issue with trying to do non-volatile reads/writes to uncached memory. Uncached memory shouldn't necessarily require volatile, but on ARM a fault will occur if you attempt an unaligned access. It's easy to generate Compiler Explorer examples showing how a "normal" load can result in unaligned accesses. It'd be nice to have a non-volatile way to do a load that guaranteed no unaligned accesses.
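The hazard being described can be sketched with an explicit unaligned load. This is a hedged illustration: the original point was about the compiler *implicitly* merging adjacent byte accesses into a wider, possibly misaligned load; here `read_unaligned` makes the wide access explicit so the sketch runs on any host.

```rust
fn main() {
    let bytes = [1u8, 2, 3, 4, 5];
    // A four-byte load at offset 1 is misaligned for u32. On normal
    // (cached) memory this is fine on most targets; on ARM device
    // memory an equivalent access would fault.
    let p = bytes[1..].as_ptr() as *const u32;
    let v = unsafe { p.read_unaligned() };
    assert_eq!(v, u32::from_ne_bytes([2, 3, 4, 5]));
}
```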
There's also another orthogonal Fuchsia-specific reason that we're probably going to wind up using inline ASM for MMIO, which is that Linux KVM doesn't implement load/store instructions to MMIO addresses when using writeback (e.g. `str w0, [x1], #4`). This is just a bug in Linux KVM, but the fact that it's missing means we need a way to restrict the set of instructions we emit for accessing MMIO -- the easiest way to do this is inline ASM.
@RalfJung
> Hence I proposed that `UnsafeCell` should not just remove `noalias`, it should also remove `dereferenceable`.
+1. This makes sense to me. It might well be that, in the future, if we want to be more "fine-grained" about this, we might want to add different "types" of `UnsafeCell` to the language, offering different guarantees. But for the time being, let's try to see how far we can get with just a single `UnsafeCell` that does the right thing. Using different types of `UnsafeCell`s with different guarantees properly would be an extremely finicky thing.
@cramertj we should be able to interact with MMIO without having to use inline assembly. MMIO might not be super common, but I think it is common enough, and involves enough pitfalls, that we should do our best to make it as simple as possible to use it correctly.
> Basically, I think the way to think about volatile accesses is as follows: A `*x = val` really is a call to `write_volatile(x, $SIZE, val)`, and the compiler doesn't exactly know what that function does. It can make some assumptions ("doesn't mutate any memory I know about that is not aliased with `x..x+size`"), but no more than that. It's otherwise an unknown function call. Of course you cannot just duplicate or remove that, and of course this doesn't let you make any assumption about `x` being `dereferenceable`.
Take the case where there are no calls to `write_volatile` or `read_volatile`. Then what can the Rust
> This also clearly shows, in my view, that `volatile` is a property of the access, not the memory location or variable or any such thing.
I disagree. This doesn't match the mental model of the programmer who is trying to access the memory-mapped I/O region (a particular object) in the correct way. When I have a volatile object then I want to enforce all accesses to the object to have volatile semantics and to prevent any non-volatile accesses. That means in particular I don't want to allow creation of a non-volatile reference/pointer to the volatile object such that the non-volatile reference/pointer could be passed to some function that would then operate on the object with non-volatile accesses. I want the compiler to understand that that object is immovable (since moves are non-volatile accesses) and can't be `Copy` (for the same reason). In Rust the normal way of enforcing these kinds of constraints is through the type system: Define a suitably-constrained type and then ensure that the constrained object has that type.
> Rust got that right, IMO.
I think it's too early to say that until there's a spec for what Rust is trying to specify, and also until there's at least one implementation of the semantics that Rust specifies. In particular, if LLVM doesn't guarantee the exactly-once behavior then we'd have to see how much performance loss (if any) there would be if/when LLVM is changed to do that. Right now it's not even clear how we would verify that some change to LLVM is sufficient to guarantee those semantics with its current design. It seems to me it would be easier to verify that LLVM is doing the correct thing if it too internally switched to C-like semantics where `volatile` is a property of the object and the pointers to the object.
@gnzlbg
> @cramertj we should be able to interact with MMIO without having to use inline assembly. MMIO might not be super common, but I think it is common enough and involves enough pitfalls, that we should do our best to make it as simple as possible to use it correctly.
Oh totally! I want that too. I'm just disappointed that the current compilers and specs don't seem to be set up to allow this.
@briansmith
> In Rust the normal way of enforcing these kinds of constraints is through the type system: Define a suitably-constrained type and then ensure that the constrained object has that type.
Certainly that is a better abstraction for MMIO, but adding a whole other storage category to the language is a huge step and can significantly complicate the type system (as `volatile` in C and C++ does). If a library type can achieve all this with smaller additions to the core language (such as `{read,write}_volatile` and `&UnsafeCell` not allowing spurious reads and writes), that's preferable.
> It seems to me it would be easier to verify that LLVM is doing the correct thing if it too internally switched to C-like semantics where volatile is a property of the object and the pointers to the object.
I am confident that will never happen. If anything, LLVM is moving even further from putting things on the pointer to instead putting them on memory accesses through those pointers. For example, pointee types are slated for removal (instead of `T*` for various `T` there should be just one `pointer` type) -- and for good reason, typed memory is an obstacle at the level of abstraction where its IR lives.
If it's a property of the allocation, that also means you can't do a volatile access to an object not allocated as such, which is sometimes useful.
Furthermore, I've seen nothing indicating that a storage classifier instead of a tag on accesses is necessary or even helpful for making and maintaining the exactly-once guarantee. The omission of such a guarantee could just as easily happen if the LangRef was instead describing loads and stores through pointers-to-volatile, and in terms of implementation there is just as much tension in sharing code (which can then lead to incorrectly applying non-volatile reasoning to code using volatile) between `volatile T*` and `T*` as there is in sharing code between `load volatile` and `load`.
@cramertj
> There's no indication of a correct and spec-compliant way to get exactly-once access in either C, C++, or Rust, so it seems developers needing this behavior today are stuck using inline assembly.
It is my interpretation that the behavior you want was always the intention of the spec, just so far they did not manage to write down what they actually meant. Also, there are people writing MMIO in C -- AFAIK the Linux kernel is full of it -- so in practice, at least for GCC, this seems to work. Or not?
I don't know where to lobby to get the C/C++ spec improved, but we can start with LLVM and they seem to agree.
@rkruppe
> Losing `dereferenceable` on `&Cell` and the like would be quite unfortunate. They already lose `noalias`, but that is inherent for the common use cases of `Cell`, while `dereferenceable` is completely justified for `Cell`.
I think it is as justified for `Cell` as it is for `AtomicUsize` -- in many cases it is, sometimes it is not. Concurrency is not special; async programming has all the same issues. Basically, in the current situation, the rules are "`UnsafeCell` lets you do concurrent mutation except for deallocation" (seeing deallocation as a form of mutation, albeit a rather drastic one). That can be seen as an odd exception, and we know of at least one case where someone didn't think that through (and AFAIK whoever wrote `Arc` was deeply familiar with Rust and why `UnsafeCell` is a thing, so that's not a good sign).

I am pretty sure one could reproduce the `Arc` issue in a sequential context with a `Cell<usize>`. There would be code like
```rust
async fn fetch_sub_yield(x: &Cell<usize>) -> usize {
    let old = x.get();
    x.set(old - 1);
    await!(yield()); // because why not
    // Now `x` might be dangling
    old
}

// In the destructor
let old = await!(fetch_sub_yield(&self.inner().strong));
if old == 1 { return; }
```
If `yield()` goes back to the event loop and we only get rescheduled later, some other async task in the same thread may already deallocate the `inner`, while we still hold a reference (`x`) to it -- and we have UB because of the `dereferenceable` attribute. Granted, this code is slightly more artificial than the real one in `Arc`, but not so much that I am certain this will never come up in practice.
I think that if we tell people that concurrent mutation is okay, they can reasonably assume that this includes deallocation.
@briansmith
> Take the case where there are no calls to `write_volatile` or `read_volatile`. Then what can the Rust
I assume you meant to say "assume" or so? If it weren't for `dereferenceable`, the compiler could not assume anything about pointers that are not used -- this goes for volatile and non-volatile accesses alike. And I already agreed that `dereferenceable` is a problem.
> When I have a volatile object then I want to enforce all accesses to the object to have volatile semantics and to prevent any non-volatile accesses.
That is a high-level programming concept: You want to enforce an invariant. Well, use the usual tools -- an abstraction sealing direct access to the location and enforcing that all accesses be volatile. You don't need any language support for that. It's like saying "I want to enforce all accesses to the object use atomic semantics", and we have `AtomicUsize` and friends for that, and they are just written as a library. The only primitive they need is atomic accesses, and the only primitive you need for your data structure is volatile accesses.
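Such a sealing abstraction can be sketched as follows (the name `VolatilePtr` echoes the alternative mentioned earlier in the thread, but the code here is illustrative, not an existing API). Because the field is private, the only way to touch the location is through the volatile methods, the same way `Atomic*` types enforce atomicity as a library invariant.

```rust
use core::ptr;

// Sketch: enforce "all accesses are volatile" as a library invariant.
struct VolatilePtr<T> {
    addr: *mut T, // private: no non-volatile access path exists
}

impl<T: Copy> VolatilePtr<T> {
    /// Safety: caller promises `addr` is valid for volatile reads and
    /// writes for the lifetime of this `VolatilePtr`.
    unsafe fn new(addr: *mut T) -> Self {
        VolatilePtr { addr }
    }
    fn read(&self) -> T {
        unsafe { ptr::read_volatile(self.addr) }
    }
    fn write(&self, value: T) {
        unsafe { ptr::write_volatile(self.addr, value) }
    }
}

fn main() {
    let mut backing = 0u32; // stand-in for an MMIO register
    let reg = unsafe { VolatilePtr::new(&mut backing) };
    reg.write(7);
    assert_eq!(reg.read(), 7);
}
```

Since the wrapper holds a raw pointer rather than a reference, it avoids the `dereferenceable` problem discussed throughout the thread.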
In describing the effect (the operational behavior) of volatile, accesses are all there is. LLVM demonstrates that this is true, because it doesn't even have a notion of volatile objects, so clearly you don't need it.
> It seems to me it would be easier to verify that LLVM is doing the correct thing if it too internally switched to C-like semantics where volatile is a property of the object and the pointers to the object.
I do not see any benefit at all from adding this extra notion. It makes things more complicated; what do we gain? Nothing. We still have to handle the case where pointers are cast between volatile and non-volatile, which is legal in C. If you come from volatile objects, this seems strange, but it really isn't -- and with volatile accesses, it's completely natural.
The situation is very much like it is with atomic accesses: For the programmer, it is convenient to have a type that makes sure that all accesses to some memory location are atomic, just to make sure they don't screw up. But for the language, it is entirely unnecessary and just a burden to have a notion of an "atomic object" or so. None of the atomic memory models does that.
The reasons why volatile seems more confusing are (a) you don't usually want "mixed-access" locations that are sometimes volatile and sometimes not (though the Linux kernel has lots of that too, but AFAIK that's using "GCC C" to implement atomic accesses), and (b) the C standard does a really bad job of describing what effect the volatile modifier has on an access. I mean, the C standard starts out saying "this specifies an abstract machine, optimizations are not a concern here but a consequence of the behavior of the abstract machine", but then for volatile it goes on not doing that and instead says things about not duplicating memory accesses and the like. That's a bad spec. A proper spec, in "abstract machine" style, is something like what I wrote above: Volatile accesses are like syscalls, they are externally visible behavior with generally unknown side-effects. They are more restricted than syscalls in the effects they can have, but that's about it. Now, not duplicating volatile accesses is a consequence of the spec, the way it should be. Not inserting new accesses is, too. And the entirely unnecessary notion of a volatile object can be removed from the abstract machine. (It is still useful in the surface language! But that's not the topic of discussion here, I don't think.)
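The "volatile access as opaque call" model above can be sketched in code. This is a hedged illustration of the mental model, not a description of how any compiler is implemented: `opaque_store` stands in for a call the optimizer cannot see through, so the caller may not merge, duplicate, or elide it, exactly the guarantees wanted for a volatile store.

```rust
// Mental-model sketch: a volatile store behaves like a call to a function
// whose body the optimizer treats as unknown. `#[inline(never)]` keeps the
// call opaque in this sketch; in the model, the body could do anything
// (even "send the data over the network").
#[inline(never)]
fn opaque_store(addr: *mut u32, val: u32) {
    // Here it simply performs the volatile store so the sketch runs.
    unsafe { core::ptr::write_volatile(addr, val) }
}

fn main() {
    let mut slot = 0u32;
    opaque_store(&mut slot, 5);
    // Read back through a volatile load, per the same model.
    assert_eq!(unsafe { core::ptr::read_volatile(&slot) }, 5);
}
```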
@RalfJung
> It is my interpretation that the behavior you want was always the intention of the spec, just so far they did not manage to write down what they actually meant. Also, there are people writing MMIO in C -- AFAIK the Linux kernel is full of it -- so in practice, at least for GCC, this seems to work. Or not?
Not only that they didn't write it down, but it wasn't implemented-- I've spoken with multiple people at Google who are telling me they wrote existing LLVM and GCC passes that not only don't uphold this guarantee, but can't uphold it, since there's no mechanism for preventing non-volatile loads from being inserted.
In practice, there are certainly people using this, so these extra loads evidently aren't being added often enough for people to have noticed, figured out the root problem, reported it, etc. But that certainly doesn't mean it can't or won't happen, or that we can provide Rust users with a guarantee that it won't happen in their programs.
One of my coworkers stumbled on a fun paper, "Volatiles Are Miscompiled, and What to Do about It". It's quite relevant. Some choice sections:
> In Section 6.7.3 the C99 standard [7] says: An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, . . . . Furthermore, at every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine, except as modified by the unknown factors mentioned previously. What constitutes an access to an object that has volatile-qualified type is implementation-defined... This part of the standard admits the possibility of a conforming C implementation that entirely ignores the volatile qualifier (provided that this is documented). However, such an implementation would be utterly useless for systems programming and we will not consider this possibility further. Based on a USENET post from one of the C standards committee members [5], it seems that the offending sentence is a poorly worded reflection of the fact that some hardware platforms have a minimum access width. For example, on a machine that supports only 32-bit memory operations, an access to a volatile byte will unavoidably access the other three bytes in the same word, even if one or more of these is also volatile. Modern computer architectures generally support memory operations at byte granularity.
> I've spoken with multiple people at Google who are telling me they wrote existing LLVM and GCC passes that not only don't uphold this guarantee, but can't uphold it, since there's no mechanism for preventing non-volatile loads from being inserted.
That's really surprising to me. If all of these passes treat volatile accesses as opaque operations they don't know anything about -- which is something they have to support as it can happen any time in actual code, by calling an extern function -- then they should uphold the guarantee. So I am puzzled how it is even possible that the fundamental design of an optimization makes it impossible to uphold this guarantee.
They don't have to specifically avoid inserting non-volatile accesses to locations only used in a volatile way. They have to (a) ignore volatile accesses entirely, and (b) avoid inserting accesses to locations not used at all. The former seems trivial, and the latter is required for soundness even if you entirely ignore volatile. So, if you can get one of your coworkers to expand on that a bit, I'd be very interested what it is that I am missing here. (I don't have a deep understanding of how modern compilers work, so I probably am wrong, but I'd like to know how wrong.^^)
> That's really surprising to me. If all of these passes treat volatile accesses as opaque operations they don't know anything about -- which is something they have to support as it can happen any time in actual code -- then they should uphold the guarantee. So I am puzzled how it is even possible that the fundamental design of an optimization makes it impossible to uphold this guarantee.
With the current design of LLVM, it isn't valid to insert any non-volatile access unless/until you've proved that a non-volatile access would otherwise occur. But, proving that a non-volatile access would occur is hard and people write passes that insert accesses without such proofs. I imagine most of the broken passes are probably not doing anything with volatile at all; probably the authors didn't realize that they have to assume that all memory is volatile by default.
@briansmith It is not valid to insert any access unless you've proven that the access wouldn't have UB (at minimum; there are other issues of course). That requires lots of leg work, but none of it is volatile-specific. You have to rule out dangling pointers, avoid introducing data races, take care about other intervening accesses and calls to deallocation functions, etc., or else your pass is simply wrong even about programs that involve no MMIO or `volatile` anywhere. Thus transformations like hoisting loads out of loops (choosing this example because I went through the part of LICM that does this only a few days ago) are gated by lengthy legality checks featuring both elaborate one-off analysis and calls to dozens of very common queries.
As this sort of reasoning is present in any case, it should be quite feasible to not treat an existing volatile access as permission to insert a non-volatile access. This would, for example, happen in places like `isSafeToLoadUnconditionally`. I would like to hear from @cramertj's coworkers about which passes are difficult to fix.
@rkruppe

> I would like to hear from @cramertj's coworkers about which passes are difficult to fix.
Two of the specific passes mentioned to me were structure splitting and ASAN-- I'll try and get more detailed information.
> structure splitting
Hm... this makes me wonder what we guarantee/know about volatile accesses at non-`Scalar` types. Now it becomes relevant how the big access is lowered into actual instructions -- bytewise? Word-per-word? What is even the expected behavior here?
> What is even the expected behavior here?
The LangRef (Volatile memory accesses) says:
> Platforms may rely on volatile loads and stores of natively supported data width to be executed as single instruction. For example, in C this holds for an l-value of volatile primitive type with native hardware support, but not necessarily for aggregate types. The frontend upholds these expectations, which are intentionally unspecified in the IR. The rules above ensure that IR transformations do not violate the frontend’s contract with the language.
Performing a volatile read on a pointer to an aggregate that's larger than the "natively supported data width" does not appear to provide anything that one can rely on. Expecting the whole object to be loaded "atomically" is definitely wrong.
@RalfJung having read the "deprecating volatile" paper from JF Bastien, it seems that "each byte will be read exactly once, and each byte will be written exactly once" is the expected behavior from C++ devs.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1152r0.html
That doesn't seem good enough though; we likely also have to preserve the width of the access -- MMIO will make a difference between a 4-byte "register" (mapped at some memory address) being written to at once, or in 4 individual 1-byte writes.
@ubsan Could you point out which parts of the paper leads you to say that? The options laid out in §3.6.3. that leave volatile aggregates around are noted to "clearly tell the compiler that every data member access should be done in a volatile manner". I would read that as supporting primitive-at-a-time (or at least, accesses as large as would otherwise be used for that primitive) rather than byte-at-a-time.
@rkruppe
> Importantly, volatile does not guarantee that memory operations won’t tear, meaning that a volatile load may observe partial writes and volatile stores may be observed in parts. Realistically, compilers will only tear when the hardware doesn’t have an instruction which can perform the entire memory operation atomically. That being said, the Standard technically allows an implementation which touched each target byte exactly once, one after the other, in an unspecified order that could change on each execution.
Well that's rather useless in practice, isn't it?
@RalfJung IIUC that means that volatile reads and writes of single bytes are guaranteed by the standard to always work properly.
Whether one can rely on volatile read/writes for 16/32/64/128-bit words not to tear would then be implementation-defined. We could provide, for example, the largest size for which volatile pointer reads/writes do not tear via a `cfg` attribute, and guarantee that read/writes to aggregates that fit this size do not tear.
I don't know how useful tearing volatile read/writes are (to me they appear to be useless), but we could diagnose them.
This issue (or at least a very similar one) is also tracked in the rust-memory-model project as https://github.com/nikomatsakis/rust-memory-model/issues/16.
When investigating the implementation of a `replace_with` function, I found a case where it is necessary to specialize the implementation of a function depending on whether the value is volatile or non-volatile; see https://github.com/rust-lang/rfcs/pull/1736#issuecomment-437641752. I think similar considerations would apply to any safe abstraction built on unsafe reads/writes through pointers. I suspect if we put more effort into it, we'll find more cases where it is important for `volatile` (and atomic) to be part of the type.
Brian,
Are there similar thoughts for unaligned accesses as well? Whether via something like MSVC’s unaligned or from a packed struct representation.
There seem to be unaligned operations on `std::ptr` for a given access, just like the volatile ones.
> I found a case where it is necessary to specialize the implementation of a function depending on whether the value is volatile or non-volatile
Could you elaborate on that? I think in this case there is no need to have volatile and/or non-atomic variants, as explained in https://github.com/rust-lang/rfcs/pull/1736#issuecomment-437651984. Any use of `&mut` with volatile data is incorrect anyway; it does not become less incorrect by having a `replace_with_volatile`. `&mut` pointers must also always be aligned, and hence there is no need for an unaligned version.
Also, why would `replace_with` be different than all the other `&mut` methods (`swap`, `replace` and so on)? Your concerns apply to raw pointers, but `replace_with` is about providing a safe API working on `&mut`. There obviously cannot be a safe API with raw pointers.
See this reddit thread for a relevant discussion between @comex and me. I think we reached a conclusion that we can both live with, but I hope @comex can respond there or here to confirm. :)
I had first proposed half-of-a-spec for `volatile` earlier in this thread. The way this proposal can be formalized is by having a notion of "externally observable events" in the language spec. This notion is needed anyway to even define what it means for a compiler to be correct: a correct compiler will output a program that, when run, produces sequences of observable events such that every such sequence is also a possible sequence when running the original Rust program on the Rust Abstract Machine (or the original program causes UB when interacting with these events, in which case there are no requirements). "Events" here can carry data both ways; think of them like syscalls: `write` carries data from the program to the outside world, and `read` carries data in the other direction. We can then say that a volatile memory access is such an observable event (where again a write sends data to the outside and a read receives data from the outside).
Because the compiler has to preserve the sequence of observable events, it is not allowed to, e.g., reorder two volatile memory accesses.
The new things I now realized are:
What my definition does not answer is what exactly actually happens when you run the code. Sometimes I am so focused on the Abstract Machine that I forget about that aspect. ;) And the clever bit about saying that this is an "externally observable event" is that we actually can "delay" giving that answer until after the translation to machine code -- and then we can say "well it's hardware-level memory accesses".
All we care about in the Abstract Machine / the compiler is that every sequence of externally visible events generated by the compiled program is also a possible sequence of externally visible events that the Abstract Machine could generate (or the Abstract Machine declared UB). Of course these externally observable events end up communicating with something (the kernel, for syscalls; the host architecture memory system, for `volatile`), and so now it is up to that "something" to define what the result is of executing this particular sequence of events.
Basically, since events can also feed data into the program, they can affect how the program goes on behaving after it received that input. The Rust language spec does not care; it does not require the kernel to do any particular thing when calling the `read` syscall. It just promises that whatever the compiled Rust program does (in terms of how the remainder of its sequence of observable events looks) for the concrete thing that `read` returned, the same thing could happen when running the Rust Abstract Machine and considering `read` to behave the same way. Volatile works just the same; the Abstract Machine does not have to talk about MMIO or target platform details, but we can later "wire up" those events the right way.
Hm, somehow this is harder to explain than I thought, and I feel the above got rather confusing... :/
(I edited the prior post to expand on what I mean, but it got confusing, so now I am trying an example.)
Consider the following Rust program:
```rust
use std::io;

fn main() {
    let mut input = String::new();
    io::stdin().read_line(&mut input).unwrap();
    let n: i32 = input.trim().parse().unwrap(); // trim the trailing newline before parsing
    if n == 7 {
        unsafe { *(0 as *mut i32) = 0; } // cause UB
    } else {
        println!("{}", n);
    }
}
```
Let us say the events of our language are given by
```rust
enum Event { Read(String), Write(String), Panic }
```
The behavior of writing to stdout is to cause a `Write` event with the text written. The behavior of reading from stdin is very counter-intuitive, but please bear with me: the behavior is specified as picking any input non-deterministically, causing a `Read` event with that input, and then returning that input.
(To make everything work out formally, we also have to make a progress requirement: if a program can cause `Read(s)` as the next event for some `s`, it can also cause `Read(s2)` as the next event for any `s2`. This basically means that if a program can read any value, it has to also be ready to read any other value. The semantics cannot just "ignore" some values. Our definition for reading from stdin satisfies this, and the compiler has to preserve it. This is basically just a fancy way of saying that the compiler may not make any assumptions about how the input is "picked".)
With this definition, the possible sequences ("traces") of events of our program depend on what got picked for the `Read` event. The events roughly are:

- `Read(s)` where `s` does not contain a line ending, followed by `Panic`
- `Read(s)` where the first line of `s` is not a 32-bit integer, followed by `Panic`
- `Read(s)` where the first line is an integer `i` not equal to 7, followed by `Write("{i}\n")`
- `Read(s)` where the first line is `7`, followed by absolutely anything because there is UB

Notice how `Read` behaves like input to the program, and the structure of all sequences that the program can produce is like a tree, where input events such as `Read` are branching points.
The compiler can compile our program in any way as long as the final program's behaviors are included in the behaviors described above. For example, it could decide that the "then" branch of our conditional is dead code, and always just run `println!("{}", n)`. Then the compiled program would have the following possible sequences of events:

- `Read(s)` where `s` does not contain a line ending, followed by `Panic`
- `Read(s)` where the first line of `s` is not a 32-bit integer, followed by `Panic`
- `Read(s)` where the first line is an integer `i`, followed by `Write("{i}\n")`

This is included in the above because `Write("7\n")` is a special case of "absolutely anything", so this compilation is correct. But the compiler could also have done other things for the case where the first event is `Read("7\n")`, such as causing a segfault or calling some random function from the libc. We really cannot know.
Now consider the case where we are working in an environment where we know the input will never be `7` on the first line. This corresponds to making an assumption about the events that the outside world will "send" to our program. It means we are only interested in sequences where the first event is not a `Read` with the first line being `7`. If we just consider those sequences, it does not matter what the compiler did for the UB branch, as long as it got all the other cases right, which it has to if it wants to be a correct compiler. For the cases that interest us, we know the program will either panic or print what it read.
We just argued about a program that does I/O, without saying anything about how I/O works!
We can do exactly the same for volatile: we would add `VolatileRead(u32)` and `VolatileWrite(u32)` events (really lots of events for the different sizes, I guess), and we would define the effects of volatile accesses in the Abstract Machine as generating such events. Then, after compiling the program, we can restrict our attention to those sequences of events that can actually happen based on hardware semantics.
This entire approach is very closely related to communicating sequential processes, where we see our program as one process and other things like the kernel or MMIO devices as other processes, and because the compiler has to preserve the way that the program reacts to external events, we can basically imagine the Rust Abstract Machine to be "plugged together" with the kernel or the MMIO devices. The interface for these "connections" is given by the possible events, or messages being exchanged both ways.
This material can fill entire books, so I am by far not doing it justice here. I hope this still helps to shed some light on the way I am thinking about volatile. At some point I should turn this into a blog post, or more likely a series of posts, but I have to finish a PhD thesis first...
While the bickering has stopped, I'd like to ask if I can fix a broken reference to a memory-mapped object by somehow forcing the compiler to do all reads/writes as volatile, using asm and a fence like:

```rust
asm!("" ::: "memory" : "volatile");
core::sync::atomic::compiler_fence(Ordering::SeqCst);
```

Will LLVM be allowed to ignore it, or will it see the asm and use only volatile reads/writes after this instruction?
Short answer: no, not with how you've described it.
You would need to create an abstraction around whatever you're using so that it only ever uses volatile reads/writes; see something like voladdress for the currently most sound solution to the problem (AKA never use references, only ever raw pointers).
Closing in favor of https://github.com/rust-lang/unsafe-code-guidelines/issues/411. The original questions have been answered (using this pattern with raw pointers is fine, but with references it is not currently fine).
Folks who want to write drivers and embedded code using Rust need to have a way to guarantee exactly-once access to certain memory locations. Today, the embedded wg makes extensive use of @japaric's `VolatileCell` crate, along with `RegisterBlock` structures containing `VolatileCell` wrappers around each field of the register block, and a function to provide a single access to the register block at a fixed address. The API exposed in the `stm32f103xx` crate and similar only exposes `*const RegisterBlock` values (example) from the overall `Peripherals` object. This then requires unsafe code to access and mutate any particular field.

Asks:

- Is it guaranteed that the number of writes will exactly match the number of calls to `unsafe { (*x.volatile_cell_field).set(...) }`, and that the number of reads will exactly match the number of calls to `unsafe { (*x.volatile_cell_field).get(...) }`? It seems like it should be.
- Is there a way to offer a safe, ergonomic API based on `&`? It would be possible to provide a custom `RegisterRef<'a, T>` that consisted of a raw pointer internally, as well as a custom derive for projecting this to fields of the register block, but this seems unfortunately complicated and unergonomic.

Complicating factors:

- Could we say that `VolatileCell` is special, similar to `UnsafeCell`, and cannot have "dereferenceable" applied to references to it (and objects that contain it), in order to prevent this misoptimization? This seems potentially more complicated and intrusive, but IMO still worth considering.

cc @RalfJung @kulakowski @teisenbe @rkruppe