ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

remove `volatile` pointer attribute and keyword. replace with @volatileLoad and @volatileStore #4284

Closed andrewrk closed 3 months ago

andrewrk commented 4 years ago

Realistically, the only valid use case for volatile is Memory Mapped Input/Output. That is, when memory loads and stores have side effects. Normally, memory loads and stores do not have side effects.
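
For a concrete illustration (my own example; the address is made up), here is a status-quo volatile load whose side effect matters:

const UART_DR = @intToPtr(*volatile u8, 0x4000_0000); // hypothetical UART data register

fn readByte() u8 {
    // Reading the data register also pops a byte from the hardware FIFO,
    // so the optimizer must not elide, duplicate, or reorder this load.
    return UART_DR.*;
}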

There are some edge cases to solve with volatile; see also this talk by JF Bastien about deprecating volatile in C++.

This proposal is to remove the pointer attribute and keyword in favor of two simple builtins: @volatileLoad and @volatileStore.

This would be equivalent to @atomicLoad and @atomicStore except with volatile semantics rather than atomic semantics.
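
The exact builtin signatures are not spelled out here; going by how they are used later in the thread, a usage sketch (address and register purely illustrative) might look like:

const status_reg = @intToPtr(*u32, 0x4000_0004); // note: a plain, non-volatile pointer

fn pollReady() bool {
    // explicit volatile load through an ordinary pointer
    return (@volatileLoad(status_reg) & 0x1) != 0;
}

fn ack() void {
    // explicit volatile store through an ordinary pointer
    @volatileStore(status_reg, 0x1);
}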

The upsides of this would be a simpler language and clear rules about what volatile loads and stores mean. For example, a @volatileLoad or @volatileStore that would tear would be a compile error. When volatile is a pointer attribute, on the other hand, Zig has to figure out what is supposed to happen when an access tears.

The downside of this is that a volatile pointer to a packed struct is less useful. In practice it could be OK to have a packed struct which maps to MMIO registers in memory, where loads/stores tear. But I believe this is rare in practice, and the builtins would suffice. However, there is C code that uses packed structs to represent MMIO, and porting it to Zig would require the less convenient @intToPtr + @volatileLoad/@volatileStore syntax.
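
To make the porting friction concrete, here is a rough sketch (names, offsets, and addresses invented; builtin usage as above):

// Today: field access through a volatile pointer to a packed struct.
const Uart = packed struct { ctrl: u8, status: u8, data: u8, pad: u8 };
const uart = @intToPtr(*volatile Uart, 0x4000_1000);

fn enableToday() void {
    uart.ctrl = 1;
}

// Under the proposal: compute the field address by hand and use the builtin.
fn enableProposed() void {
    const ctrl_ptr = @intToPtr(*u8, 0x4000_1000 + 0); // offset of `ctrl` written out manually
    @volatileStore(ctrl_ptr, 1);
}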

daurnimator commented 4 years ago

I've been playing around with using Zig to write things for the Fomu, and I ended up with a somewhat annoying pattern for the MMIO:

/// https://rm.fomu.im/touch.html
pub const TOUCH = struct {
    const base = 0xe0005800;

    /// Output values for pads 1-4
    pub const TOUCH_O = @intToPtr(*volatile u4, base + 0x0);

    /// Output enable control for pads 1-4
    pub const TOUCH_OE = @intToPtr(*volatile u4, base + 0x4);

    /// Input value for pads 1-4
    pub const TOUCH_1 = @intToPtr(*volatile u4, base + 0x8);
};

I attempted to make it just one extern struct and use align(4) for the layout, but that didn't seem to work; volatile ended up on the struct pointer itself rather than at the field level.

Perhaps volatile should not be a pointer attribute, but a field/variable attribute?

ikskuh commented 4 years ago

I really like the idea, but at the same time I shiver at the overhead it would add when I write embedded code.

A short example (ignoring headers/imports) for AVR that blinks an LED:

const TIMER1_FREQ = 2; // Hz
const TIMER1_PRESC = 1024;
const TIMER1_LIMIT = (((F_CPU) / (TIMER1_PRESC)) / (TIMER1_FREQ) - 1);

pub fn main() c_int {
    @volatileStore(DDRB, (1 << PIN5));

    @volatileStore(TCCR1A, 0);
    @volatileStore(TCCR1B, (1 << CS12) | (1 << CS10) | (1 << WGM12));
    @volatileStore(TCCR1C, 0);

    @volatileStore(OCR1AH, @truncate(u8, (TIMER1_LIMIT >> 8) & 0xFF));
    @volatileStore(OCR1AL, @truncate(u8, (TIMER1_LIMIT & 0xFF)));

    @volatileStore(TIMSK1, (1 << OCIE1A));

    sei();

    while (true) {}
}

pub fn TIMER1_COMPA_vect() callconv(.Naked) void { // however this will look in actual Zig...
    @volatileStore(PORTB, @volatileLoad(PORTB) ^ (1 << PIN5)); // this has to be optimized to a single instruction by the optimizer later…
}

The variant with @volatileStore and @volatileLoad is roughly 40% larger (648 bytes) than the variant with DDRB = (1 << PIN5) (453 bytes).

For most applications I really like the idea of making volatile explicit, but it also makes Zig less attractive to embedded developers, as they have to write a lot more code (another example here: https://gist.github.com/MasterQ32/d17debefd159821b1e9debe0e869eef0).

After writing the second example, I really think we both win and lose on "Communicate intent precisely": I find the @volatileLoad version much harder to read and to grasp what the code actually does (which is setting/clearing some bits in a memory location).
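
For comparison, the status-quo handler (assuming PORTB is a *volatile u8 pointer, as in the AVR register declarations) reads much closer to the intent:

pub fn TIMER1_COMPA_vect() callconv(.Naked) void {
    PORTB.* ^= (1 << PIN5); // toggle PIN5
}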

justinbalexander commented 4 years ago

Can you add a link to the example that is a smaller binary?

kyle-github commented 4 years ago

Somewhat on a tangent...

Is volatile necessary? If you have multiple threads of control, you probably need atomic accesses. If you do not, you would need something similar to volatile. Zig already behaves differently when you indicate to the compiler that you are going to be single-threaded. Perhaps volatile is just what results when you use atomic accesses in a single-threaded program?

This definitely has some overhead, but simplifies the mental model.

As to the problem of using @volatileStore and @volatileLoad, perhaps special forms of = would help? There are already special variants of +, such as +%, for different uses. What about @= or something (please do not actually use that one!) to indicate to the compiler, "really do that load/store no matter what optimizations seem to apply."

One thing I like very much about @volatileStore and @volatileLoad is that intent is communicated very precisely, exactly where volatile behavior is needed. Having volatile as an attribute on a variable or type means that you do not actually know what happens when you see an assignment somewhere else in the code. Spooky action at a distance.

The amount of typing seems like a fairly weak argument. You read code and need to understand it a lot more than you type it. This is a problem in C, where you can hide the volatile in a header or typedef and have to look up the actual definition site of each variable to see what happens with a simple assignment.

This really comes down to what you are trying to show: Is a variable volatile or are assignments to/from that variable volatile? C does this on the variable. The new @volatileStore and @volatileLoad do this at the site of the assignment.

Now back to lurking...

ikskuh commented 4 years ago

Is volatile necessary?

Yes, absolutely. It's not meant for thread synchronization / atomics; it's for MMIO, which requires reads/writes to happen exactly as written, without being elided or reordered.

Is a variable volatile or are assignments to/from that variable volatile?

That's a good question. I'd answer this with "volatile is the property of a memory location".

It would always be an error to access an MMIO location without volatile, and to access a non-MMIO location with volatile (instead of an atomic).

So @volatileLoad and @volatileStore would open up the possibility of accessing volatile storage without volatile (by simply not using the builtins), which would be a programming error.

kyle-github commented 4 years ago

Is volatile necessary?

Yes, absolutely. It's not meant for thread synchronization / atomics; it's for MMIO, which requires reads/writes to happen exactly as written, without being elided or reordered.

I think you misunderstood the half-baked idea. The core of the idea was that atomics are volatile accesses plus memory barriers (sort of). So in the case of a single-threaded program you do not need the memory barriers, and atomic accesses degenerate into volatile accesses. Given that, do you actually need a separate volatile access method, or can you use atomic instead and take the performance hit (which you might anyway) if you have a multi-threaded program and actually only needed volatile semantics? I was aiming for a simpler mental model.

Is a variable volatile or are assignments to/from that variable volatile?

That's a good question. I'd answer this with "volatile is the property of a memory location".

This does seem to be the core question. That needs to be answered and accepted before either of the current alternatives can be usefully discussed. Otherwise this is really two solutions for two different problems :-(

So @volatileLoad and @volatileStore would open up the possibility of accessing volatile storage without volatile (by simply not using the builtins), which would be a programming error.

That is a good point. Whatever the solution, I would argue that having volatile only as an attribute of a variable or type causes code behavior to change elsewhere and thus violates the Be Explicit commandment. As you note, using only @volatileLoad/Store does not strongly encourage safety. Requiring both the attribute and @volatileLoad/Store would be safer, but does not have great UX. Hmm...

justinbalexander commented 4 years ago

At work, we have recently started a new project using a cortex-a5 and FreeRTOS.

The IP stack that we are evaluating (pretty expensive) was causing data aborts when the optimizations were turned up. The issue is that the DMA buffers used to pull data from the Ethernet MAC have to be placed in strongly-ordered or device memory sections, which are set up when programming the stage 1 translation tables of the MMU. The thing about strongly-ordered and device memory regions is that unaligned accesses cause an alignment fault, regardless of whether unaligned access is enabled for normal memory. We are using the latest Linaro arm-none-eabi-gcc toolchain for Cortex-A processors from the Arm website, and there is not a very good way to prevent the compiler from emitting instructions that will cause an alignment fault, even when packing the bytes in and out manually for data types wider than 1 byte.

EDIT: my link insertion was broken. See this godbolt link for a (stripped) example: https://godbolt.org/z/GFKo5w

Note that the problem exists even for the architecture's native data type.

Note also that -ffreestanding is needed to signify that builtins are not to be used, so compile-time-known memcpy, memset, etc. don't get inlined and insert unaligned accesses where you don't expect them. They are forced to go through function calls which actually handle alignment issues. (These examples don't use memcpy, but it illustrates the problem further.)

There are two compiler windows open so you can see that the "-mno-unaligned-access" option prevents the compiler from smushing, for lack of a better term, the reads together into a single unaligned load, because the alignment of the pointee type is 1 and the compiler can't tell whether the accesses are truly aligned or not.

The option can only be given on the command line, however, and it applies to the entire compilation unit, so it must be known outside of the code itself what the compilation environment is expected to be. You could of course put just the functions for which unaligned loads are verboten in their own file and compile it separately from the rest of your application code, but that also requires special knowledge that is not apparent in the code unless it is in a comment somewhere. Hope you see it before you ship. Having the function accept only volatile pointers keeps all the information right there in the source code, which I consider a win.
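
A sketch of that idea in Zig (the descriptor type and function below are illustrative, not from the codebase being discussed):

const RxDescriptor = extern struct {
    status: u32,
    length: u32,
};

/// Callers can only pass pointers that are already marked volatile, so the
/// constraint is visible at every call site, and each field access is emitted
/// on its own instead of being merged into a wider, possibly unaligned access.
fn readLength(desc: *volatile RxDescriptor) u32 {
    return desc.length;
}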

I hereby assert, with this real-world use case in hand, that the volatile keyword does have a valid meaning and use case outside of the colloquial definition of MMIO, usually meant to be peripheral registers. The volatile keyword forces the compiler to actually schedule a load or a store for each memory access through a volatile pointer, because it can't assume there are no side effects for each memory access. As said by others before, this has nothing to do with atomics or the order in which the writes arrive at the destination, but the use case still exists and is valid.

You could argue that strongly ordered memory is a sort of MMIO. Accessing it through the builtins only would make programming errors more likely.

In Zig, if slices could be declared as volatile, I would make the DMA buffers volatile, and then only the correct functions, the ones that appropriately handle alignment issues, would even be allowed to be called.

frmdstryr commented 4 years ago

After using Zig for months now, I still have no idea what "has no side effects" from LLVM's docs is supposed to mean. This description from @vegecode of what volatile currently does makes more sense:

The volatile keyword forces the compiler to actually schedule a load or a store for each memory access through a volatile pointer

The CPU can modify certain registers/memory locations whenever it wants; if the compiler doesn't know this, it will try to be smart and skip or reorder steps, which is sometimes unwanted.

People always seem to confuse C's volatile with Java's volatile (myself included initially, and judging from sitting on IRC). Maybe it would be better to just change the keyword to something clearer that means "don't optimize this out".

frmdstryr commented 4 years ago

Also, I ported a small portion of the STM32H743 HAL to pure Zig (UART, timers, GPIO, interrupts, and a TFT driver). It's roughly 6k lines of code.

Pretty much every register read or write is done using a small utils.zig with:

// ----------------------------------------------------------------------------
// Register utilities
// ----------------------------------------------------------------------------
pub inline fn setBit(reg: *volatile u32, bit: u32) void {
    reg.* |= bit;
}

pub inline fn toggleBit(reg: *volatile u32, bit: u32) void {
    reg.* ^= bit;
}

pub inline fn clearBit(reg: *volatile u32, bit: u32) void {
    reg.* &= ~bit;
}

pub inline fn readBit(reg: *volatile u32, bit: u32) u32 {
    return readReg(reg) & bit;
}

pub inline fn isBitSet(reg: *volatile u32, bit: u32) bool {
    return (readReg(reg) & bit) == bit;
}

pub inline fn clearReg(reg: *volatile u32) void {
    reg.* = 0x0;
}

pub inline fn writeReg(reg: *volatile u32, val: u32) void {
    reg.* = val;
}

pub inline fn readReg(reg: *volatile u32) u32 {
    return reg.*;
}

pub inline fn modifyReg(reg: *volatile u32, clear_mask: u32, set_mask: u32) void {
    return writeReg(reg, (readReg(reg) & ~clear_mask) | set_mask);
}

pub inline fn writeType(comptime T: type, reg: *volatile T, val: T) void {
    reg.* = val;
}

pub inline fn readType(comptime T: type, reg: *volatile T) T {
    return reg.*;
}

and works on pointers translated from the chip's headers, e.g.:

pub const GPIO_TypeDef = struct {
    MODER: u32,
    OTYPER: u32,
    OSPEEDR: u32,
    PUPDR: u32,
    IDR: u32,
    ODR: u32,
    BSRR: u32,
    LCKR: u32,
    AFR: [2]u32,
};
pub const GPIOA_BASE = D3_AHB1PERIPH_BASE + @as(u32, 0x0);
pub const GPIOA = @intToPtr(*volatile GPIO_TypeDef, GPIOA_BASE);

// Used like
pub inline fn set(self: *Pin) void {
    util.writeReg(&self.instance.BSRR, @enumToInt(self.pin));
}

For toggleBit, setBit, and clearBit the compiler already spits out a single optimized instruction in release mode; I'm not sure how those would map to a @volatileStore without first using a @volatileLoad.
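
A sketch of how setBit might look under the proposal (builtin signatures assumed, as elsewhere in this thread); whether this still collapses to a single instruction is exactly the open question:

pub inline fn setBit(reg: *u32, bit: u32) void {
    // explicit volatile read-modify-write instead of `reg.* |= bit`
    @volatileStore(reg, @volatileLoad(reg) | bit);
}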

Also, the compiler currently uses volatile to enforce that what I'm passing in came from a struct pointer annotated with volatile, which has helped fix a few errors during development. Would these new builtins also be able to do that?

frmdstryr commented 4 years ago

Well, after watching the CppCon video I now feel ignorant :smile:. There's much more to volatile than what I knew about.

It'd be good to separate intent when and where possible. So I'm :+1: on this change (assuming it can handle more fine-grained updates).

frmdstryr commented 4 years ago

In addition to @volatileLoad and @volatileStore, there also needs to be a way to tell the compiler not to optimize out "unused" variables that are needed for debugging in ReleaseSafe mode.

Currently I have to fight the optimizer with dummy usages to prevent it from removing important things like the stack trace (which is unused until I halt and need to look at it).

For example, the err, stack, and bt in the fn below all get optimized out in release mode, making it hard to debug. None of these are "volatile" variables in the MMIO sense, but I should still be able to somehow mark that I always want to keep them.

pub fn panic(err: []const u8, stack: ?*std.builtin.StackTrace) noreturn {
    var bt: *[]usize = if (stack) |st| &st.instruction_addresses else undefined;
    while (true) {
        LED3.toggle();
        var i: u32 = 0;
        while (i < 1000000) : (i += 1) {
            asm volatile ("nop");
        }
    }
}

frmdstryr commented 4 years ago

Maybe it would make sense to just replace volatile with different keywords that indicate the use case explicitly, like mmio, alignedio, and nooptimize?

ikskuh commented 4 years ago

After some discussion on Discord with GingerBill, Tetralux, Oskar, and others, there were some insights.

Possible solutions for Zig:

Have volatile as a pointer modifier

Status quo.

const ptr = @intToPtr(*volatile u32, 0xF00BA4);
ptr.* = 0xFF00FF00;

Pro:

Con:

Comment: There should be no implicit casting between volatile and non-volatile pointers in any case. All need @ptrCast.

Have volatile similar to C as part of the type

var value : volatile u32 = undefined;
value = 0xFF00FF00;

Pro:

Con:

  • It doesn't make sense in most cases (volatile local/global variables aren't really useful)

Replace volatile with @volatileLoad and @volatileStore

const ptr = @intToPtr(*u32, 0xF00BA4);
@volatileStore(ptr, 0xFF00FF00);

Pro:

Con:

In addition to volatile pointers, enforce @volatileLoad and @volatileStore

const ptr = @intToPtr(*volatile u32, 0xF00BA4);
@volatileStore(ptr, 0xFF00FF00);

Pro:

Con:

Drop volatile as a language construct

volatile as a pointer qualifier could be removed completely from the language, relying on asm volatile for the same kind of work. This would require a type to be implemented in user space that provides a simple load/store for every supported platform:

fn Volatile(comptime T: type) type {
    return extern struct {
        const Self = @This();

        backing_memory: T,

        fn load(self: *Self) T {
            return asm volatile("…");
        }
        fn store(self: *Self, value: T) void {
            return asm volatile("…", value);
        }
    };
}

const ptr = @intToPtr(*Volatile(u32), 0xF00BA4);
ptr.store(0xFF00FF00);

Pro:

Con:

Conclusion

Now that I think of it, I prefer one of the last two solutions. The user-space implementation would make the language itself simpler by removing a whole feature, but it would create some work in user space. As volatile loads/stores don't have to be atomic in general, this could be implemented pretty easily.

Also, having both the pointer attribute and the load/store builtins as an explicit MMIO operation conveys the Zig spirit of ignoring the amount of code to write and preferring explicitness.

SpexGuy commented 4 years ago

@frmdstryr Regarding preventing the optimizer from removing a variable: volatile, or an equivalent nooptimize, could be a little heavy-handed for that, and might be easy to accidentally leave in release builds or to use for the wrong purpose. The more targeted way to prevent optimization of a certain value is the technique shown in this Chandler Carruth talk: https://www.youtube.com/watch?v=nXaxk27zwlk&t=2446. It translates pretty directly to Zig:

const std = @import("std");
const assert = std.debug.assert;

inline fn escape(ptr: var) void {
    comptime assert(@typeInfo(@TypeOf(ptr)) == .Pointer);
    asm volatile ("" : : [g]""(ptr) : "memory");
}
inline fn clobber() void {
    asm volatile ("" : : : "memory");
}
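
A hedged usage sketch (my example, not from the talk) of how these helpers keep a value alive:

fn computeForDebugging() u32 {
    var result: u32 = 42;
    escape(&result); // the optimizer must assume the pointed-to memory is observed
    result += 1;
    clobber();       // the optimizer must assume all memory may have been read or written
    return result;
}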

If you don't want to do targeted de-optimization and just want all variables in a scope to remain available for debugging, @optimizeFor (https://github.com/ziglang/zig/issues/978) will handle that use case.

SpexGuy commented 4 years ago

@MasterQ32 I don't think the last two are mutually exclusive. My preference would be to switch to @volatileLoad/Store but provide a library type for the common case of a volatile register, similar to std.Atomic. Something like this:

// using syntax from #5049 for now but whatever the eventual solution to this is
const CRDR_Reg = packed struct(u32) { cr: u9, dr: u9, bflag: bool, reserved: u13  };
const crdr = std.MMIO_Register(CRDR_Reg).init(CRDR_ADDRESS);
fn updateCRDR(cr: u9, dr: u9) void {
    var value = crdr.read();
    value.cr = cr;
    value.dr = dr;
    crdr.write(value);
}

This is a little more verbose but it prevents people from accidentally doing this:

const crdr = @intToPtr(*volatile CRDR_Reg, CRDR_ADDRESS);
fn updateCRDR(cr: u9, dr: u9) void {
    // bad: generates two reads and two writes and is not obvious about it
    crdr.cr = cr;
    crdr.dr = dr;
}
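
For reference, std.MMIO_Register is hypothetical; a minimal userspace sketch of such a wrapper, built on the status-quo volatile pointers, could look like this:

fn MmioRegister(comptime T: type) type {
    return struct {
        ptr: *volatile T,

        fn init(address: usize) @This() {
            return .{ .ptr = @intToPtr(*volatile T, address) };
        }

        fn read(self: @This()) T {
            return self.ptr.*; // exactly one volatile load
        }

        fn write(self: @This(), value: T) void {
            self.ptr.* = value; // exactly one volatile store
        }
    };
}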

This case that @frmdstryr points out is actually really important:

For toggleBit, setBit, and clearBit the compiler already spits out a single optimized instruction in release mode; I'm not sure how those would map to a @volatileStore without first using a @volatileLoad.

We would probably also need a @volatileRmw builtin similar to @atomicRmw to handle this case properly, unless the optimizer is allowed to combine volatile loads and stores. (which might already be the case, depending on the exact language semantics of +=)

Edit: add caveat about combining

SpexGuy commented 4 years ago

It looks like the optimizer is allowed to combine a volatile load, a modify, and a volatile store into a single instruction. See this godbolt. So @volatileRmw is not necessary.
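
For illustration (my own snippet, not the linked one), the kind of volatile read-modify-write being discussed is:

fn toggleBit5(reg: *volatile u8) void {
    // a volatile load, an xor, and a volatile store, which the backend may still
    // lower to a single read-modify-write instruction on targets that have one
    reg.* ^= (1 << 5);
}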

nmichaels commented 4 years ago

This might be a terrible idea, but what about volatile blocks? If all loads and stores in a volatile block are treated as having side effects, it becomes easy to make volatile types:

fn Volatile(comptime T: type) type {
    return struct {
        const Self = @This();

        backing_memory: T,

        fn load(self: *Self) T {
            volatile {
                return self.backing_memory;
            }
        }

        fn store(self: *Self, value: T) void {
            volatile {
                self.backing_memory = value;
            }
        }
    };
}

This gets the advantage of making volatile usage explicit everywhere, while still allowing the compiler to know not to reorder memory access, without requiring a whole bunch of processor-specific code in the standard library. Of course it still costs a language feature.

The big down side that I see is that it allows stuff like this:

extern var foo: u8;
...
volatile { foo = 1; }
foo += 1;

It might be possible to detect that kind of thing with the compiler (memory accessed in volatile sections can only be accessed in volatile sections) but it's pretty weird to decouple it from the type like that.

Personally, I think the status quo strikes a decent balance. Maybe the best solution is a mix of this and the status quo. I think all that visual noise in @volatileLoad/@volatileStore makes the language harder to read by at least as much as it makes it harder to write.

ikskuh commented 4 years ago

I really like volatile blocks. IMHO they have the most advantages (they remove visual noise, and they are explicit and flexible), and they remove the volatile information from the type completely (so no viral types). We would only need to restrict what is allowed to happen in a volatile block (for example: no function calls).

frmdstryr commented 4 years ago

One downside of using the inline functions https://github.com/ziglang/zig/issues/4284#issuecomment-588235419 above is that it makes the comments in the disassembly difficult to read in release builds...

[screenshot of the disassembly]

Using the inline operations directly avoids this but is error prone.

kyrias commented 2 years ago

Have volatile similar to C as part of the type ...

Con:

  • It doesn't make sense in most cases (volatile local/global variables aren't really useful)

They would be very useful for interrupt handlers. Right now you would have to have both a global variable and a volatile pointer to that variable to safely communicate between an interrupt handler and the rest of the code.
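
A sketch of the pattern being described (names are illustrative):

var tick_count: u32 = 0;
const tick_count_v: *volatile u32 = &tick_count;

// The interrupt handler increments through the volatile pointer...
export fn onTimerInterrupt() void {
    tick_count_v.* +%= 1;
}

// ...and the main loop also reads through it, so the read is never hoisted out.
fn waitForNextTick(last: u32) void {
    while (tick_count_v.* == last) {}
}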

lloydjatkinson commented 2 years ago

I just wanted to add my point of view: I'm considering trying out Zig for embedded work, and volatile is essential in that space.

topolarity commented 2 years ago

A bit out of left field, but I wonder if volatile blocks could also be used for floating-point arithmetic:

volatile { // Enforces reproducible results across targets
    var x: f32 = 15.3;
    var y = x * x + x * 1.5 - 0x1.0001p-3; // intermediate results stored in `f32` (not `f80`)
}

In this case, the meaning of volatile is slightly broader than just "does not reorder".

Instead, it enforces the "literal meaning" of the source code, preventing reordering, implicit in-register type promotion, and any optimizations that would change the exact value/representation of the result. Dedicated hardware instructions that perform unspecified estimation, such as fast reciprocal square root (rcpps) and transcendentals (fsin, fcos, etc.), would also be unavailable.

If it's not possible to reproduce the literal meaning of the source code for a given target, that's a compile error.

16hournaps commented 3 months ago

Sounded like a good idea; what changed? @andrewrk