ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

Change zig int promotion / cast rules #7967

Closed lerno closed 1 year ago

lerno commented 3 years ago

Some background and comparisons can be found in the following blog articles: On arithmetics and overflow, Overflow trapping in practice.

Zig's current rules, using peer type resolution plus the assigned type, have issues. One is that occasionally the type of a literal cannot be resolved; another is that peer resolution gives unexpected results. This proposal also affects the modular arithmetic operators, since they become less useful.

The main idea of the proposal is to make all integer arithmetic act as if it were done with bigints, but trap / UB if any truncation occurs in the conversion.

EDIT I've updated the algorithm quite a bit, although the overall behaviour (as stated above) should be the same. EDIT2 Added blog articles as references.

To explain the algorithm, we need to define the following:

  1. The target type: the type of the variable or parameter that the expression is assigned to. It may be empty.
  2. The resulting type: the type the calculations are performed in.

In addition to this we define a base arithmetics bit width (BABW), typically 32 or 64: the platform-dependent minimum type width that is efficient to perform calculations on. We also define a max arithmetics bit width (MABW), the corresponding maximum efficient width.

The following occurs:

  1. For each operand, recursively resolve the type, passing down the target type. E.g. analysing i16 b = (a + b) * c we first analyse a, then b, then a + b, then c, then (a + b) * c. In each of these we pass along the i16 type.
  2. If a casting expression is reached, the target type is set to empty when recursively processing its sub-expression.
  3. For each operand, if it has a fixed type size greater than the target type, this is a compile-time error. E.g. i16 s = 1 + @as(i32, 1) would be an error, as the right sub-expression is of type i32, which is wider than i16.
  4. Looking at the operands, find the smallest power-of-2 integer type that can represent the ranges of both operands and that is at least BABW wide. For instance, if BABW = 32, then i16 + i31 => i32; i32 + u32 => i64; i8 + u16 => i32; u8 + u16 => u32. This is the resulting type. If the required width would exceed MABW, then this is a compile-time error.
  5. If the width of the resulting type is smaller than the bit width of the target type, change the bit width of the resulting type to be the same as the target type. E.g. u64 = i8 + u16 => resulting type is i64.
  6. Perform the calculation, with overflow being trapped (debug mode) or considered undefined behaviour (release mode).
  7. If the resulting type is of different signedness from the target type, cast the result to the target type. If under/overflow would occur, then this is trapped (debug mode) or considered undefined behaviour (release mode).
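The resolution steps above can be sketched in Python. This is a hypothetical model of steps 3-5 only (not compiler code); the `BABW`/`MABW` values and all names are illustrative assumptions.

```python
BABW = 32   # base arithmetics bit width (assumed 32-bit target)
MABW = 128  # max arithmetics bit width (assumed)

def type_range(signed, bits):
    """Value range representable by an integer type of the given shape."""
    if signed:
        return (-(1 << (bits - 1)), (1 << (bits - 1)) - 1)
    return (0, (1 << bits) - 1)

def resulting_type(operands, target=None):
    """operands: list of (signed, bits); target: (signed, bits) or None."""
    # Step 3: an operand wider than the target type is a compile-time error.
    if target is not None:
        for signed, bits in operands:
            if bits > target[1]:
                raise TypeError("operand wider than target type")
    # Step 4: smallest power-of-2 type (at least BABW) covering all ranges.
    lo = min(type_range(s, b)[0] for s, b in operands)
    hi = max(type_range(s, b)[1] for s, b in operands)
    signed = lo < 0
    bits = BABW
    while type_range(signed, bits)[0] > lo or type_range(signed, bits)[1] < hi:
        bits *= 2
        if bits > MABW:
            raise TypeError("required width exceeds MABW")
    # Step 5: widen to the target's width if the target is wider.
    if target is not None and target[1] > bits:
        bits = target[1]
    return (signed, bits)
```

With `BABW = 32`, `resulting_type([(True, 16), (True, 31)])` reproduces the `i16 + i31 => i32` example above, and passing `target=(False, 64)` for `i8 + u16` reproduces the `u64 = i8 + u16 => i64` case.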

Examples:

var x: u8 = 0xFF;
var b: i16 = 0x7FFF;

var z: i32 = x + b + 3;
// Same as today's:
var z: i32 = @as(i32, x) + @as(i32, b) + @as(i32, 3);
// Note that we take advantage of the fact that overflow is UB/Trap

var sz: i16 = x + b + 3;
// Same as today's:
var sz: i16 = @truncate(i16, @as(i32, x) + @as(i32, b) + @as(i32, 3));

var sz2: i16 = @as(i16, x + b + 3);
// Same as today's:
var sz2: i16 = x +% @as(i16, b) +% 3;

Some more complex examples:

var a: i32 = 1;
var b: u8 = 3;
var c: i16 = 4;
var e: u32 = 5;
var f: i64 = 6;

var g: i32 = a + (b * a) / (c * f); // Error, requires explicit cast due to f > i32
var h: i32 = a + (b * c);
// Same as today's
var h: i32 = a + (@as(i32, b) * @as(i32, c));
// Uses overflow UB for + and *

var i: i32 = b + (a * e);
// Same as today's
var i: i32 = @intCast(@as(i64, b) + (@as(i64, a) * @as(i64, e)));
// Uses overflow UB for + and *, then truncate to check it is in range.

var j: i32 = e - 10;
// Same as today's
var j: i32 = @intCast(e - @as(u32, 10));
// Note that an extra rule can be inserted to instead make this:
var j: i32 = @intCast(@as(i64, e) - @as(i64, 10));

As an optional extra rule we can say that given an expression with a high type A and a low type B, doing & with a constant of bit size N will reduce the size of the low type to N:

var a: i16 = 12345;
var b: u8 = a & 0xFF;
var c: u1 = a & 0x01;
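The optional & rule can be sanity-checked with a quick Python sketch (illustrative only): masking with an N-bit constant bounds the result to N bits, so the narrower target is always safe.

```python
# Masking an i16 value with an N-bit constant always yields a value
# that fits the N-bit unsigned type, regardless of the operand's width.
a = 12345                         # an i16 value
assert 0 <= (a & 0xFF) <= 0xFF    # result always fits u8
assert 0 <= (a & 0x01) <= 0x01    # result always fits u1
```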

Note that #7416 also touches on some aspects of this.

lerno commented 3 years ago

I want to add that if there is interest in this feature, it's possible to describe the above algorithm in a much simpler manner. But if there is scant interest I won't bother.

To reiterate, the behaviour is basically "behave as if the calculation is done with infinite bit width everywhere then truncate with runtime trap/UB if truncation is not lossless, and issue a compile time error if the target type is smaller than one or more of the original types of the operands"
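The summarized behaviour lends itself to a direct sketch: Python integers already behave like bigints, so only the final lossless-truncation check needs modeling. The helper names here are illustrative, not proposed API.

```python
def fits(value, signed, bits):
    """True if value is representable in the given integer type."""
    lo = -(1 << (bits - 1)) if signed else 0
    hi = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    return lo <= value <= hi

def assign(value, signed, bits):
    """Trap (raise) if lossless truncation to the target is impossible."""
    if not fits(value, signed, bits):
        raise OverflowError(f"value {value} does not fit target type")
    return value

# u8 = 0xFF, i16 = 0x7FFF (the values from the earlier examples):
# the i32 assignment succeeds, since 0xFF + 0x7FFF + 3 = 0x8101 fits i32...
assert assign(0xFF + 0x7FFF + 3, signed=True, bits=32) == 0x8101
# ...while the i16 assignment would trap, since 0x8101 > 0x7FFF.
```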

kyle-github commented 3 years ago

Thanks for the synopsis, @lerno.

It looks like point 9 and your second example are in conflict. The target type in your second example is i32 but you are not showing a trap. Did you mean to have this the other way? That there is no automatic coercion to a smaller type and that should trap?

Doesn't the overall size depend on the operation? I.e. addition can only add a bit (assuming the operands are the same signedness), but multiplication can add the bit lengths together. For instance, multiplying two i8 can give a result that needs to be in i16 if you want it without loss. Subtraction and division follow similar logic.

I am a little concerned already that there is a lot of automatic coercion going on and that violates the Zen of Zig.

I thought the rules were roughly:

  1. if the operands differ in signedness, promote the unsigned operand to a signed operand one bit larger. I.e. u9 gets promoted to i10. Otherwise they stay the same signedness and size.
  2. The result must match the signedness of the promoted operands from step 1.
  3. If the operation is add, the result must be at least the bit width of the largest operand plus one.
  4. If the operation is multiplication, the result must be at least the bit width of a sum of the bit widths of the converted operands.

Etc.
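Those rules can be sketched as follows. This is one reading of the comment's rules, not an authoritative formulation, and the helper names are illustrative.

```python
def promote(a_signed, a_bits, b_signed, b_bits):
    """Rule 1: with mixed signedness, the unsigned side becomes signed
    and one bit wider (u9 -> i10); otherwise both sides are unchanged."""
    if a_signed != b_signed:
        if not a_signed:
            a_signed, a_bits = True, a_bits + 1
        else:
            b_signed, b_bits = True, b_bits + 1
    return (a_signed, a_bits), (b_signed, b_bits)

def lossless_result_bits(op, a_bits, b_bits):
    """Rules 3-4: addition needs one extra bit over the widest operand;
    multiplication needs the sum of the operand widths."""
    if op == "+":
        return max(a_bits, b_bits) + 1
    if op == "*":
        return a_bits + b_bits
    raise ValueError(f"unhandled operation {op!r}")
```

For example, `promote(False, 9, True, 16)` turns the u9 into an i10, and `lossless_result_bits("*", 8, 8)` gives 16, matching the "two i8 need an i16" observation.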

lerno commented 3 years ago

@kyle-github I think I need to clarify a bit. Basically any arithmetic operation would in itself trap/UB, so essentially we have two places we trap: (1) in the narrowing (2) in the operation itself. This is why there is no need to do the widening you do in 3/4.

I've updated the algorithm above significantly. Hopefully it's more clear now.

matu3ba commented 3 years ago

@lerno Briefly mentioning how c-types (should) behave would make the picture more complete. Looking forward to the update.

lerno commented 3 years ago

I would like to amend the proposal with a simpler model:

  1. Target type is the LHS, or if the LHS is an unsigned value, the LHS bitwidth + 1 as signed value.
  2. Check all leaf operands so that the type has a smaller or same bit width as the target type. Anything else is a compile time error.
  3. All leaf operands are promoted to at least BABW.
  4. All unsigned operands are promoted to bitwidth + 1 as signed value.
  5. Promote both operands to the maximum bit width.
  6. Perform all operations with trapping semantics.
  7. Use a trapping cast back to the original LHS type.

An open question is how casts interact with this. Behaviour in the "complex examples" are same for the above algorithm.

Downside here is that something like u64 + i64 would be promoted to i65 and use > 64 bit arithmetics. In practice, as long as the compiler can prove that only the lower bits are used it can use 64 bit arithmetics. So u64 + i64 is fine, but i64 / (u64 + i64) would need to use i128 instructions.

Because the algorithm uses u64 -> i128 to prevent intermediary value overflow, we have similar issues with pure u64. So u64 / (u64 + u64) would unfortunately use i128 as well.
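The i65 consequence can be checked directly with a small range calculation (a sketch under the promotion model above; the helper is illustrative):

```python
def bits_needed_signed(lo, hi):
    """Smallest signed bit width whose range covers [lo, hi]."""
    bits = 1
    while -(1 << (bits - 1)) > lo or (1 << (bits - 1)) - 1 < hi:
        bits += 1
    return bits

u64_range = (0, 2**64 - 1)
i64_range = (-(2**63), 2**63 - 1)
lo = min(u64_range[0], i64_range[0])
hi = max(u64_range[1], i64_range[1])
# The smallest signed type covering both u64 and i64 is i65, which is
# why a plain u64 + i64 pushes past 64-bit arithmetic in this model.
assert bits_needed_signed(lo, hi) == 65
```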

matu3ba commented 3 years ago

All leaf operands are promoted to at least BABW.

What does BABW mean?

Promote both operands to the maximum bit width

Promote all operands to the maximum bit width? Or for each computation? I guess you mean all operands of all operations on the RHS.

An open question is how casts interact with this.

Explicit casts are usually very undescriptive/implicit about their behaviour (take C++, with four different casts and more methods to cast pointers up or down). Rust is considering moving away from them entirely, and most languages try to, because the behaviour is bad for code review. An explicit cast can, for example, mean (1) truncation of values or (2) conversion between types, and you want both to be very explicit and not cluttered in an arithmetic expression.

Because the algorithm uses u64 -> i128 to prevent intermediary value overflow, we have similar issues with pure u64. So u64 / (u64 + u64) would unfortunately use i128 as well.

u64 + u64 can overflow on u64, so only the base "is wasted". Can you think of a better example? Maybe u64 / u64 / u64 or u64 - u32 - u16 - u8 (anything shrinking the possible target value set in each operation)?

  1. It would be very helpful to also be open about the performance costs of trapping and of using bigger types, and why, for example, Rust decided against this.
  2. Rust suggested something like this to set them in a painless way, but it is relatively verbose. Would you then suggest using a default of checked(...) in release-safe and unchecked(...) in release-fast?
  3. I feel like a more painless way to annotate the safe and the fast behaviour for expressions/statements would be more exact (think of var x operator for safe x + y; and var x operator for fast x + y;), because file-, function-, or scope-level annotations clutter the file (the programmer may know and annotate invariants on dynamic program states).
lerno commented 3 years ago

@matu3ba I define BABW as the lowest bit width to perform calculations, typically 32 bits on 32/64 bit targets. 16 bits on 16 bit targets.

Promote both operands to the maximum bit widt? Or for each computation? I guess you mean all operands of all operations on the RHS.

Yes, the right hand side. The change is depth-first; casting the leaves will naturally propagate the change upwards. Given a + (b * c) we promote a, b and c. This naturally ensures that (b * c) is also at least as wide, etc.

Explicit casts are usually very undescriptive/implicit about behavior

An example I often take is: uint32var = uint32var + int16update. There is no way to cast int16update to get the correct behaviour in the face of unsigned overflow. The options are: (1) bitcasting, then doing a +%, but this bypasses underflow protection; (2) extension to signed 64 bits, then a trapping narrowing cast on the result of the add. The latter is the secure one, but I suspect few will use it. Casts often look like people spent some time thinking about them; that is not necessarily true. In C, explicitly writing out what are implicit casts anyway will mostly give the correct behaviour (comparisons being a notable exception), but that's not necessarily true in other schemes.
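The two options for uint32var = uint32var + int16update can be sketched in Python (the variable names are from the example; the helpers are illustrative, not proposed API):

```python
U32_MAX = 2**32 - 1

def add_widen_then_narrow(uint32var, int16update):
    """Option 2: widen to i64, add exactly, then trap on narrowing to u32."""
    wide = uint32var + int16update       # exact in a 64-bit signed type
    if not 0 <= wide <= U32_MAX:
        raise OverflowError("result does not fit u32")
    return wide

def add_wrapping(uint32var, int16update):
    """Option 1: bitcast the i16 to unsigned and wrap; this silently
    bypasses underflow protection."""
    return (uint32var + (int16update & U32_MAX)) & U32_MAX

assert add_widen_then_narrow(10, -3) == 7
assert add_wrapping(10, -3) == 7
# With uint32var = 1 and int16update = -2, the wrapping version silently
# produces 0xFFFFFFFF, while the widening version traps.
```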

It would be very helpful to also be open about the performance costs of trapping and using bigger types and why for example Rust decided against it.

I've read through the correspondence on the Rust mailing list from around 2014. There were many different ideas at the time, but I didn't see any proposal for using a bigger type. The cost of trapping was discussed a lot, and ideally many would have preferred traps in both release and debug, but fear of bad performance seems to have weighed most heavily. There is the "as-if infinitely ranged" integer model (also known as the AIR model), which potentially allowed trapping with delayed checks so that the performance cost would only be in the range of 6%. However, since this was an academic paper with unclear ramifications, their tight deadline (they were going to 1.0 before the end of the year) made them dismiss this solution.

Some argued for wrapping semantics, but with special trapping overflow operators, but not enough people liked that idea.

lerno commented 3 years ago

In regards to checked(foo + bar) vs foo ~+ bar, I think the first is easier to write, but it is harder to apply to a more complex expression.

However, I also want to stress that the problem is not just about trapping: unsigned integer overflow also has very significant footguns. (1) +/- is no longer associative nor commutative; (2) it is very complicated to safely combine unsigned and signed numbers.

This problem becomes worse when widening is implicit:

The below would be valid Zig:

someU32 = someU8 * someU16 + someU32;

But for Rust the widening must be explicit:

someU32 = u32::from(u16::from(someU8) * someU16) + someU32;

Here it's easier to see where potential overflows can occur. Since I need to cast anyway, it's simpler to do:

someU32 = u32::from(someU8) * u32::from(someU16) + someU32;

Which doesn't run into the same risk of overflow. Because the casts are explicit, the unsigned overflow trapping isn't as dangerous. Because the widening is implicit in Zig, it's even less clear where we run into chances of overflow.

matu3ba commented 3 years ago

@lerno Footguns are always bad, and so is not being able to specify local expression behaviour with a minimal number of keypresses and minimal visual clutter.

What about defining compiler intrinsics like

someU32 = @op_to_lhs(someU8 * someU16 + someU32);
someU32 = @safe(someU8 * someU16 + someU32);
someU32 = @fast(someU8 * someU16 + someU32);

or adapting grammar like

someU32 =cast_lhs someU8 * someU16 + someU32;
someU32 =safe someU8 * someU16 + someU32;
someU32 =fast someU8 * someU16 + someU32;

or shorter

someU32 =c someU8 * someU16 + someU32;
someU32 =s someU8 * someU16 + someU32;
someU32 =f someU8 * someU16 + someU32;

?

I think the default semantic of = etc. on arithmetic expressions could be described as fast, or am I also wrong on this?

lerno commented 3 years ago

Although you could do this, @matu3ba, I think the problem is deeper: it is about a more fundamental stance on implicit widening and trapping.

In order to make code secure and bug free, the language should try to make the correct way the obvious solution. If we consider the simple u32 = u32 + i16, how do we guide the user to the safe version of that? It's good if there is some language mechanism that just works in 95% of cases and makes the user aware of the remaining 5%.

Vexu commented 2 years ago

This seems like it would fix this footgun I've run into a few times:

// u1 + u1 will overflow if both conditions are true instead
// of producing the desired `2`.
var index: u32 = @boolToInt(foo) + @boolToInt(bar);
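A small model of the footgun (a sketch, not compiler output): the peer type of the two `@boolToInt` results is `u1`, so `1 + 1` overflows instead of producing the desired 2.

```python
def u1_add(a, b):
    """Add in u1: any result above 1 is an overflow (trap/UB in Zig)."""
    result = a + b
    if result > 1:
        raise OverflowError("u1 overflow")
    return result

foo, bar = True, True
# @boolToInt(foo) + @boolToInt(bar) overflows when both are true:
try:
    u1_add(int(foo), int(bar))
except OverflowError:
    pass  # this is the trap the comment above describes
```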
dansalvato commented 2 years ago

I have a lot of differently-sized types and seem to run into this issue all over the place.

var x: u8 = 0;
var y: u8 = 0;
var index: isize = y * 256 + x;

Gives this error:

error: integer value 256 cannot be coerced to type 'u8'

Fixing this would cut down on a ton of needless manual casting that makes the code less readable.

lerno commented 2 years ago

I wanted to add some more thoughts to this:

If we look at something like (I'm going to use a simplified notation here to just show the types)

i32 = i16 + i8 + i32

This is a very dangerous expression with today's Zig semantics, because it is unclear to the reader what the actual semantics are. In code the above reads a = b + c + d. In Rust this would not be allowed because the operands have different types, but in Zig both widening and arbitrary-bitsize addition occur. What is the intent of the code?

The widening and arbitrary-bitsize addition destroy associativity: (i16 + i8) + i32 is not the same as i16 + (i8 + i32). From a reader's point of view, a = b + c + d would not seem to differ from a = d + b + c or a = b + d + c, but they have different possible ranges and will trap differently.
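A sketch of why the grouping matters, under assumed peer-resolution semantics (each intermediate add peer-resolves to the wider of its two operand types; the helper is illustrative):

```python
def checked_add(a, b, bits):
    """Signed add in an i`bits` type, trapping on overflow."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    result = a + b
    if not lo <= result <= hi:
        raise OverflowError(f"overflow in i{bits} addition")
    return result

b, c, d = 32700, 100, 5   # values held in i16, i8 and i32 respectively
# (b + c) + d: the inner add peer-resolves to i16 and traps (32800 > 32767).
trapped = False
try:
    checked_add(checked_add(b, c, 16), d, 32)
except OverflowError:
    trapped = True
assert trapped
# b + (c + d): the inner add peer-resolves to i32, so nothing traps.
assert checked_add(b, checked_add(c, d, 32), 32) == 32805
```

Same operands, same mathematical sum, but one grouping traps and the other does not.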

To me it was tempting to treat i32 = i16 + i8 + i32 as pushing down the conversion: i32 = @as(i32, i16) + @as(i32, i8) + i32. Unfortunately this is not always the right decision, because changing i8 = i8 + i8 to i32 = i8 + i8 now has different semantics. Again we have a semantics change that is hidden from the reader.

To illustrate why this could be bad, consider if the left hand side is a struct that you did not define. If the struct widens the storage, your code changes semantics.

We have safe widenings though: i32 = i16 and i32 = fooReturnsi16() are always safe. And even things like i32 = -i16 is almost safe.

When I say "safe" I mean that there is no way to reorder the widenings for different behaviour. So if we again look at i32 = i8 + i8 we have two possibilities:

  1. Widen after binary evaluation i32 = @as(i32, i8 + i8)
  2. Top down widening i32 = @as(i32, i8) + @as(i32, i8)

Thus it is unsafe. But i32 = i32 + i8 is safe, as the widening is needed on the i8 before it can be added to the i32. And it turns out that a LOT of the widenings we want are safe ones. So by banning the unsafe ones we actually prevent things from breaking in a hidden way. If i8 = i8 + i8 is changed to i32 = i8 + i8 we get a compilation error, which is exactly what we want. And we still get the widening in cases like i32 = i8 + i32, just not in the cases where there is ambiguity.
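The two orderings can be contrasted directly in Python (a sketch; `wrap` models two's-complement wrapping and the function names are illustrative):

```python
def wrap(value, bits):
    """Two's-complement wrap of value into a signed `bits`-wide type."""
    m = 1 << bits
    value %= m
    return value - m if value >= m // 2 else value

def widen_after(a, b, op_bits):
    """Possibility 1: add in the narrow type, then widen the result."""
    return wrap(a + b, op_bits)

def widen_first(a, b):
    """Possibility 2: widen both operands first, then add."""
    return a + b

# i32 = i8 + i8: the two orderings disagree, so the widening is unsafe.
assert widen_after(100, 100, 8) == -56    # wraps in i8
assert widen_first(100, 100) == 200       # exact in i32
# i32 = i32 + i8: the add happens in i32 either way, so they agree (safe).
assert widen_after(1000, 100, 32) == widen_first(1000, 100)
```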

I've tested this in my language C3 and so far it's been extremely low friction: the cases when you cast are actually when you really need it. To summarize the C3 approach:

  1. Disallow widening unless the operand is a "simple" expression (which typically means it's not a binary expression)
  2. Allow narrowing of constants as long as they fit the target type, e.g. i8 = 126
  3. By default widen to a minimum int width (not strictly necessary)

Previous approaches I've tested and discarded:

  1. No implicit casts (discarded due to being too un-C like)
  2. Bottom up widening (Zig's approach)
  3. Top down widening (e.g. i32 = i8 + i8 => i32 = (i32)i8 + (i32)i8)
matu3ba commented 2 years ago

@lerno

Optimal for the register allocator would be to handle things as uniformly as possible (case 2), unless values become bigger than the addressable register size (the word size of the architecture), or there is register pressure and things are moved to the stack (then case 1 is faster, as more stuff fits into L1 cache; that's probably also why Zig chose it for now).

As far as I understand the consensus so far, Zig does not want to support arbitrary reordering like C with its sequence-point footguns, with the argument that the use cases supported by Zig would not gain a significant perf advantage from it (it is time-consuming to optimize long expressions anyway). Without sequence points it is natural to assume non-associativity, and for most floating-point code the order of writing matters unless you don't care about error (propagation).

So the options that you suggest and my opinion on them are:

  1. We could default to case 1 and force the user to set () brackets to specify the order of evaluation (based on the target). This would destroy portability.
  2. We could default to case 2, which will force users expecting high perf to use () brackets, explicit casting, or more statements.
  3. We could make the case backend- or build-script-specific, which destroys portability of code (copied code may not do the same thing).
  4. We could have no implicit casts, which forces users to clutter the code with @as and other casts and lowers ergonomics.
lerno commented 2 years ago

@matu3ba I think you misunderstand my point. There are no "options I suggest". I think it is clear that neither implicit promotion method works well. The point is not whether the compiler is allowed to reorder arguments, but whether the reader understands the semantics of the code, and whether those semantics are susceptible to hidden spooky action at a distance.

My main point is that an expression a = b + c in Zig should never allow implicit widening of b + c to the type of a, if the type of a is wider. Furthermore, a = <constant> could always be allowed as long as the constant is known at compile time to be within the range of a, even if the calculated type of the expression is greater than that of a.

What is notable is that by disallowing the cast on binary arithmetic, approaches (1) and (2) always yield the same result for the allowed cases.

lerno commented 1 year ago

I close this in favour of #16310