Bad bit negation handling

vtereshkov / umka-lang

Umka: a statically typed embeddable scripting language

BSD 2-Clause "Simplified" License

Bad bit negation handling #415

Open skejeton opened 3 months ago

skejeton commented 3 months ago

fn f(v: uint16) {
}

fn main() {
    f(~uint16(0))
}

Runtime error /playground.um (5): Overflow of uint16
Stack trace:
    main: /playground.um (5)
    <unknown>: /playground.um (6)

vtereshkov commented 3 months ago

Yes, all arithmetical and bitwise operators always operate on full-size (64-bit) temporary values, so ~uint16(0) exceeds the uint16 range. You can use ~uint16(0) & 0xFFFF instead.
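
The behaviour described above can be modelled roughly in C. This is only a sketch: the helper names are hypothetical, the temporary is assumed to be an unsigned 64-bit value, and none of this is Umka's actual implementation. It shows why the negated value fails a uint16 range check and why masking with 0xFFFF brings it back into range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: the operator works on a full 64-bit temporary. */
uint64_t not_full_width(uint64_t v) {
    return ~v;
}

/* Hypothetical range check applied when a value is passed as uint16. */
bool fits_uint16(uint64_t v) {
    return v <= UINT16_MAX;
}

/* The suggested workaround: mask the temporary back to 16 bits. */
uint64_t not_masked_uint16(uint64_t v) {
    return ~v & 0xFFFF;
}
```

In this model, not_full_width(0) has all 64 bits set and fails fits_uint16, while not_masked_uint16(0) is 0xFFFF and passes.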

skejeton commented 2 months ago

Can we just compile it to 0xFFFF ~ num?

vtereshkov commented 2 months ago

@skejeton You essentially mean that Umka should respect the actual operand widths even in expressions. To do it, Umka should apply some widening rules when two operands are of different widths. What would such rules look like? The "least common width" rule? It would have its own adverse effects:

var a: uint8 = 200
var b: uint16 = 65500
var c: uint = a + b      // Overflow or what?

var a: uint8 = 0
var b: uint16 = 0
var c: bool = ~a == ~b   // true or false?
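
To make the two questions concrete, here is a small C sketch of what a width-respecting rule could produce. The functions are hypothetical stand-ins for such a rule, written with explicit casts, and are not a proposal for how Umka should implement it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical width-preserving negation: the result keeps the operand's width. */
uint8_t not_u8(uint8_t v) { return (uint8_t)~v; }
uint16_t not_u16(uint16_t v) { return (uint16_t)~v; }

/* First example: does a + b fit into the wider operand type (uint16)? */
bool sum_fits_u16(uint8_t a, uint16_t b) {
    return (uint32_t)a + b <= UINT16_MAX;
}

/* Second example: a and b are both zero, yet their negations differ. */
bool negations_equal(void) {
    uint8_t a = 0;
    uint16_t b = 0;
    return not_u8(a) == not_u16(b); /* compares 0xFF with 0xFFFF */
}
```

Under these assumptions, 200 + 65500 = 65700 does not fit into uint16, and the two negations compare unequal.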

skejeton commented 2 months ago

> @skejeton You essentially mean that Umka should respect the actual operand widths even in expressions. To do it, Umka should apply some widening rules when two operands are of different widths. What would such rules look like? The "least common width" rule? It would have its own adverse effects:
>
>     var a: uint8 = 200
>     var b: uint16 = 65500
>     var c: uint = a + b      // Overflow or what?
>
>     var a: uint8 = 0
>     var b: uint16 = 0
>     var c: bool = ~a == ~b   // true or false?

I think it should only apply to unary operations, bitwise negation being one of them.

vtereshkov commented 2 months ago

And so ~a != ~b in my second example, even though both a and b are zero, right?

skejeton commented 2 months ago

> And so ~a != ~b in my second example, even though both a and b are zero, right?

Good question. I think so. I can't think of any practical reason why it shouldn't be this way.

vtereshkov commented 2 months ago

In a perfect world, if a == b, I could substitute a for b in any expression (e.g., ~b) and get the same result. In reality, I'm afraid, I will never achieve this unless I forbid implicit type casts.

skejeton commented 2 months ago

Yeah, I'm not sure. C does behave as you say: https://godbolt.org/z/cqPh6nEKq. But the promotion rules are different: from what I know, it promotes to the highest bit type in the expression instead of straight up uint, and of course it doesn't do overflow checks. Personally, I'm not against either idea.

skejeton commented 2 months ago

An idea I have is to have some sort of context in an expression: if you explicitly cast to uint16, it will only promote as high as uint16, because promoting to anything wider than uint16 wouldn't make a lot of sense.

vtereshkov commented 2 months ago

@skejeton

> it promotes to the highest bit type in the expression instead of straight up uint

C is weird. It always promotes everything to int, but not to int64_t, unless one of the operands is already int64_t.

The standard says:

> If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.

And a comment from Stack Overflow:

> The harsh reality caused by the integer promotions means that almost no operation in C can be carried out on small types like char or short. Operations are always carried out on int or larger types.

Now let's see what Compiler Explorer will do:

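    uint8_t a = 200;
    uint16_t b = 0xFFFF;
    uint64_t c = a + b;              // a and b are both promoted to int (32-bit here)
    printf("%" PRIu64 "\n", c);      // 65735 - no overflow (PRIu64 comes from <inttypes.h>)

    uint8_t a = 200;
    uint32_t b = 0xFFFFFFFF;
    uint64_t c = a + b;              // added as 32-bit unsigned values
    printf("%" PRIu64 "\n", c);      // 199 - overflow (wraps modulo 2^32)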
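
For comparison with the ~a == ~b question raised earlier in the thread, here is what the same promotions do to bitwise negation in C. It is a minimal sketch with hypothetical function names, assuming an int wider than 16 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Both operands are promoted to int before ~ is applied,
   so the two results compare equal. */
bool c_negations_equal(void) {
    uint8_t a = 0;
    uint16_t b = 0;
    return ~a == ~b; /* -1 == -1 after promotion to int */
}

/* Converting the promoted result back to uint16_t keeps only the
   low 16 bits, silently and without any range check. */
uint16_t c_not_u16(uint16_t v) {
    return (uint16_t)~v;
}
```

So in C the substitution property holds for this case, at the price of a silent truncation whenever the result is stored back into a narrow type.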

skejeton commented 2 months ago

You're right. I think to make the semantics less complicated/arbitrary, in an expression that expects uint16, cast to uint16 at most, and so on.
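
One way to picture this proposal is a negation that is told the width its context expects and never produces bits beyond it. The following C sketch is purely illustrative: not_in_context and its bits parameter are hypothetical and do not correspond to anything in Umka.

```c
#include <stdint.h>

/* Hypothetical: bitwise NOT evaluated in a context that expects `bits` bits.
   The result is masked so that it never exceeds the expected type's range. */
uint64_t not_in_context(uint64_t v, unsigned bits) {
    uint64_t mask = (bits >= 64) ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
    return ~v & mask;
}
```

With bits set to 16, negating 0 gives 0xFFFF, which is the value the original f(~uint16(0)) call was expected to pass.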