`echo -0x80'i8` prints 128 (which is out of range) and other bugs

timotheecour commented 3 years ago

Example

when defined case5d:# gitissue D20210703T101319:here
  echo -128'i8
    # GOOD: prints -128 in 1.5.1; parsed as: int8(-128)
    # was CT error in 1.4 which was parsed as: - (128'i8)
  echo 0x80'i8 # -128
  echo "bugs:"
  echo -0x80'i8 # BUG: prints 128 which is out of range for int8
  const a1 = -0x80'i8
  echo a1
  echo ($a1, a1, $type(a1))

  echo "ok:"
  let a2 = a1
  echo a2
  echo ($a2, a2, $type(a2))

  echo "ditto with other types:"
  const a = -0x8000'i16
  echo (a, $a, int16.low, int16.high)

ditto with -0o200'i8

Current Output

-128
-128
bugs:
128
128
("128", -128, "int8")
ok:
-128
("-128", -128, "int8")
ditto with other types:
(-32768, "32768", -32768, 32767)

Expected Output

-128 -128 bugs: CT error for -0x80'i8 and similar literals

Possible Solution

1.5.1 3ceaf5c1309ac8d4d68e6f39c13b021bcc1b15f4

-128'i8 is useful and it's good we now support it.
-0x80'i8 is an outright bug and that's the subject of this issue, the lexer should reject it as -128 doesn't have a negative that fits as int128; and as seen above leads to inconsistencies
-0x81'i8 is questionable as it allows 2 writings for the same number, 0x7f'i8 and -0x81'i8

proposal

make parser give CT error for literals (hex,oct,binary) with a minus sign that result in a >=0 number: these would become CT errors; they're error prone and serve no purpose:

  echo -0x80'i8
  echo -0x81'i8
  echo -0x8000'i16
  echo -0x8001'i16
  echo -0o200'i8
  echo -0o201'i8
  # etc

ringabout commented 3 years ago

Slightly related: https://github.com/nim-lang/Nim/issues/4350 ("The range of hexadecimal number")

What Go/Rust does is just like strtol function in C

package main

import (
    "fmt"
    "strconv"
)

func main() {
    j, err := strconv.ParseInt("-80", 16, 8)
    fmt.Println(j)
    fmt.Println(err)

    fmt.Println(int8(-0x80)) // 128
    fmt.Println(int8(0x80))  // overflow

    fmt.Println(-0x8000000000000000)
    fmt.Println(0x8000000000000000) // constant 9223372036854775808 overflows int
    fmt.Println(0x8000000000000001) // ./prog.go:18:14: constant 9223372036854775809 overflows int

}

Rust

pub fn main() {
    println!("num: {}", 0x80i8);
}

error message:

error: literal out of range for `i8`
 --> src/main.rs:2:25
  |
2 |     println!("num: {}", 0x80i8);
  |                         ^^^^^^ help: consider using the type `u8` instead: `0x80u8`
  |
  = note: `#[deny(overflowing_literals)]` on by default
  = note: the literal `0x80i8` (decimal `128`) does not fit into the type `i8` and will become `-128i8`

Maybe the lexer can work like strtol in C stdlib, follow that convention. Instead of making -0x80'i8 a CT error, let the lexer not interpret the high bit of hexadecimal as sign bit. Then the lexer of hexadecimal is more consistent with that of plain number.

Expected

0x7f'i8 => 127 -0x80'i8 => -128 0x80'i8 => overflow

Pros

No more redundancy(Before there are two ways to represent a negative number)
follow c conventions
make all base of number consistent(no sign bit), as a bonus the lexer can reuse a proc(for instance a modified version of strtol). https://code.woboq.org/gcc/libiberty/strtol.c.html#7cutoff

nim-lang / Nim