nim-lang / Nim

Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula. Its design focuses on efficiency, expressiveness, and elegance (in that order of priority).
https://nim-lang.org
Other
16.57k stars 1.47k forks source link

ComputedGoto: bad codegen with enum with holes #7699

Closed mratsim closed 6 years ago

mratsim commented 6 years ago

Taking the test here: https://github.com/nim-lang/Nim/blob/ddc131cf07972decc206e033b2dd85a42eb1c98d/tests/casestmt/tcomputedgoto.nim and changing the enum to enum with holes breaks codegen:

discard """
  output: '''yeah A enumB
yeah A enumB
yeah CD enumD
yeah CD enumE
yeah A enumB
yeah CD enumE
yeah CD enumD
yeah A enumB
yeah B enumC
yeah A enumB
yeah A enumB
yeah A enumB'''
"""

type
  MyEnum = enum
    enumA = 0x0000
    enumB = 0x0100
    enumC = 0x0101
    enumD = 0x0200
    enumE = 0x0300
    enumLast = 0x0900

proc vm() =
  var instructions: array[0..100, MyEnum]
  instructions[2] = enumC
  instructions[3] = enumD
  instructions[4] = enumA
  instructions[5] = enumD
  instructions[6] = enumC
  instructions[7] = enumA
  instructions[8] = enumB

  instructions[12] = enumE
  var pc = 0
  while true:
    {.computedGoto.}
    let instr = instructions[pc]
    let ra = instr.succ # instr.regA
    case instr
    of enumA:
      echo "yeah A ", ra
    of enumC, enumD:
      echo "yeah CD ", ra
    of enumB:
      echo "yeah B ", ra
    of enumE:
      break
    of enumLast: discard
    inc(pc)

vm()
.../nimcache/compgoto.c:92:21616: error: use of undeclared label 'TMP1894_'
.../nimcache/compgoto.c:92:26536: error: use of undeclared label 'TMP2304_'
.../nimcache/compgoto.c:92:2187: error: use of undeclared label 'TMP209_'
.../nimcache/compgoto.c:92:14344: error: use of undeclared label 'TMP1288_'
.../nimcache/compgoto.c:92:6004: error: use of undeclared label 'TMP556_'
.../nimcache/compgoto.c:92:26524: error: use of undeclared label 'TMP2303_'
.../nimcache/compgoto.c:92:2198: error: use of undeclared label 'TMP210_'
.../nimcache/compgoto.c:92:14356: error: use of undeclared label 'TMP1289_'
.../nimcache/compgoto.c:92:5993: error: use of undeclared label 'TMP555_'
.../nimcache/compgoto.c:92:14368: error: use of undeclared label 'TMP1290_'
.../nimcache/compgoto.c:92:21604: error: use of undeclared label 'TMP1893_'
.../nimcache/compgoto.c:92:26512: error: use of undeclared label 'TMP2302_'
.../nimcache/compgoto.c:92:5971: error: use of undeclared label 'TMP553_'
.../nimcache/compgoto.c:92:26560: error: use of undeclared label 'TMP2306_'
.../nimcache/compgoto.c:92:5982: error: use of undeclared label 'TMP554_'
.../nimcache/compgoto.c:92:2154: error: use of undeclared label 'TMP206_'
.../nimcache/compgoto.c:92:14380: error: use of undeclared label 'TMP1291_'
.../nimcache/compgoto.c:92:14392: error: use of undeclared label 'TMP1292_'
.../nimcache/compgoto.c:92:2176: error: use of undeclared label 'TMP208_'
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
LemonBoy commented 6 years ago

There are two possible solutions:

What's the best one for y'all?

Araq commented 6 years ago

Raise an error if the enum has holes

IMHO that's the way to go. If you want speed, design your enums properly.

mratsim commented 6 years ago

I think it should raise an error as well, this is not supported on the C side.

Regarding enum design, the opcode <-> hex/enum value table is most likely not in the hand of people writing the implementation, and most designs I've seen have holes.

Fortunately it is not hard to come up with a macro to fill holes, this is the one I am using.

import macros, strformat, strutils

macro fill_enum_holes*(body: untyped): untyped =
  ## Fill the holes of an enum
  ## For example
  ##   type Foo {.pure.} = enum
  ##     A = 0x00,
  ##     B = 0x10

  # Sanity checks
  #
  # StmtList
  #   TypeSection
  #     TypeDef
  #       PragmaExpr
  #         Postfix
  #           Ident "*"
  #           Ident "Op"
  #         Pragma
  #           Ident "pure"
  #       Empty
  #       EnumTy
  #         Empty
  #         EnumFieldDef
  #           Ident "A"
  #           IntLit 0
  #         EnumFieldDef
  #           Ident "B"
  #           IntLit 16
  body[0].expectKind(nnkTypeSection)
  body[0][0][2].expectKind(nnkEnumTy)

  let opcodes = body[0][0][2]

  # We will iterate over all the opcodes
  # check if the i-th value is declared, if not add a no-op
  # and accumulate that in a "dense opcodes" declaration

  var
    opcode = 0
    holes_idx = 1
    dense_opcs = nnkEnumTy.newTree()
  dense_opcs.add newEmptyNode()

  # Iterate on the enum with holes
  while holes_idx < opcodes.len:
    let curr_ident = opcodes[holes_idx]

    if curr_ident.kind in {nnkIdent, nnkEmpty} or
      (curr_ident.kind == nnkEnumFieldDef and
      curr_ident[1].intVal == opcode):

      dense_opcs.add curr_ident
      inc holes_idx
    else:
      dense_opcs.add newIdentNode(&"Nop0x{opcode.toHex(2)}")

    inc opcode

  result = body
  result[0][0][2] = dense_opcs

It can be added to sugar.nim.

Araq commented 6 years ago

Regarding enum design, the opcode <-> hex/enum value table is most likely not in the hand of people writing the implementation, and most designs I've seen have holes.

Come on, when you read the bytecode from disk or whatever, you convert it to the different enum representation. This lookup table can also be generated by a macro if the mapping is obvious enough.