nim-lang / Nim

Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula. Its design focuses on efficiency, expressiveness, and elegance (in that order of priority).
https://nim-lang.org
Other
16.39k stars 1.47k forks source link

Weird unsafe behaviour with enums with holes #13980

Open johnnovak opened 4 years ago

johnnovak commented 4 years ago

Root problem: for enums with holes, it is easily possible to create instances with values from the "hole regions" without runtime errors. These enums will happily pass through case statements when matching on them as no-ops. The complete lack of error checking in this case could lead to subtle bugs.

There are a couple of other points following from this root issue, as explained below.

Example

type SomeEnum = enum
  First = 1,
  Second = 5,
  Third = 10

proc match(e: SomeEnum) =
  case e
  of First:  echo "first"
  of Second: echo "second"
  of Third:  echo "thirds"
  # else: echo "Invalid enum value"
  #
  # ^ The compiler resists the attempt to handle invalid enum values
  #   (which would be correct IF there was no possibility of creating
  #   invalid enums in the first place)
  #
  # "Error: invalid else, all cases are already covered"

# This would result in an error, so far so good:
# Error: conversion from int literal(100) to SomeEnum is invalid
#f = SomeEnum(100)

# But this compiles and then prints this:
# 2 (invalid data!)
var f = SomeEnum(2)
echo f

match(f)  # no error, basically a NOP

var g: SomeEnum
match(g)  # no error, basically a NOP

# Nice try, but the compiler says no: "Error: 'of' takes object types"
#echo f of SomeEnum

Current Output

2 (invalid data!)

Expected Output

A compilation error for the statement var f = SomeEnum(2).

Possible Solution

I can't think of a way to solve this properly in a fully backwards-compatible way. Here's two ideas to remedy the issue somehow, though:

  1. Introduce a new "strict enums" compile time flag to disallow the creation of enums from the hole regions. The flag would be off by default, then we could turn it on by default in the next major Nim release.

  2. Leave the current broken method alone and introduce a new way of converting numbers to enums (e.g. SomeEnum.fromValue()) that can throw a ValueError for undefined numbers.

For both solutions, we would also need a way to check before instantiation at runtime whether a given number is defined for a given enum type (e.g. SomeEnum.isValid(). Right now, the only way of doing this check (that I'm aware of) is to instantiate the enum, convert it to a string and check it it ends with (invalid data!). That's a bit hacky.

Additional Information

% nim -v
Nim Compiler Version 1.2.0 [MacOSX: amd64]
Compiled at 2020-04-03
Copyright (c) 2006-2020 by Andreas Rumpf

git hash: 7e83adff84be5d0c401a213eccb61e321a3fb1ff
active boot switches: -d:release
cooldome commented 4 years ago

This should be a compile time error:

var f = SomeEnum(2)
johnnovak commented 4 years ago

This should be a compile time error:

var f = SomeEnum(2)

Yeah, but we also need the runtime checks (e.g. to check whether a given value is allowed before instantiation and probably a runtime error if you managed to get hold of an invalid enum somewho via a cast... but not sure if that's even possible now, haven't checked).

alaviss commented 4 years ago

We should ban enum with holes to be fair. They are rather useless in Nim, and the only reason anyone would use them is for C interfacing, which you can already do with a combination of distinct cint and const (and also a much much better way to interface since C bitflags are just const with a custom type name stamped on).

Araq commented 4 years ago

Yeah, enums with holes are simply madness. We should deprecate them for 1.4.

mratsim commented 4 years ago

c2nim should be updated to not generate them then

alaviss commented 4 years ago

@Araq does custom pragma work for types yet? We can probably use that to implement something like {.cenum.} to allow writing C-style enum for FFI.

johnnovak commented 4 years ago

the only reason anyone would use them is for C interfacing

Well, I'm using them and I'm not interfacing with C code :) I'm just putting holes into some of my enum definitons so when I extend them with newer values I don't have to convert my storage file format every time... It's a bit like using lines 10, 20, 30 in basic so you can chuck in 11 or 15 later more easily :) But admittedly I could live without this feature (hack?).

Also, Java (and probably other high-level languages) have similar such enums where you can give the enum items arbitrary values. So, I think it's a nice to have if it can fit into the rest of the language. I'm just saying don't ban it outright in the next version, maybe think a bit more about it, they have their uses.

Vindaar commented 4 years ago

I don't know, I really like enums with holes for some purposes. Essentially, for cases where I only have a small number N of elements with integer values, which I would otherwise put into a table if I had more of them. But an enum is so much more handy than a table, since I can case on it more easily. Sure I can do case myTab["key"] if its an ordinal element too, but then the values in my of branches are either just random numbers or I also have to define every possible value as a constant, which I can use as of PossibleValue1 etc.

For instance in the work of my current master's student I told here to use them to represent all elements that are part of the solar model she's looking at:

https://github.com/jovoy/AxionElectronLimit/blob/master/readOpacityFile.nim#L17-L42

I'd rather keep enums with holes, but pay a performance penalty when using them to make them safer, than not having them in the first place.

johnnovak commented 4 years ago

I'd rather keep enums with holes, but pay a performance penalty when using them to make them safer, than not having them in the first place.

+1 from me on this one, I find enums with holes useful too.

Araq commented 4 years ago

For me it's one of Nim's harmful features, hard to implement and it destroys other parts of the language like "oh I cannot use lookupTable: array[E, string] because E has holes, ffs"

johnnovak commented 4 years ago

Yeah, if it's too much trouble, then probably not worth keeping it. As I said, nice to have but not essential (I think).

mratsim commented 4 years ago

Alternatively they can be made into a separate type which will have a restricted set of procedures.

Vindaar commented 4 years ago

What I wrote in my last comment gave me an idea whether enums with holes could just be imitated with a macro. I just tried out my idea and it mostly works. See the proof of concept here:

import macros, tables, unittest

const TabSuffix = "TabImpl"
const TypeSuffix = "Type"

{.experimental: "caseStmtMacros".}

macro holyEnum*(name, body: untyped): untyped =
  echo body.treeRepr
  var curly = nnkTableConstr.newTree()
  var constants = nnkConstSection.newTree()
  let intTypeName = ident(name.toStrLit.strVal & TypeSuffix)
  let equalIdent = ident("==")
  var typeSect = quote do:
    type
      `name` = object
      `intTypeName` = distinct int
    proc `equalIdent`(x, y: `intTypeName`): bool {.borrow.}
      # and the distinct type for the table type
  let tabName = ident(name.toStrLit.strVal & TabSuffix)
  for ch in body:
    expectKind(ch, nnkAsgn)
    let rhsVal = ch[1]
    let rhsCurly = quote do:
      `rhsVal`.`intTypeName`
    constants.add nnkConstDef.newTree(ch[0],
                                      intTypeName,
                                      rhsCurly)
    curly.add nnkExprColonExpr.newTree(ch[0], rhsCurly)
  echo constants.repr
  echo curly.repr
  let matchIdent = ident"match"
  result = quote do:
    `typeSect`
    `constants`
    const `tabName` = `curly`.toTable
    ## unfortunately the below doesn't match
    #macro `matchIdent`(n: `intTypeName`): untyped =
    #  echo n.treeRepr
  echo result.treerepr
  echo result.repr

macro getTabImpl(t: typed): untyped =
  let typeName = t.getTypeInst.toStrLit.strVal
  let ident = ident(typeName & TabSuffix)
  result = quote do:
    `ident`

macro getSubType(t: typed): untyped =
  let tabType = t.getTypeInst
  # take argument of tab. Table is always `Table[T, T]` anyways
  doAssert eqIdent(tabType[1], tabType[2])
  result = tabType[1]
  echo result.repr

macro getTypeName(t: typed): untyped =
  result = t.getTypeInst.toStrLit

template parseHolyEnum[T](val: untyped): untyped =
  ## TODO: avoid copy?
  let tabImpl = getTabImpl(T)
  type tabSubType = getSubType(tabImpl)
  var res: tabSubType
  #if val in tabImpl:
  if tabSubType(val) in tabImpl:
    res = tabSubType(val)
  else:
    raise newException(ValueError, "Enum value of " & $val & " invalid for " &
      "holy enum of type " & $getTypeName(T))
  res

holyEnum(MyHolyEnum):
  field1 = 1
  field2 = 4
  field3 = 5
  field4 = 10

let val = parseHolyEnum[MyHolyEnum](4)
echo type(val)
expect(ValueError):
  let invalid = parseHolyEnum[MyHolyEnum](2)

case val
of field1:
  echo "IS ONE"
of field2:
  echo "IS TWO"
of field3:
  echo "IS THREE"
of field4:
  echo "IS FOUR"
else: discard

It generates the following code:

type
  MyHolyEnum = object
  MyHolyEnumType = distinct int
proc ==(x`gensym28185291, y`gensym28185292: MyHolyEnumType): bool {.borrow.}
const
  field1: MyHolyEnumType = 1.MyHolyEnumType
  field2: MyHolyEnumType = 4.MyHolyEnumType
  field3: MyHolyEnumType = 5.MyHolyEnumType
  field4: MyHolyEnumType = 10.MyHolyEnumType
const
  MyHolyEnumTabImpl = {field1: 1.MyHolyEnumType, field2: 4.MyHolyEnumType,
                     field3: 5.MyHolyEnumType, field4: 10.MyHolyEnumType}.toTable

Essentially the idea is to create constants for the enum values and a constant table, which holds the values. Based on the given name a distinct int type is created.

This currently has 2 downsides, as far as I can see:

  1. I don't think it's possible to create these in a type section
  2. a case statement casing on the generated int type must still have an else: discard branch, even though it's "not possible". I tried to work around this with case statement macros, but they don't match by distinct int type.

Otherwise this could replace enums with holes for me I believe.

timotheecour commented 4 years ago

see https://github.com/nim-lang/Nim/pull/14008 instead

johnnovak commented 4 years ago

see #14008 instead

I think we have a winner here.