nim-lang / RFCs

A repository for your Nim proposals.
136 stars 26 forks source link

Type checking for nil dereference: nilable and not-nilable types #250

Closed alehander92 closed 3 years ago

alehander92 commented 3 years ago

Type checking for nil dereference: nilable and not-nilable types

Abstract

Check nilability of ref-like types and make dereferencing safer based on flow typing and not nil annottions.

Note

Note: the original version of this RFC was similar but it made not nil a default: a version of it can be found in the file of the PR https://github.com/nim-lang/RFCs/pull/169 and please also take a look at history of the RFC section. All of the github comments and reactions there are for the first RFC version.

Motivation

Null errors are ones of the most common ones. Modern languages strive to either use Options/similar types or to at least make handling null values safer. Compile time checking is useful to prevent null dereference before running the program.

In Nim we can try to do that in many situations for ref-like types. However we need to preserve compatibility for some time, so we can start with having warnings which can be turned into errors by the user.

Description

Nim already has some not nil support, but the plain is to use here a different version of it.

Ref types are nil by default: this means they can include nil by default. They can be annotated to forbid including nil with the not nil annotation.

We use flow-sensitive typing to check nilability.

  type
    NilableObject = ref object
      a: int
    Object = NilableObject not nil

    Proc = (proc (x, y: int))

  proc p(x: Object) =
    echo x.a # ensured to dereference without an error

  # compiler catches this:
  p(nil)

  # and also this:
  var x: NilableObject
  p(x)

If a type can include nil as a valid value, dereferencing values of the type is checked by the compiler: if a value which might be nil is derefenced, this produces a warning by default, you can turn this into an error using the compiler options. TODO

You can still turn off nil checking on function/module level by using a {.nilCheck: off}. pragma.

If a type is nilable, you should dereference its values only after a isNil or equivalent check, e.g.:

  proc p(x: NilableObject) =
    if not x.isNil:
      echo x.a

    # equivalent
    if x != nil:
      echo x.a

  p(x)

Safe dereferencing can be done only on certain locations:

Dereferencing operations: look at [Reference and pointer types], for procedures: calling is such an operation.

It's enough to ensure that a value is not nil in a certain branch, to dereference it safely there: the language recognizes such checks in if, while, case, and, or

e.g.

  not nilable.isNil and nilable.a > 0

is fine.

case can be used as well

  case a.isNil:
  of true:
    echo a.a # error
  of false:
    echo 0

However, certain constructs invalidate the value not-nil-ness.

  if not nilable.isNil:
    nilable.a = 5 # OK
    var other = 7 # OK
    echo nilable.a # OK
    call() # maybe sets nilable to `nil`?
    echo nilable.a # warning/error: `nilable` might be nil

If we do a check in a e.g. if, the other branches (e.g. else) assume the opposite fact about the nilability of a value.

  if a.isNil:
    echo 0
  else: # a is not nilable
    echo a.a

Additional check is that the return value is also not nil, if that's expected by the return type

  proc p(a: Nilable): Nilable not nil =
    if not a.isNil:
      result = a # OK
    result = a # warning/error

Early return after nil check is ok: the behavior is the same as if the remaining code was in else

  if a.isNil:
    return
  a[] # ok

When two branches "join", a location is still safe to dererence, if it was not-nilable in the end of both branches, e.g.

  if a.isNil:
    a = Object()
  else:
    echo a.a
  # here a is safe to dereference

Dereferencing compound expressions

We should try to support more expressions: maybe most common are dot expressions , a.b.c type of expressions as well.

if not a.b.c.isNil:
  echo a.b.c.field # OK

We need to ensure the expression hasn't been changed meanwhile: so we should detect invalidation of the non-nilability on

Supporting bracket expressions a[b] would be useful as well with similar limitations.

Initialization of non nilable pointers

The compiler ensures that every code path initializes variables which contain non nilable pointers.

It seems this is already done in the compiler using tfRequiresInit ?

Not nil ref default values and usage in collections

Default values of ref T not nil is nil. This can lead to holes in the type safety: that's why we still have to handle some cases.

seq[T] where T is ref and not nil are an interesing edge case: they are supported with some limitations.

They can be created with only some overloads of newSeq:

newSeq(length): default for ref T not nil returns nil, so the programmer is responsible to fill correctly the sequence. However this should produce a warning (which can be turned off).

However this should be used only in edge cases.

There is special treatment of setLen related functions as well: one can use shrink in all cases. However one can use grow similarly to newSeq :

grow(length): calls default: expects that the programmer fills the new elements with non nil values manually.

Similar logic applies to arrays, tables and other collections that can use default values.

Implementation

We plan on using the AST for now. We should keep track of nilability(possibly with additional state) and "dependencies" of compound expressions.

Backward incompatibility

We don't show errors by default to be backward compatible. However this might be a good idea in Nim 2.0 .

We should use warnings which can be disabled. Old programs with those warnings should keep compiling by default.

Future development

Eventually we can extend this mechanism to types represented by finite sets : e.g. enums and base variant support on the enum check. I might write a RFC for that, but it's possible that such a feature would use the z3 engine support or at least the requires pragmas.

History of the RFC

A version of the first variant of the RFC issue included this text:

WIP for nilability spec for Nim : typechecking ref-s: please read the spec in the nilcheck.rst file of the pull request(it's rendered better at https://github.com/alehander42/RFCs/blob/master/RFCs/nilcheck.rst ), and comment somewhere here on the proposed syntax/semantics

ideas that probably won't make it through: https://github.com/alehander42/Nim/blob/nilcheck-spec/doc/nilcheck_maybe.rst

based on irc conversations with Araq (and a previous AST-based implementation PR inspired by conversations with zah and Herb Sutter's C++ related Lifetime paper, this is much more limited tho: just focused on nilability and ref types)

and the text of the RFC as a file in the PR https://github.com/nim-lang/RFCs/pull/169

Credits

Based on irc conversations with Araq, conversations with zah and Herb Sutter's C++ related Lifetime paper(this is much more limited tho: just focused on nilability and ref types), and experience with this feature in other languages (e.g. Typescript, C#, Zig and others).

alehander92 commented 3 years ago

should we keep the nil T syntax? it seems useful for unexported types or generics

alehander92 commented 3 years ago

we decided to keep the syntax nil T iirc

Araq commented 3 years ago

We now have an implementation and bugs will be fixed just like other bugs.