nim-lang / RFCs

A repository for your Nim proposals.
135 stars 26 forks source link

Definite assignment analysis and out parameters #378

Closed Araq closed 1 year ago

Araq commented 3 years ago

Proposal to add out parameters to Nim and to enable Nim's "definite assignment analysis" (https://docs.oracle.com/javase/specs/jls/se6/html/defAssign.html) for all types.

Current situation

This code compiles without warnings:


proc p(): seq[int] =
  result.add 4

Variables (including result) are initialized to their default value automatically. This design has its merits but it is also considered to be error prone:


proc p(cond: bool): ref T =
  if cond:
    result = (ref T)()
  # implied: else: result = nil

It's hard to know if the "implied" statement was meant and I think it's the most frequent source of nil bugs. Nim's original design lacked system.default(T) so initializing variables to default(T) implicitly made some sense. Another benefit was this solution does not require an analysis over the control flow graph to ensure that no uninitialized values can be accessed.

But for some types like var T or ptr T not nil there is no default value -- variables of such a type need to be initialized explicitly.

Proposal: Require explicit initialization

This RFC proposes to remove the special casing of variables that lack a default value -- all variables need to be initialized explicitly before they can be used. This is what C# and Java (see https://docs.oracle.com/javase/specs/jls/se6/html/defAssign.html) do for decades and it works very well.

Proposal: Add out parameters to Nim

This proposal was implemented and found to not work too well with today's Nim. The reason is that code like:


proc open(f: var File, filename: string): bool

var f: File
if open(f, "t.txt"):
  ...

is very common. The var File here is an output parameter, not a read-write parameter. This is why I propose to add out T parameters to Nim. While output parameters are usually bad style, APIs from Windows and POSIX use output parameters heavily.

When the implementation of a routine with output parameters is analysed, the compiler checks that every path before the (implicit or explicit) return does set every output parameter:


proc p(x: out int; y: out string; cond: bool) =
  x = 4
  if cond:
    y = "abc"
  # error: not every path initializes 'y'

Out parameters and exception handling

This analysis takes exceptions into account which can produce annoying results:


proc p(x: out int; y: out string; cond: bool) =
  x = canRaise(45)
  y = "abc" # <-- error: not every path initializes 'y'

However, I think this is acceptable for two reasons:

Out parameters and inheritance

It is not valid to pass an lvalue of a supertype to an out T parameter:


type
  Superclass = object of RootObj
    a: int
  Subclass = object of Superclass
    s: string

proc init(x: out Superclass) =
  x = Superclass(a: 8)

var v: Subclass
init v
use v.s # the 's' field was never initialized but the compiler does not understand this!

However, in the future this could be allowed and give us a better way to write object constructors that take inheritance into account.

Code breakage / Transition period

Code like the following var x: T; use x is not allowed anymore. As a long transition period the compiler will warn about the "usage before definition" and translate it as if var x = default(T); use x was written.

Future extensions

Object fields that are declared as field: T must be explicitly initialized in object constructors, to get a default value for an object field the syntax field = default(T) should be used.

Alternatives

  1. Keep the language as it is, it's fine.
  2. Enable definite assignment analysis for all types but without adding out parameters.
Clyybber commented 3 years ago

Output parameters should be used rarely. It's better style to return a tuple of values instead.

Then why introduce such complexity for a feature that would only be used rarely anyways?

IMO code like

proc p(): seq[int] =
  result.add 4

should keep working without warnings. I think we could come up with a more nuanced analysis that warns for:

proc p(b: bool): seq[int] =
  if b:
    result = @[4]
  else:
    echo "hmmm, did I forget to assign result here?"

but not for the previous snippet.

juancarlospaco commented 3 years ago

Can it produce a warning for var T and no need for out T ?. Even if I have to add a {.used.} or similar to my var T to explicitly silent the warning about it. ptr should be work of strictNotNil as best as it can.

I think that for 2.0 we need to reduce the amount of experimental switches, not duplicate it. We have 2 "not nil" thingies etc, a natural selection must happen.

haxscramper commented 3 years ago

I believe making default value assignment mandatory would break quite a lot of code, especially templates/macros and introduce unnecessary verbosity in the language. When I write var matching: seq[int] or maxItem: MyTypeThatIKnowIsFineToDefault I don't think it is necessary to require explicit initialization. At the same time, idea of rewriting var t: T as var t: T = default(T) is exactly what was proposed in https://github.com/nim-lang/RFCs/issues/252, with only difference for default-value-creator signature: init=(var T) vs default(typedesc[T]). It is already possible to override default so

translate it as if var x = default(T); use x was written.

Seems like a good idea, but forcing initialization for var is not. Warnings - maybe, but not hard errors.

type
  CustomType = object
    field: string

proc default(T: typedesc[CustomType]): T =
  # Same is `init=` in my origina proposal
  CustomType(field: "sfasdf")

# `var c: CustomType` is translated into
var c = default(CustomType)
echo c
planetis-m commented 3 years ago

Data structures using Locks and Atomics need this. https://groups.google.com/g/comp.unix.programmer/c/qgYLhHquTUg?pli=1 https://stackoverflow.com/questions/3269266/ok-to-copy-a-critical-section https://stackoverflow.com/questions/27790116/c11-thread-safety-of-stdatomict-copy-constructors

xigoi commented 2 years ago

Whether or not this is accepted, I think a function that doesn't mention result and doesn't directly return a value (that is, it always returns the default value) should be an error. Forgetting to return is very common and currently the compiler will just silently do the wrong thing.

Clonkk commented 2 years ago

I agree with @haxscramper here :this seems very much like https://github.com/nim-lang/RFCs/issues/252

Whether you call it =init or =default having a an initialization hook is necessary; initialization based on zeroMem has been the cause of many headache when handling C++ type.

Araq commented 1 year ago

This has been implemented is available under .experimental: "strictDefs"