nim-lang / RFCs


Object lifetime states #523

Closed Sentmoraap closed 10 months ago

Sentmoraap commented 1 year ago

Abstract

Expand upon definite assignment analysis to track object lifetime states at compile time. Objects can be in three states: invalid, unassigned and assigned. What you can do with an object depends on its state.

Motivation

Definite assignment analysis is a great step forward in tracking lifetimes, but a generalized system could cover more use cases and allow new patterns. This system aims to keep Nim’s strength of being both convenient to program in and fast.

Currently you can’t have an object with an invalid state that should not be destroyed. The following code zero-initializes a {.requiresInit.} object and calls the destructor on it:

{.experimental: "strictDefs".}

type
  T {.requiresInit.} = object
    i: int

proc `=destroy`(x: var T) =
  echo "destroyed ", x.i
  x.i = -1

proc someCondition(): bool =
  return true

proc useT(x: sink T) =
  discard

proc test() =
  var t = T(i: 42)
  if someCondition():
    useT(t)

test()

# Outputs:
# destroyed 42
# destroyed 0

{.requiresInit.} types must be initialized when they are declared. Like with RFC #495, it would be nice to be able to initialize a {.requiresInit.} object later.

Moved-from objects are rarely used, because the compiler checks whether a read is the last one, but it can happen. Using a moved-from object that has not been reinitialized should be considered the same mistake as using an uninitialized object. Currently, definite assignment analysis allows it:

var f: Foo
f.i = 42
if someCondition():
  useFoo(move(f))
echo "f.i = ", f.i
# Displays f.i = 0, no warning

Currently the compiler accepts adding {.error.} to a destructor, but segfaults when trying to call one. If the language officially supports deleting destructors, then we can implement Vale’s higher RAII.

Description

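An object can be in one of three states: invalid, initialized, or assigned. Two unions of these states are used below:

Valid = initialized ∪ assigned. A valid object can be used.
Unassigned = invalid ∪ initialized. An unassigned object does not need to be destroyed.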

The compiler tracks which states the object can be in, and whether each of its uses is allowed in every one of those states. An owned object must leave the scope unassigned. The compiler can add a destructor call to achieve that. If that is not possible, it is an error.

Because of conditionals and other control flow, an object can be in one of several possible states at a given line.
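The state tracking described above can be modelled as a small set-based analysis. The following is only an illustrative sketch, not compiler code: the names `State`, `States`, `join`, `canRead` and `atScopeExit` are hypothetical, and it assumes that a moved-from object counts as invalid and that the destructor may be inserted whenever the object is definitely valid.

```nim
type
  State = enum
    Invalid, Initialized, Assigned
  States = set[State]          # every state the object may be in at one program point
  ExitAction = enum
    NothingToDo, InsertDestroy, CompileError

const
  Valid = {Initialized, Assigned}       # the object can be used
  Unassigned = {Invalid, Initialized}   # the object does not need to be destroyed

proc join(a, b: States): States =
  ## Where two control-flow paths merge, the object may be in any state
  ## reachable from either path, so the possible states are unioned.
  a + b

proc canRead(s: States): bool =
  ## A read is allowed only if every possible state is a valid one.
  s <= Valid

proc atScopeExit(s: States): ExitAction =
  ## An owned object must leave the scope unassigned.
  if s <= Unassigned: NothingToDo      # nothing to destroy
  elif s <= Valid: InsertDestroy       # assumption: destroying any valid object is fine
  else: CompileError                   # maybe invalid, maybe assigned

when isMainModule:
  # Mirrors test() from the Motivation section: `t` is assigned, then
  # moved into useT() in only one branch of the `if`.
  let noMove: States = {Assigned}
  let moved: States = {Invalid}        # assumption: moved-from means invalid
  let merged = join(moved, noMove)
  echo canRead(merged)                 # false
  echo atScopeExit(merged)             # CompileError
```

Under this model, the `test()` example from the Motivation section would be rejected at scope exit unless the destructor is a no-op or the object is reassigned on every path.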

For let objects, the compiler also tracks whether they can still be assigned, to enforce that they are assigned only once.

Currently, the =copy and =sink hooks assume that the target object is in a valid state and must destroy it first.

The new newCopy and newSink hooks assume that the target object is unassigned. To avoid something similar to C++’s rule of three/five/zero, the compiler can add destructor calls and call newCopy or newSink when =copy or =sink are not provided. The latter two remain useful because they allow the target object to reuse its resources. Because newCopy can’t check for self-assignment before the target is destroyed, the compiler has to add the check.
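To make the contrast concrete, here is a minimal sketch of how a `=copy` hook could be derived from a copy routine that assumes an unassigned target. `newCopy` does not exist in Nim, so the ordinary proc `copyIntoUnassigned` stands in for it; `Buf`, `allocData`, `newBuf`, `demo` and the `liveBuffers` counter are likewise hypothetical names used only for this illustration. The sketch assumes ARC/ORC memory management and uses the same `var T` destructor signature as the example in the Motivation section (newer Nim versions prefer a non-`var` parameter).

```nim
type
  Buf = object
    len: int
    data: ptr UncheckedArray[int]

var liveBuffers = 0   # allocations not yet freed; demonstration only

proc allocData(n: int): ptr UncheckedArray[int] =
  inc liveBuffers
  cast[ptr UncheckedArray[int]](alloc0(n * sizeof(int)))

proc `=destroy`(x: var Buf) =
  if x.data != nil:
    dealloc(x.data)
    dec liveBuffers

proc copyIntoUnassigned(dest: var Buf; src: Buf) =
  ## Stand-in for the proposed `newCopy`: `dest` is assumed to own
  ## nothing, so its old contents are overwritten without being freed.
  dest.len = src.len
  dest.data = nil
  if src.data != nil:
    dest.data = allocData(src.len)
    copyMem(dest.data, src.data, src.len * sizeof(int))

proc `=copy`(dest: var Buf; src: Buf) =
  ## What the compiler could generate when only `newCopy` is provided:
  ## self-assignment check, destructor call, then the unassigned-target copy.
  if dest.data == src.data: return
  `=destroy`(dest)
  copyIntoUnassigned(dest, src)

proc newBuf(n: int): Buf =
  Buf(len: n, data: allocData(n))

proc demo(): int =
  var a = newBuf(4)
  a.data[0] = 7
  var b = newBuf(2)
  b = a                         # `a` is read afterwards, so this is a copy
  doAssert b.data != a.data     # `b` owns its own buffer
  result = b.data[0]

when isMainModule:
  echo demo()
  echo liveBuffers
```

A hand-written `=copy` could instead keep `dest.data` when it is already large enough, which is the resource reuse the paragraph above refers to.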

When the destructor does nothing, it can be called on an invalid object, so it works on objects that may be either invalid or assigned. This also removes the need to check for self-assignment.

Non-owned objects are assumed to be in a valid state. out parameters are assumed to be unassigned. ptr can point to objects in any state.

You can make the compiler assume that the object is in a specific state, or that it’s the first assignment. Possible syntax:

var a: int
let b: int
for i in 0..10:
  if someCondition(i): # You know that it happens at least once
    a = i
  if someOtherCondition(i): # You know that it happens once and only once
    {.firstAssignment.} b = i
assumeAssigned(a)
echo "a = ", a

When the compiler cannot ensure this is true at compile-time, it adds run-time checks in debug and release builds. A wrong assumption is undefined behavior in danger builds.
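The run-time checks could behave roughly like the following user-level emulation. This is only a sketch under stated assumptions: the `Tracked` wrapper and the procs `assign`, `firstAssign` and `assumeAssigned` are hypothetical, and the proposal would have the compiler insert such checks itself rather than rely on a wrapper type. The checks are compiled out with `-d:danger`, matching the rule that a wrong assumption is undefined behavior in danger builds.

```nim
type
  Tracked[T] = object
    value: T
    assigned: bool     # run-time flag for what the compiler could not prove

proc assign[T](t: var Tracked[T]; v: T) =
  t.value = v
  t.assigned = true

proc firstAssign[T](t: var Tracked[T]; v: T) =
  ## Emulates `{.firstAssignment.}`: checked in debug and release builds.
  when not defined(danger):
    doAssert not t.assigned, "assigned more than once"
  t.assign(v)

proc assumeAssigned[T](t: Tracked[T]): T =
  ## Emulates `assumeAssigned`: checked in debug and release builds.
  when not defined(danger):
    doAssert t.assigned, "read before assignment"
  t.value

when isMainModule:
  var a, b: Tracked[int]
  for i in 0..10:
    if i mod 4 == 0:   # happens at least once
      a.assign(i)
    if i == 3:         # happens once and only once
      b.firstAssign(i)
  echo "a = ", assumeAssigned(a)   # a = 8
  echo "b = ", assumeAssigned(b)   # b = 3
```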

As an optional proposal, destructors could be removed to force the programmer to “destroy” an object with a move. However, this would need more discussion and should be a separate RFC.

Backwards compatibility

Some code that compiles now would not compile with these changes: some programs attempt to read {.requiresInit.} objects in the invalid state. A {.requiresInit.} object can also end up in a state where it may be either invalid or assigned, in which case the compiler can neither add a destructor call nor leave the object alone.

Types that use the new object lifetime states could be annotated with {.lifetimeStates.}. Objects without this pragma behave the same as before.

Alternatives

Incremental changes:

Araq commented 1 year ago

You gain credibility by not linking to sources that contain lies like

1 Generational references are faster than reference counting, ...

^ Based on naive, flawed benchmarks that use the most naive implementation of reference counting...

Araq commented 10 months ago

I don't understand what problem this solves, and since then we got =wasMoved and =dup, which might do what you wanted.