nim-lang / RFCs


User-defined implicit initialization hooks #252

Closed haxscramper closed 1 year ago

haxscramper commented 3 years ago

User-defined implicit initialization

This RFC mostly reiterates ideas from #48, #126, and #233.

Add support for a user-defined implicit initialization hook with the following prototype:

proc `=init`(x: var T)

Is this needed?

Existing proposals

There have been several RFCs related to default initialization and implicit construction of user-defined types.

Existing compiler warnings

The Nim compiler already provides two warnings directly related to default initialization and three more related to initialization in general, five initialization-related diagnostics in total, which suggests there is at least some interest in correct initialization behavior.

{.requiresinit.}

A separate pragma, {.requiresinit.}, completely prevents implicit default initialization. It is used very infrequently: only 126 times in 1340 packages, and approximately 90% of the packages I checked never use it at all.

The effects of requiresinit cannot be contained: once added, it affects all code that uses a type with annotated fields. It also breaks templates that rely on type Res = typeof((var it {.inject.}; op)) to determine the type of an expression (right now almost none of the *It templates can handle such types).

import sequtils

type
  Req = object
    f {.requiresinit.}: int

  Nreq = object
    f: int

template test(templ, expr): untyped =
  echo compiles(@[Nreq()].templ(expr)), " ", compiles(@[Req()].templ(expr))

test mapIt, 0
test filterIt, true
test allIt, true
test anyIt, true
Output:

true false
true false
true false
true false

Why is this needed?

Broken type system

As mentioned in these comments by @timotheecour, a large portion of the type safety guarantees is invalidated: enums with an offset and ranges can't really guarantee anything unless explicitly created with initT. Any value that has a non-zero default requires special attention: it is now your responsibility to make sure this -1-as-default-value is actually used. {.requiresinit.} is a solution, but as already mentioned, it propagates through the whole codebase, requiring far-reaching modifications.
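A small sketch of the -1-as-default problem, using a hypothetical Slot type (not from the original thread): correctness depends entirely on every user calling initSlot, because a plain var declaration is zero-filled and silently produces a different, valid-looking value.

```nim
type
  Slot = object
    index: int   # -1 means "empty"; 0 is a real, occupied slot

# Explicit constructor: the only place that establishes the -1 sentinel.
proc initSlot(): Slot = Slot(index: -1)

let made = initSlot()   # correct: an empty slot
var implicit: Slot      # zero-filled: silently refers to slot 0

echo made.index, " ", implicit.index
```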

NOTE: I personally think that {.requiresinit.} is a great way to explicitly declare requirements and enforce them via compiler diagnostics. Its only drawback is that it is really viral and has to be worked around in some cases (the typeof pattern can be written as var tmp: ref InType; var it {.inject.} = tmp[]; op).
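A minimal sketch of that workaround, assuming a hypothetical template named mapItRef (it is not part of std/sequtils, and its block-inside-typeof shape only mirrors the style sequtils uses). The element type is taken from a dereferenced nil ref inside typeof, so the compiler never sees a default-initialized Req; the typeof expression is only type-checked, never executed.

```nim
type
  Req = object
    f {.requiresinit.}: int

# Hypothetical mapIt variant that also works on element types
# with {.requiresinit.} fields.
template mapItRef(s: typed, op: untyped): untyped =
  block:
    type InType = typeof(s[0])
    # A nil ref: never dereferenced at runtime, only used by typeof below.
    var tmp: ref InType
    # `it` is explicitly initialized from tmp[], so requiresinit is satisfied.
    type OutType = typeof((
      block:
        var it {.inject.} = tmp[];
        op))
    var res: seq[OutType]
    for it {.inject.} in s:
      res.add(op)
    res

let xs = @[Req(f: 1), Req(f: 2)]
echo mapItRef(xs, it.f * 10)
```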

`=destroy` confusion

It is possible to have a specific destruction hook bound to a particular type, and you can write an initT proc as a user-defined constructor, but when it comes to default initialization, everything is just filled with zeros. It is also possible to forbid implicit initialization completely, but not to configure it. I find this rather confusing and counter-intuitive.

A large number of popular imperative/OOP programming languages provide a way to customize default values. Of all the languages covered by the "Nim for X programmers" pages on the wiki, only C lacks this feature.

Other concerns

RFC #126 (Support default values for object properties) suggests implementing default value initialization in the form of:

type MyObj = object
    id: int = generateUniqueId()
    x: int = 42
    stuff: seq[string] = @[]

This can be implemented using a macro (see the forum thread), so it is not necessary to add it to the language core. If desired, the same macro could automatically declare an `=init` hook. Generating explicit initT procs this way is already possible, but default initialization is currently not configurable.

Possible implementation behavior

Similar to how `=destroy` is handled

var x: T; stmts
------------- (default variable initialization)
var x: T
`=init`(x)
stmts

If the type does not have a user-defined `=init`, no call is injected. If any of its fields have an initialization hook declared, a default hook of the form

proc `=init`(obj: var T) =
  `=init`(obj.fieldWithInit)

is implicitly declared, recursively. If a field has a range or enum type for which low(Enum).int != 0 or low(range[..]) != 0, an `=init` is implicitly declared for it as well.
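Since `=init` is only proposed, the sketch below emulates the lowering by hand with a hypothetical overloaded proc named initHook and hypothetical types Hp, Stats and Unit. The user writes the hook for the leaf type; the hooks for the containing objects are what the compiler would derive, and the explicit initHook(u) call stands for the call it would inject after the declaration.

```nim
type
  Hp = range[0 .. 100]
  Stats = object
    hp: Hp              # has a user-written hook below
    armor: int          # no hook: stays zero-filled
  Unit = object
    name: string
    stats: Stats        # contains a field with a hook, so Unit gets one too

# Hypothetical stand-in for a user-written `=init` on the leaf type.
proc initHook(x: var Hp) = x = 100

# What the compiler would derive: call the hook on every field that has one.
proc initHook(x: var Stats) = initHook(x.hp)
proc initHook(x: var Unit) = initHook(x.stats)

var u: Unit   # today: zero-filled
initHook(u)   # the call the RFC proposes to inject after the declaration
echo u.stats.hp, " ", u.stats.armor
```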

Object construction syntax. If a field is not explicitly initialized by the user and the field's type has an `=init` declared, the field should be implicitly initialized. If explicit initialization must be enforced, {.requiresinit.} can be used on the object field.

let obj = Object()
------------- (default field initialization)
let obj = block:
  var obj = Object()
  `=init`(obj.uninitializedFieldWithInit)
  obj

NOTE: {.requiresinit.} already uses similar logic: if a field's type cannot be default-initialized, then no object containing a field of this type can be default-initialized either.

haxscramper commented 3 years ago

This is not an addition to RFC - just some ideas that might potentially be useful.

It is not uncommon to see a procedure implementation pattern where result is not explicitly initialized and is immediately used to append, set a field value, etc. This is fine most of the time, but when the type definition switches to ref (e.g. from Type = object to Type = ref object), it can lead to annoying debugging where you have to find all the places where implicit initialization happened. This is a rare case, but it does happen. `=init` could potentially make this a non-issue and further diminish the distinction between ref and non-ref types, which is in line with the already supported (experimental) automatic dereferencing.
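A minimal sketch of the pattern described above, with hypothetical StackObj and StackRef types: the object version can append to result directly, while the ref version needs an explicit allocation, because its result starts as nil and the append would fail at runtime with a nil dereference.

```nim
type
  StackObj = object
    items: seq[int]
  StackRef = ref object
    items: seq[int]

proc buildObj(): StackObj =
  # result is an implicitly zero-initialized value: add works directly
  result.items.add 1

proc buildRef(): StackRef =
  # result starts as nil; without this line the add below dereferences nil
  result = StackRef()
  result.items.add 1

echo buildObj().items, " ", buildRef().items
```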

If a ref variable really has to be nil, it might be better to explicitly initialize it with = nil; otherwise, treat it as a regular variable and use the implicit initialization hook.

More on the 'broken type system': objects that have a non-trivial initial state (i.e. not just zero-filled memory) are more fragile when implicit initialization is not configurable. You must take care to use dedicated constructors all the time, even in situations like var obj: Obj.

Another (mostly theoretical) idea is that it might be possible to automatically add finalizers to ref objects created using `=init`, regardless of the GC algorithm used. Something like

proc `=init`(v: var ref T) = new(v, final)

Varriount commented 3 years ago

How would exceptions be handled?

haxscramper commented 3 years ago

If you mean exceptions in the `=init` hook, the answer is: I don't think any special handling is necessary, since value initialization should happen in the same scope as object construction, immediately after the var declaration. This means we either get a correctly initialized object (if no exception is raised) or an exception is raised; there are no half-initialized objects, if that is what you mean.

That said, I'm not sure I understand the exact scenario you have in mind; if you could elaborate on your question, I might be able to provide a better answer.

mratsim commented 3 years ago

Just like we require =destroy to be plain object, we can require =init to not throw. And in C++, AFAIK, it's undefined behavior to throw in a constructor.

proc `=init`(v: var ref T) {.raises: [].}

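If =init were required to be {.raises: [].}, any failure would have to be handled inside the hook itself. A minimal sketch, using a hypothetical proc initPort and a hypothetical hard-coded configuration string in place of the proposed hook:

```nim
import std/strutils

const portText = "8080"   # hypothetical configuration source

# Hypothetical stand-in for an `=init` hook restricted to {.raises: [].}:
# errors are handled inside, so no exception can escape to the declaration site.
proc initPort(x: var int) {.raises: [].} =
  try:
    x = parseInt(portText)
  except ValueError:
    x = 80   # fall back to a fixed default instead of propagating

var port: int
initPort(port)
echo port
```

The compiler rejects the proc if the try/except is removed, since parseInt is declared to raise ValueError.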
Araq commented 3 years ago

And in C++, AFAIK, it's undefined behavior to throw in a constructor.

Pretty sure it's supported, and partially constructed objects are destructed properly. Looks super expensive to implement (like everything else in C++, I guess).

Araq commented 3 years ago

-1 from me. First of all construction is very different from destruction, constructors take parameters in most languages and the problem is worse when "size hints" optimizations enter the picture: A size hint should be attached to an object, not to an object's type.

Furthermore, the mechanism will soon be misused to avoid the initT, newT idiom, even though it's strictly less flexible than custom constructor procs. See the "factory" pattern and how C++ got make_shared and make_unique even though C++ has very good support for constructors; there is a lesson to be learned here.

The route forward IMHO is to allow default values inside object declarations with the restriction that the value has to be a compile-time value. For multiple reasons:

haxscramper commented 3 years ago

The main point is: with constexpr default values, there is no way to execute code when implicit initialization happens. Yes, in the overwhelming majority of use cases constexpr is more than enough, but this route completely closes the door to non-trivial logic in implicit initialization, which might be necessary in some cases.

It is possible to place additional restrictions on the =init procedure prototype, such as .raises: []. and .noSideEffect., although the latter makes =init almost indistinguishable from constexpr fields.

even though it's strictly less flexible than custom constructor procs

I would argue that =init being explicitly less flexible is a good thing, since it prevents misuse.

mechanism will soon be misused to avoid the initT, newT idiom

Again, since there is no support for parameters in =init, I don't see how it would affect existing idioms in most cases. In addition, initT just looks better, is more logical, and is widely used. I don't think =init will "soon" be misused to avoid initT.

First of all construction is very different from destruction, constructors take parameters in most languages

Again, this is not about explicit constructors: we already have them (initT and newT), and they fit quite nicely into the language. This is only about being able to configure implicit object instantiation.

It prevents id: int = generateUniqueId(), which is spooky action at a distance. Side effects should not be hidden.

I'm sorry, but I don't follow how this would prevent it. If you mean let id: int = ..., then no init call should be generated in this case, since explicit initialization happened. In the case of a unique id for each object instance, =init would only help:

type
  Obj = object
    id: int

proc initObj(): Obj = Obj(id: generateUniqueId())

proc `=init`(obj: var Obj) = obj = initObj()

This allows Obj to be included in different structures without worrying about the correct implicit initialization of all subfields.

If this is the 'misuse' you were talking about: I think it is necessary to have some way to configure this behavior and cut the chain of "if A includes B, I must initialize A correctly using initB", which currently stretches from the initial type to infinity. With initT, the responsibility for correct initialization is pushed onto all potential users of a type, again and again, potentially breaking adjacent layers of abstraction: each user of Obj must be aware that it has to be constructed using initT and is responsible for making this knowledge available to the next abstraction layer (via documentation or .requiresinit.). Constexpr default values partially mitigate this issue, but by definition (compile-time evaluation) they fail to address cases like generateUniqueId().
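A minimal sketch of that chain of responsibility, with a hypothetical generateUniqueId helper and a hypothetical Outer type embedding Obj: every containing type needs its own constructor that remembers to call initObj, and a plain var declaration bypasses all of them.

```nim
var counter = 0

# Hypothetical helper standing in for generateUniqueId().
proc generateUniqueId(): int =
  inc counter
  counter

type
  Obj = object
    id: int
  Outer = object       # embeds Obj, so it inherits Obj's initialization duty
    inner: Obj
    label: string

proc initObj(): Obj = Obj(id: generateUniqueId())

# Every type that embeds Obj needs a constructor that remembers initObj.
proc initOuter(label: string): Outer = Outer(inner: initObj(), label: label)

let a = initOuter("a")
let b = initOuter("b")
var c: Outer   # skips initOuter: c.inner.id is 0, not a unique id
echo a.inner.id, " ", b.inner.id, " ", c.inner.id
```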

This is basically the same as with .requiresinit.: yes, it is possible to use, and I would even argue it is a great tool, but it creates a responsibility for all potential users.

This problem is quite nicely illustrated by mapIt's inability to deal with .requiresinit. types - even though it uses it to get expression type it is still not possible. Using initT is not possible either, because it would require passing a constructor proc as a parameter everywhere. Constexpr default values could solve the issue, but as already mentioned, this is too restrictive a solution.

Araq commented 3 years ago

Again, since there is no support for parameters in =init, I don't see how it would affect existing idioms in most cases. In addition, initT just looks better, is more logical, and is widely used. I don't think =init will "soon" be misused to avoid initT.

Ok, this wasn't clear to me before, thanks!

But then your proposal is mostly a different syntax for field = defaultValue (my context here is an object declaration). Syntax aside, there is one difference: you seek to allow arbitrary expressions, whereas I really want to restrict it to constant expressions. If we start with the restrictive version, we can always make it less strict in later versions. The reverse is not so easy: allow everything, and soon enough somebody will rely on it.

This problem is quite nicely illustrated by mapIt's inability to deal with .requiresinit. types - even though it uses it to get expression type it is still not possible.

I think that's a problem that can be solved by special casing typeof even further.

haxscramper commented 3 years ago

Yes, exactly. I think arbitrary expressions might be necessary in some cases, but I agree that it is hard to make things stricter after the fact, so starting with constexpr and potentially expanding into =init later is a good solution.

mapIt is relatively easy to work around: you can use a (not really pretty) hack, typeof((var tmp: ref InType; var it {.inject.} = tmp[]; op)), which is not valid at runtime but works fine in most cases.

Araq commented 3 years ago

So can we agree on supporting it in this way:


type
  StartWith1 = object
    x: int = 1

?
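This syntax is supported by Nim 2.0 and later. A minimal sketch of its behavior, assuming a Nim 2.0+ compiler: the default applies to var declarations, to object construction that omits the field, and to default(T), while a field without a default stays zero.

```nim
# Requires Nim 2.0 or later, where object field defaults are supported.
type
  StartWith1 = object
    x: int = 1
    y: int        # no default given: zero, as before

var declared: StartWith1
let constructed = StartWith1()
let partial = StartWith1(y: 5)
echo declared.x, " ", constructed.x, " ", partial.x, " ", partial.y
```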

haxscramper commented 3 years ago

Yes. It covers the main concerns about invalidated type guarantees (which is really important). Other, more complex cases of default initialization would be nice to support, but not right now at least.

timotheecour commented 3 years ago

@araq it's not entirely clear what proposal led to "Accepted RFC", is it the following:

let a3 = 3
type 
  Foo = object
    x1: int = 1 # ok
    x2 = 2 # ok, type inference allowed in initializer
    x3: int = a3 # CT error, field initializer must be const 

? if so, then +1

note 1:

it would currently prevent initializers that are ref/ptr/pointer:

type Bar = ref object
  b0: int

type Foo = object
  b: Bar = Bar(b0: 1) # error: initializer is a ref and can't be const

EDIT: this restriction could be lifted by allowing const ref objects, by accepting https://github.com/nim-lang/Nim/pull/15528 (see also https://github.com/nim-lang/RFCs/issues/126#issuecomment-616135556)

note 2:

this caveat applies:

type Foo = object
  x1: cstring = "abc"
var a = Foo()
a.x1[0] = 'A' # SIGBUS

[EDIT] note 3

see https://github.com/nim-lang/RFCs/issues/126#issuecomment-616135556 for a more detailed proposal that also covers:

`var a: T` # always equivalent to `var a = default(T)`
# `default(T)` is defined recursively in the obvious way, taking into account default initializers for object types, e.g. see the example provided there

Araq commented 3 years ago

it's not entirely clear what proposal led to "Accepted RFC", is it the following: ...

Yes.

this caveat applies:

Well, var a = Foo(x1: "abc") has the same problem; nothing changed.

pmetras commented 3 years ago

The problem I see with @Araq syntax

type
  StartWith1 = object
    x: int = 1

is that it works only for object initialization. You can't use it with other types like

type
  Ranged = range[10 .. 20]    # I would like to have 10 as default
  BoolTrueDefault = bool      # This type of bool should default to 'true'
  Constraint[T] = object
    c: T                      # When implementation is delegated to another client module,
                              # default initialization should be too.

haxscramper commented 3 years ago

The first two types here are not distinct, so they should follow the regular initialization logic (i.e. false for bool and 0 for range[10 .. 20]), although the range example shows how easy it is to break all guarantees with zeroed default values: var r: range[10 .. 20]; echo r prints 0.

Default initialization of distinct types is also an important case to consider, but I just can't see how this can be added in type definition syntax. In objects, fld: type = val was explicitly prohibited, so it is easy to just relax the syntax checking, but for distinct types there is simply no place for it. So for cases like type Hp = distinct range[0 .. 100] you would need something like `=init`(hp: var Hp) = hp = Hp(100).
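A minimal sketch of the current situation for such a distinct type, with a hypothetical initHp constructor: the intended starting value exists only if every user remembers to call it, and a plain declaration yields zero hit points.

```nim
type Hp = distinct range[0 .. 100]

# Explicit constructor: the only way to get the intended starting value today.
proc initHp(): Hp = Hp(100)

var fresh: Hp          # zero-filled: 0 hit points
let spawned = initHp()
echo int(fresh), " ", int(spawned)
```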

Constraint[T] is just a chain of responsibility ("who needs to initialize what"), and I don't think it can be solved without =init or constructor procs.

timotheecour commented 3 years ago

The problem I see with @Araq syntax

Default initialization of distinct types is also an important case to consider, but I just can't see how this can be added in type definition syntax.

metagn commented 2 years ago

construction is very different from destruction

In my opinion it makes sense if the goal is to prevent invalid states. Destruction turns a value from a valid state to an invalid state, initialization turns it from an invalid state to a valid state. Optimizations like noinit would prevent a call to =init. Maybe there's a way to turn "zero values" into a compile time construct, while turning "runtime memory zeroing" into a type-bound operation. JS codegen already special cases "runtime memory zeroing" for each mappable type. Not sure how these constructs would interact.

ajusa commented 2 years ago

Would it also be possible for this RFC to support tuples? I didn't see an example using them yet:

type
  StartWith1 = tuple
    x: int = 1
  StartWith2 = tuple[y: string = "2"]

konsumlamm commented 2 years ago

Would it also be possible for this RFC to support tuples?

Tuples are different from objects in that every tuple with the same fields is the same type. So defining a custom initialization hook for a tuple type would affect all tuples of that type, no matter where they're defined. Your example seems more related to https://github.com/nim-lang/RFCs/issues/126, as it doesn't involve defining a =init hook.
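A minimal sketch of that structural equivalence, with hypothetical tuple types A and B: two separately declared tuple types with the same field names and types are the same type, so a hook attached to one would necessarily apply to the other.

```nim
type
  A = tuple[x: int]
  B = tuple[x: int]   # declared separately, but the same type as A

proc describe(t: A): string = "x = " & $t.x

let b: B = (x: 7)
echo describe(b)    # accepted: a B value is passed where an A is expected
```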