nim-lang / RFCs

A repository for your Nim proposals.
135 stars 26 forks source link

raises tracking should behave like nosideeffect tracking: effects through params should be allowed #403

Closed timotheecour closed 2 years ago

timotheecour commented 2 years ago

here's my thought regarding effectsDelayed (https://github.com/nim-lang/Nim/pull/18610): IMO what we should do instead is same as what we do for {.nosideeffect.}, where a func is allowed to have side effects so long they're only through parameters:

nosideeffect: effects are allowed through params:

when true:
  proc fn1 = echo "ok1" # sideeffect
  proc maybeCall(a: proc()) = a() # maybeCall can be assumed to call a
  func bar1(a: proc()) = a() # direct call => no func violation because all sideeffects happen through params
  func bar2(a: proc()) = maybeCall(a) # potentially indirect call vs direct call doesn't matter: same analysis as bar1
  bar1(fn1) # ok
  bar2(fn1) # ok

  when false:
    # this correctly gives: Error: 'bar3' can have side effects
    # the side-effect is not through a param => error
    func bar3() = maybeCall(fn1)

raises: effects should also be allowed through params:

when true:
  proc fn1 =
    if false: raise newException(ValueError, "")
  type A = proc() {.raises: [ValueError].}
  proc maybeCall(a: proc()) = a()
  proc bar1(a: proc()) {.raises: [].} = a()
  proc bar2(a: proc()) {.raises: [].} = maybeCall(a)
  bar1(fn1)
  bar2(fn1)

  var f: A = fn1 # ok

  when false:
    # this correctly gives: Error: fn1() can raise an unlisted exception: ref ValueError
    # the raises is not through a param => error
    proc bar3() {.raises: [].} =
      fn1()

so far so good, it behaves the same as nosideeffect.

The problem is the following; which is what this RFC is about:

  when false: # BUG!
    # this gives: Error: a() can raise an unlisted exception: ValueError
    # BUG: this should work, because the raise only happens through params
    # it doesn't make sense that we'd disallow this but now `bar1` which is more general
    proc bar4(a: A) {.raises: [].} = a()
    # proc bar5(a: A) {.raises: [].} = maybeCall(a) # same as bar4, the distinction indirect vs direct should be irrelevant

proposal

note

this is analog to how purity works in D, eg: https://forum.dlang.org/thread/blwlivyzhgwophxvhuye@forum.dlang.org

D's working definition of a pure function is "effect only depends on parameters". Mutating parameters does fit the definition. This is less tight than other languages' definition, but we believe it's a sweet spot between reaping the modularity benefits of purity, and benefiting of the advantages of mutation.

and related articles, eg: https://klickverbot.at/blog/2012/05/purity-in-d/

›Weak‹ Purity Allows for Stronger Guarantees

arnetheduck commented 2 years ago

proc bar1(a: proc()) {.raises: [].} = a()

this solves a different problem than raises generally, namely "does bar1 raise additional exceptions with respect to its parameters" - that's a weak, borderline useless guarantee because it doesn't help me understand if bar1 handles errors that a raised or not. It also doesn't help me simplify my code based on there not coming any exceptions from bar1, ever, which is the whole point of a raises annotation - given that a can raise anything, the full decaration declaration basically says "raises anything, in spite of raises: [] - the last bit is just noise.

Araq commented 2 years ago

I'm confused, https://github.com/nim-lang/Nim/issues/18376 clearly shows that the rules as they already exist are not ok and we need a language extension. This is from our test suite (teffects6.nim) (and from the spec too)


proc noRaise(x: proc()) {.raises: [].} =
  # unknown call that might raise anything, but valid:
  x()

proc doRaise() {.raises: [IoError].} =
  raise newException(IoError, "IO")

proc use*() =
  noRaise(doRaise)
  # Here the compiler inferes that EIO can be raised.

it already works as you propose it should work. And it's insuffient, see bug #18376

timotheecour commented 2 years ago

I'm confused, nim-lang/Nim#18376 clearly shows that the rules as they already exist are not ok and we need a language extension

it already works as you propose it should work. And it's insuffient, see bug #18376

@araq no it doesn't work as I proposed, see the detailed step-by-step reduction in https://github.com/nim-lang/Nim/issues/18376#issuecomment-890152456 and the final reduction:

when defined case7g:
  type Call1 = proc() # can raise anything, including ValueError
  type Call2 = proc() {.raises: [ValueError].} # can raise at most ValueError
  proc barCal(b1: Call1, b2: Call2) {.raises: [].} =
    b1() # ok
    b2() # BUG: compiler rejects with: Error: b2() can raise an unlisted exception: ValueError
timotheecour commented 2 years ago

@araq

Here's a variant of this RFC, which tracks effects through params via symbol references:

proc fn1(a: proc(), b: proc()) # inferred as {.raises: Any.}
proc fn2(a: proc(), b: proc()) {.raises: [].} # raises nothing
proc fn3(a: proc(), b: proc()) {.raises: [a].} # raises whatever param `a` raises, ie raises(a)
proc fn4(a: proc(), b: proc()) {.raises: [a, b, ValueError].} # raises raises(a) + raises(b) + ValueError
proc fn5(a: proc(), b: proc()) = # inferred as {.raises(a).}
  a()

benefits

proc main1(): auto = # infered as {.raises: [].} proc cmp(a,b): int = b-a # infered as {.raises: [].} sorted([1,2])

proc main2(): auto = # infered as {.raises: [ValueError].} because of raises: [cmp] proc cmp2(a, b: string): int = cmp(a.parseInt, b.parseInt) # can raise ValueError sorted(@["12", "3", "01"], cmp2)


* the analysis stays local: once a proc is semchecked, its effects are summarized by its infered raises list (which can include symbols)

## note
https://github.com/nim-lang/Nim/pull/14976 implements a similar thing, but in the case of raises this is actually a simpler analysis, more local (doesn't require interprocedural analysis, all info is summarized in the raises annotation).
Araq commented 2 years ago

no it doesn't work as I proposed, see the detailed step-by-step reduction in nim-lang/Nim#18376 (comment) and the final reduction

You are right, but it's only one aspect of the problem:


proc x(...; callback: proc()) {.raises: [].} =
  # if `add` raises nothing, so does this:
  container.add callback

Your later reply seems to acknowledge that and is a better proposal.

Araq commented 2 years ago

Here's a variant of this RFC, which tracks effects through params via symbol references: ...

Very good RFC.

arnetheduck commented 2 years ago

Here's a variant of this RFC, which tracks effects through params via symbol references:

lgtm - one more issue in nim in general is that the raises list of proc types are not actually inferred, but rather eagerly set as "anything" (at least in nim 1.2, haven't been following this discussion), even in the cases where they could be - this is probably not in the domain of this RFC directly, but will still affect its behavior - this becomes relevant for sort and cmp in particula:

proc fn3(a: proc(), b: proc()) {.raises: [a].} # always inferred as "raises anything" because `a` raises anything

proc sorted*(a: seq[T], cmp: proc(a, b: T): int): seq[T] = ...
  # infered as: {.raises: [cmp].} since cmp is called - which de facto means "raises anything"

Call operators solve this in C++ - I believe there have been similar requests for nim so that sort can inline cmp etc.

One should also be careful to ensure that the binding of a, b and so on handles name clashes between parameters and types.

timotheecour commented 2 years ago

proc fn3(a: proc(), b: proc()) {.raises: [a].} # always inferred as "raises anything" because a raises anything

no, this would depend on call site;

proc main() = # infered as raises: []
  proc a()=discard # infered as raises: []
  proc b()=discard
  fn3(a, b) # infered as raises: [] because fn3 is `raises: [a]` where a is raises: []`
proc main2() = # infered as raises: [ValueError]
  proc a()=raise newException(ValueError, "") # # infered as raises: [ValueError]
  proc b()=discard
  fn3(a, b) # infered as raises: [ValueError]

Call operators solve this in C++ - I believe there have been similar requests for nim so that sort can inline cmp etc.

https://github.com/nim-lang/Nim/pull/11992 would enable that among many other things but I dont' see how call operators or inlining relates to raises

One should also be careful to ensure that the binding of a, b and so on handles name clashes between parameters and types.

unlikely to clash bc params are typically lowercase and exceptions types are typically uppercase, but even then usual scoping rules apply and can be disambiguated with fully qualified names

# from customexceptions import Foo1
proc a(Foo1: proc(), Foo2: proc()) {.raises: [Foo2, customexceptions.Foo1]
arnetheduck commented 2 years ago

no, this would depend on call site;

well, it doesn't, currently. when you type a: proc () for an argument, it is always defaulted as raising anything regardless of what happens at the call site- the actual effect is not deduced at bind time - this behavior in general is an orthogonal part of the language and thus out of scope for this RFC since it affects more than just function effect propagation.

var x: proc() {.raises: [ValueError].}
x = proc() = echo 5 # ok - x still set as raising `ValueError`

proc a(b: proc()) = 
  x = b # not ok, b assumed to anything unless explicitly specified

if b were to become deduced with regards to effects, and we have the ability to discover the effects of b, that would effectively mean that conceptually, multiple overloads of a must be generated since one might want to differentiate code based on whether b raises or not.

timotheecour commented 2 years ago

well, it doesn't, currently

it does, see below

when you type a: proc () for an argument, it is always defaulted as raising anything regardless of what happens at the call site- the actual effect is not deduced at bind time

this is false:

proc fn1() = discard
proc fn2() = raise newException(ValueError, "")
proc bar(fn: proc()) = discard # depends on call site
proc main() {.raises: [].} =
  bar(fn1) # ok
  # bar(fn2) # would give CT error

Your example doesn't negate that; a is semchecked once and must assume pessimistically that b can raise anything, so x = b in the body fails compilation.

Araq commented 2 years ago

Please make https://github.com/nim-lang/RFCs/issues/403#issuecomment-890162087 it's own RFC or patch this RFC to make this the top comment.

timotheecour commented 2 years ago

done, see https://github.com/nim-lang/RFCs/issues/408

arnetheduck commented 2 years ago
proc fn1() = discard
proc fn2() = raise newException(ValueError, "")
proc bar(fn: proc()) = discard # depends on call site
proc main() {.raises: [].} =
  bar(fn1) # ok
  # bar(fn2) # would give CT error

in this example, the fact that bar(fn2) gives a CT error is the error that any improvement to exception tracking should remove - bar should never be tracked as raising anything because it cannot de facto raise (since it doesn't directly or indirectly call anything that raises).

the pessimistic analysis is necessary, or the compiler must generate a new overload for every variation of the raises effects because the behaviour of bar, should it call its argument, will depend on whether the argument raises or not - this so far has been beyond what the language does.

timotheecour commented 2 years ago

I intended to write proc bar(fn: proc()) = fn() here's the code applying this correction:

proc fn1() = discard
proc fn2() = raise newException(ValueError, "")
proc bar(fn: proc()) = fn() # depends on call site
proc main() {.raises: [].} =
  bar(fn1) # ok
  # bar(fn2) # would give CT error