nim-lang / RFCs

A repository for your Nim proposals.
134 stars 26 forks source link

Nim's NRVO is C++'s inplace construction (placement new) #230

Open Araq opened 4 years ago

Araq commented 4 years ago

This RFC is titled after an observation by @zah. Before my patch https://github.com/nim-lang/Nim/pull/14390 Nim optimized this pattern


template newClone*[T: not ref](x: T): ref T =
  let res = new typeof(x)
  res[] = x
  res

proc r(): T {.raises: [E].} = ...

let x = newClone(r())

into good code without a temporary of T (which is a problem when T is large). It was translated into r(res[]) and NVRO ensured the heap location was directly written into. After my patch a hidden temporary of T is introduced by the code generator. This is strictly more conforming to the spec since the heap stores into res[] are observable and should not happen in case of r raising an exception.

I argue that the old behaviour was better as you can easily introduce a temporary on your own if you need it, but it's currently impossible to avoid it if the stores are observable and yet benign. The only problem is then that it's slightly more error-prone and so the compiler should produce a warning like

"Observable stores into res[] if r raises an exception, use a temporary to avoid them"

. This warning can then be disabled via .push warning[ObservableStores] = off to signal the NRVO is intentional for performance.

Benefits

Downsides

Implications

Alternatives considered

out parameters, pragmas like {.enforceNvro.}, {.inplaceConstruct.}, a new assignment operator like <-. These all look more complex than this proposal.

zevv commented 4 years ago

I think the error message as proposed is too cryptic and will raise questions from the Average User, as it uses terms not mentioned in the manual.

Araq commented 4 years ago

@zevv There will be a link to a short article describing how to avoid the warning.

Clyybber commented 4 years ago

We could instead use the DFA to enable safe NRVO in more cases, like when we do the call in a try, but don't read the result when an exception would be raised.

arnetheduck commented 4 years ago

To provide a counterpoint, this is an example of letting a minor performance optimization trump correctness.

In particular, it's extremely surprising behaviour that breaks with the most basic foundations of what a function invocation is - if this behaviour is wanted for whatever reason (in a case where the performance would matter), it's trivial to rewrite the function to take a var parameter to make the destructive behaviour explicit and clearly visible - bugs that will follow from this kind of approach will be hard to find in any larger codebase, and will worsen as more deterministic resource management is baked into the language - ie consider when a destructor has to roll back a partially updated instance due to a gotcha like this.

The compiler could already remove most cases of the copying when it can prove that the result is not observable - ie it can apply to optimization to where the instance is "fresh" - let x = xxx() etc - this is the same argument as the compiler being able to figure out RVO without introducing a specifically named magic variable (result) - the N in NRVO is not really needed with the right analysis in place. Other features like lent and sink indeed already mitigate the lack of RVO in most cases further diminishing the actual performance hit of missing an RVO and making the surprising behaviour even less motivated. Indeed, the allocate-on-heap case is a special case of more advanced memory management similar to "placement new" in C++ where construction is separated from memory allocation - ideally Nim would have something to express this kind of construction.

If anything, one could imagine a hint for when RVO is not applied could be useful - ie "missed optimization" warning - this would balance the language in favour of correctness, letting performance-conscious developers find the cases where RVO is not being used safe - this is certainly easier to teach, and would encourage and help discover places where the compiler could be improved.

Araq commented 4 years ago

Well ok. So assuming we want inplace construction, how to express it?

zah commented 4 years ago

Even without the correctness argument which I support as well, I'll try to convince you that both approaches with converge to the same code appearing in the user modules, with only a small difference in how the compiler helps you get there.

Let's assume the compiler produces a warning. It would be inconvenient for the user to deal with push warning: disable and pop, so she is likely to introduce a template doing the following:

template constructInPlace(location: var T, initExpression: T) =
  {.push warning[ObservableStores] = off.}
  try:  
    location = initExpression
  except Exception as e:
    reset(location)
    raise e
  {.pop.}

In newClone, it would be used like this:

template newClone*[T: not ref](x: T): ref T =
  let res = new typeof(x)
  constructInPlace(res[], x)
  res

To get there, the compiler told us that res[] = x has observable side-effects, which we silenced with constructInPlace.

The alternative solution is to make constructInPlace a magic. The manual will explain that it is the equivalent of the C++'s placement new operator. Now, before I apply it manually, the compiler is free to try to prove that using RVO is safe and it can still use it implicitly, but if I have reviewed the generated code and found out that the optimisation is not used somewhere where it would be appropriate, I can just explicitly request it with constructInPlace.

constructInPlace is concerned only about RVO. It assumes that the memory that will be passed in the init expression can have arbitrary state, so the init expression is responsible for zero/default initialization where that's required.

The only question is how do we call the constructInPlace operation? I think that name is fine, because it will be rarely used and it's easier to look up in the docs, but you can substitute it with any operator if that's desired:

template newClone*[T: not ref](initExpression: T): ref T =
  let res = new typeof(x)
  res =<= initExpression
  res
Araq commented 4 years ago

is likely to introduce a template doing the following

Yes, I know. So thanks to a warning the feature could be built as a library and not as yet-another language extension.

arnetheduck commented 4 years ago

instead of making it a warning, an option perhaps would be to make it an effect and a corresponding pragma to kill the effect, defaulting proc to not allow the effect to proliferate, akin to sideEffects and its {.noSideEffects.}: bla companion - it's ugly enough to scare off any normal user (as it should) - it's a stronger statement than a warning - library helpers could indeed be developed to soften the blow for specific placement situations like the GC heap or an arena memory manager.

warnings are entirely voluntary and can't be picked up by macros and generic code - effects could, making the construct more powerful while at the same time preventing dialects appearing in libraries that require the user to disable the right set of warnings to be tolerable.

arnetheduck commented 4 years ago

new f() btw seems like it could construct stuff on the gc heap in-place btw - any problems with this syntax?

Araq commented 4 years ago

How about this rule instead: Inside every proc body we ensure that stores to result cannot have happened after raising an exception. Then at the callsite we can always apply NVRO safely. It pushes the usage of temporary storage inside the proc and doesn't burden the caller.

Clyybber commented 4 years ago

But that would mean the callee has to (if it can raise) copy the result param at the start of the proc and restore it when an exception will be raised. So we will always have the copy at the start, setting us back to where we would be without RVO.

I think its best to try to enable RVO in more cases by trying to prove that the effect would not be observable, which is basically the same as lastRead analysis.

Araq commented 4 years ago

@Clyybber but assuming my latest rule the problem moves from callsite to the proc body which means we can deal with it without interprocedural analysis (just like lastRead analysis works inside a single proc body).

Clyybber commented 4 years ago

But then we always need to pessimisticly assume it may be observable, and such all optimization opportunities are gone.

zah commented 4 years ago

How about this rule instead: Inside every proc body we ensure that stores to result cannot have happened after raising an exception. Then at the callsite we can always apply NVRO safely. It pushes the usage of temporary storage inside the proc and doesn't burden the caller.

@Clyybber, what this rule suggests is that the practice of filling out results gradually will become more problematic (it will trigger a warning or error in the great deal of cases).

You can still avoid a lot of copies if the result is initialized in one go through a constructor expression or through another expression returning a complete value.

I think this is fine proposal as well, but it does come at a significant upgrade cost for older code relying on the result variable.

Varriount commented 4 years ago

I'm familiar with RVO (return value optimization) but what is NRVO?

zevv commented 4 years ago

I had to look that up as well :)

https://definedbehavior.blogspot.com/2011/08/value-semantics-nrvo.html

zah commented 4 years ago

@Araq pointed out in a private conversation that the discussion here can be re-framed as the following question:

1) Should the assignment operator provide the basic exception safety guarantee or the strong exception safety guarantee by default?

For background on these terms, see: https://www.boost.org/community/exception_safety.html

No matter what the default is, it should be possible to explicitly request the correct RVO treatment through templates. Above, I've demonstrated how one can define constructInPlace (enforced RVO). It's equally easy to define a template that avoids RVO (assuming that there is nothrow move operation):

template withoutRVO[T](expr: T): T =
  let temp = expr
  move(temp)

proc foo(s: seq[Bar]) =
  s[0] = withoutRVO makeBar()
arnetheduck commented 4 years ago

No matter what the default is

this is actually crucial in the discussion - it's always possible to circumvent constraints but what the default ends up being be dictates how people by and large will use the language - it takes an advanced super-programmer - say a compiler developer - to even understand the problem and or select the right template / library function.

GULPF commented 4 years ago

@Araq

@zevv There will be a link to a short article describing how to avoid the warning.

The message by the compiler in the latest stable release is now Warning: observable stores to 'x', which is even more cryptic than the original message suggested in this RFC. No link to an article is provided.

ghost commented 4 years ago

Is there any conclusion on whether this RFC will be implemented or not? And what about the warning message?

Araq commented 4 years ago

No conclusion yet. However, the warning message has been implemented and been found to be annoying.

jlokier commented 3 years ago

What about doing something actually like C++'s NRVO, which is for safe initialisation only. Make the rule: NRVO by default when assigning in a let x = ... or var x = ...; no NRVO for ordinary assignment without let or var. (A pragma could be used if someone really wants NRVO with an ordinary assignment.)

This has a number of things going for it:

RedBeard0531 commented 3 years ago

Is the example code at the top actually an example of the problem? As far as I can see there is no way to observe r's writes to res[] in that code because it goes out of scope immediately before anyone can observe it (unless you count dead stores to about-to-be-freed memory as observable, but that would be insane). Even if you RVO out to x, it also goes out of scope immediately.

Would it be possible to either add some more explanation for why that example is considered to leak observable stores, or to change the example to show some misbehaving code?