Open Araq opened 1 year ago
It totally sounds like the n-word, what about NIR ?. 🤔
It will be like a file written to disk or sqlite or just an object in memory?.
There are only case statements, no if statements. A backend can easily convert a case statement that only has one or two branches into an if statement.
So it has to convert if
into case
and then back to if
again ?.
So it has to convert if into case and then back to if again?
Yes.
@Araq: please comment on "It totally sounds like the n-word, what about NIR ?" as well.
Shouldn't it read more like "anger" then? Just add a 'b' if you're concerned: "Nim goto-based intermediate representation": "Angie-Beer".
Well, the fact I'm more inclined to discuss the name shows how qualified I am to discuss the matter of the RFC.
@Araq: please comment on "It totally sounds like the n-word, what about NIR ?" as well.
En-gee-I-AR, doesn't sound like the n-word at all to me and I made no such connection when I made it up.
In a world where people pronounce JSON "J-son" or GUI "Gooey", perhaps it's not pronounced as it's abbreviation.
I agree the acronym is unfortunate. The fact that it has already been a distraction in this RFC shows that it is likely to continue to be a distraction.
Is there a problem with NIR?
En-gee-I-AR, doesn't sound like the n-word at all to me
This is similar to "FCKR: eff-see-kay-arr - doesn't sound like the f-word".
First, it's not all about sound: visual similarity also matters.
Second, acronyms in English can be pronounced without spelling out the letters. For example:
It totally sounds like the n-word, what about NIR ?. 🤔
I thought I was the only one who saw it 😁 I thought Araq must have called it Nim Intermediate Goto Representation but that would be a horrible acronym so he changed it slightly.
Personally, I don't think it's a problem. If anyone chooses to see slurs where there's none, it's their personal issue.
However, if something must be done here, I suggest GIR. Python's GIL isn't called PGIL, it's just GIL. So, Nim's GIR can just be GIR 🤷
For the sake of moving forward I renamed it to GIRN but this is ridiculous.
Even if it seems ridiculous, which if we're going to pick sides regarding how to feel about the naming, I would err on - amending the acronym is and was the safe bet. People are way too sensitive and polarized around identity at the moment. We've also been down this road as a community before and also for some of us as individuals.
Thanks for taking quick action on this. Also, a side note for everyone reacting to the original acronym: even if we find something offensive personally, that doesn't imply that the person you take offense from was acting with any specific intent or malice. Remember we are all human beings before anything else and we're all deserving of one another's respect.
Also, kudos on the compiler factoring. It's always nice to read about complexity being reduced.
Given assumeHasValue
has hasValue sym, val
been considered? I think it'd allow value narrowing enabling more compile time checks and evaluations.
Given assumeHasValue has hasValue sym, val been considered? I think it'd allow value narrowing enabling more compile time checks and evaluations.
That's already covered via nkAsgn x, value
.
How to model try finally
Exceptions in llvm are modeled by making function call act as a conditional, essentially, with two return points, pretty much like an if statement: one for the normal case and one for the exceptional case - modelling it this way is similar to the boolean "if err" after each function call, but I suspect dedicating a call type (llvm calls it invoke
) for exception handling is more compact and leads to less information loss.
One side effect of this strategy is that finally
code "blocks" look like they get duplicated in both the happy and unhappy case - this is "more or less" what finally means, that you run the same code in both branches: while on the surface this may seem like duplication, it is not - the context in which finally
runs during exception handling is different from that the "happy" case exit and different rules apply (ie different things have happened leading up to that point, including which variables are live and the state of globals and other observable side effects) - in particular, you will need to decide on the semantics of raising inside finally (or inside destructors), which is fraught with issues as the many bugs in Nim in this area show.
can still be translated
translations like these lead to information loss - while some of that information can be recovered with analysis, such analysis slows down the compiler and still sometimes recreates imperfect information - this is something to consider in general: the semantics, and in particular the constraints, of the language offer opportunities that easily get lost in translation ("this is a copy-on-write construct, so read-only copies are cheap") - we can mitigate this to some extent, but like the try/finally most clearly shows, there will be other cases that are more subtle.
One side effect of this strategy is that finally code "blocks" look like they get duplicated in both the happy and unhappy case
That's one thing I wanted to avoid, yes, but the question remains how to model it so that it can easily be recovered for JS/C++ codegen.
One side effect of this strategy is that finally code "blocks" look like they get duplicated in both the happy and unhappy case
That's one thing I wanted to avoid, yes, but the question remains how to model it so that it can easily be recovered for JS/C++ codegen.
So this wouldn't be as simple as having try
tags or whatnot? I'm guessing it'd require going through multiple blocks to reconstruct? What about a "try(blocks: a, b, d)" meta tag? Though I haven't had a chance to read through the details. Stil this goto form seems pretty nifty!
One question I had -- does this give more opportunities to do static analysis or would that primarily remain in the AST level?
P.S. thanks for the name change. It's ridiculous but just avoids a whole potential can of worms given things in the USA, and was like the second thing that came to mind.
Maybe it works with more special kinds of labels:
try:
f(a)
finally:
echo "done"
becomes (roughly):
TryLabel Lunused
f(a)
case raised
of true: goto L1
FinallyLabel L1
echo "done"
goto ProcEnd
label ProcEnd
That seems plausible? It'd at least keep the information in the IR. Though, would it require backends to have to handle the TryLabel, FinallyLabel types or would they just be metadata?
The backends need to handle them but it's a weird question, they exist for the backends.
The backends need to handle them but it's a weird question, they exist for the backends.
speaking of labels, LLVM has a similar mechanism (IR metadata) which basically allows attaching any kind of metadata to any IR node - it is used for transporting pass-specific metadata to later stages, for example type information to the type-based alias analysis pass - language frontends that are statically typed can "opt in" to generate the type data - similarly, extended control-flow information or "known ranges of values" could be propagated across passes which would alleviate the need to recover them via costly analysis but still allow analysis to enhance the IR along the way.
Having a "generic" metadata passing capability is something nlvm
certainly would benefit from (so that it can attach backend-specific data to nodes) - here's how I imagine it would work:
Why only "requested"? TBAA is a good example: if you're compiling without optimizations, you don't want to waste time on generating and storing type information which later will be ignored.
I've been following the OCaml Effect handler's for a while and something about GIRN & goto's tickles my brain as having possible overlap. Perhaps it's just the idea that with the OCaml style effects non-local control flow, exceptions, async, et al are just instances of the same underlying mechanism.
Would there be a way to do something similar with the goto's and perhaps a set of metadata for non-local jumps? It might be possible to re-use it for analyzing closure iterator's used for async as well. Perhaps not for runtime composable effects, but as an underlying coherent schema for non-local control?
I've read the paper and it doesn't really apply. It uses a new runtime and a type (or rather effect) system extension. GIRN is much simpler (to implement).
GIRN is also not a CPS representation as for that you need the notion of a parametrized goto instruction like goto fn(a, b, c)
.
GIRN is also not a CPS representation as for that you need the notion of a parametrized goto instruction like goto fn(a, b, c).
Ah makes sense that GIRN wouldn't encode the CPS-level representation. Just more of a hopeful thought than anything too serious -- though it was interesting to consider how CSP-level info could be used to provide non-exception effect handlers for handling things like overflows.
On another note, it'd be interesting how the kill
, assume
, require
could be used in a SPARK like system for formal verifiers or runtime verifiers. Maybe DrNim would be more feasible? Though I imagine the kill
and them might oriented more towards lifetimes.
Also, it's completely outside of my current wheelhouse so I'll let y'all get back to practical compiler work. :-)
Note: don't mix CPS (continuation-passing style) and CSP (communicating sequential processes)
Eh, thanks I'm guilty -- fixed my post.
NIR is GIRN that was NGIR :black_cat: ?
Abstract
I outline a new IR for the Nim compiler that aims to reduce the compiler's overall complexity.
Motivation
Nim's IR is based on its AST: While the AST served us well in the past, it's now clear that many of Nim's features like "definite assignment analysis", "nil checking" or its "move analyser" would be easier with a different program representation. GIRN is that different representation.
Description
I called it the "Goto intermediate representation for Nim (GIRN)". The type system is left untouched and everything is strictly typed. GIRN focuses on a control flow abstraction. Function calls can remain nested. Compared to the AST GIRN contains these major changes:
A proc body is a single list of instructions and control flow is mapped to labels and
goto
. Thegoto
instruction is designed/handicapped to keep the control flow graphs reducible so that no complex fixpoint algorithms are required.Additional computed information is stored alongside the instructions:
kill x
. Indicates that the scope ofx
has ended and thatx
will not be used afterwards.assume cond
. At this point we know thatcond
holds.require cond
. At this point we need to prove thatcond
holds.assumeHasValue x
. After this pointx
will have a value (but it's not specified which one).Loop labels and
goto loopLabel
are distinct from labels which are the targets of forward jumps.There are only
case
statements, noif
statements. A backend can easily convert a case statement that only has one or two branches into anif
statement.GIRN has the following properties:
The JS backend would benefit from reliable frontend inlining and once we have it we can properly distinguish between
.inline
which should bealways inline
and.hot
and.cold
procs.Code Examples
The examples here focus on the "move analyser". The move analyser does both control flow and data flow analysis.
The data flow aspect needs
assumeHasValue
as a summary instruction in order to be effective:Loops are interesting: If the loop "comes after" the definition of the variable it is an "inner loop" and we have to watch out:
Is translated into:
It is reasonably easy to see that
use x
is not the last use.Compare that with:
Which is translated to:
Again, it is easy to see what is really going on.
kill
instructions model the scopes effectively and also help with destructor injections.Backwards Compatibility
There are no incompatibilities as it only affects the passes after macro expansions.
Open problem
How to model
try finally
so that it can still be translated totry
for the JavaScript and C++ backends. Aside from that exceptions are simple to model via an expliciton error: goto errorHandler
instruction after a function call that can throw.