[Discussion] Compilation, intermediary functions, statements, expressions, etc

Opening this issue so that we can share our ideas here, may help with figuring out the best strategy.

My understanding (and correct me if I'm wrong) is that a lot (it not all) the intermediary functions introduced inside the body of a compiled defun are compiled let forms. And since most functions end being async, this also introduces a lot of asyncs once those functions have been introduced.

If we have an expression, most of the time it can be converted into an equivalent statement, with its last expression becoming a return or an assignment.

When some piece of code can be compiled into a Javascript expression, then it is better to do so because it results in a smaller code size, statements would be used when it is not possible to produce an expression.

What comes ahead is an incomplete analysis, and there are probably many holes in my undestanding, so please point those out.

`let` expressions

Since lets are expressions, using blocks requires inserting a return to the expresion that produces the value of the let expression.

Example:

(let X 1 (+ X X))

Gets compiled to:

(X => $.asNumber(X) + $.asNumber(X))(1)

With blocks and statements the equivalent would be:

{ const X = 1;
  return $.asNumber(X) + $.asNumber(X); }

But this only works when the result of the let expression is also the result of the defun. If the let expression shows up someplace where its result will be bound to a variable or used as an argument to a function call, then a return cannot be used. What can be done instead, in the case of a variable bind, is to declare the variable as let Var; (instead of const), and then introduce the block as before, but with an assigment to Var instead of a return.

Example:

Source:

(let X (let Y 1 Y) (+ X X))

Current:

(X => $.asNumber(X) + $.asNumber(X))((Y => Y)(1))

With statement

{ let X;
  { const Y = 1;
    X = Y; }
  return $.asNumber(X) + $.asNumber(X); }

The compiler should include this information in the context, so that when the let is compiled it knows what to do (either return or variable assignment). Since expressions are compiled "from the outside to the inside", annotating the context with this information shouldn't be hard.

The situation where the result of the let is used as a function call parameter is more complicated: (+ (let X 1 X) 2).

If the expression doesn't have side-effects, then it can just be assigned to a gen-symed variable using the method described above, and then that variable used as an argument.

But if there are side effects, this will not work because by doing this transformation naively we are changing the order of execution (Kl should be evaluated left-to-right).

A solution in such cases is to perform a conversion to A-normal form, so that every argument passed to the function call is bound to a variable first, then the variable passed (the transformation is easy to do, I wrote code to do this long ago because it was required for Chibi because it evaluates right-to-left, and also an OCaml port I was working on, this conversion can be done to the Kl code before the compiler even sees it).

It also has a cascading effect, every time this conversion is done, the function call that was an expression becomes wrapped in a let, which gets compiled into a statement. And unless side-effectful code is tracked, this conversion would be required every time a let shows up as an argument to a function.

The good news is that such lets are very rare, so all this would only happen in very few cases, if at all.

Example conversion:

(+ (let A (get-A)
        B (get-B)
     (some-func B A))
   10)

becomes:

(let Dn664 (let A (get-A)
                B (get-B)
             (some-func B A))
  (+ Dn664 10))

`if` expressions

if expressions are compiled into Javascript's ternary operator, which work quite well because it is an expression. But statements cannot show up inside such expressions, which is a problem now once let expressions get compiled into statements.

In such cases ifs can be compiled into Javascript's ifs, and the same rules as with let can be used to handle the results.

For example (basic case, this one wouldn't require any of this, but once compiled this way then statements can be part of if branches):

Source:

(let X (if (> 2 1) 2 1) (+ X X))

Current:

(X => $.asNumber(X) + $.asNumber(X))(2 > 1 ? 2 : 1)

Proposed:

{ let X;
  if (2 > 1) {
      X = 2;
  } else {
      X = 1;
  }
  return $.asNumber(X) + $.asNumber(X); }

`trap-error` expressions

I see these are wrapped in a function too, but for a try/catch same rules as above would apply.

And throwing is done with $.raise, so throwing an exception doesn't convert an expression in a statement like using throw would.

Factorization

If intermediary functions are removed, the constructs introduced by the factorise-defun extension can be compiled trivially into labelled blocks and labelled breaks. Since the constructs are new and separate there is no ambiguity and the transformation is direct (none of the analysis required for the above cases is required here, and at the same time, the above cased cannot be confused with these constructs).

The translation is described here: https://github.com/rkoeninger/ShenScript/issues/13#issuecomment-565065693

Since the conversion done by factorise-defun already adds %%return expressions, the Kl->Js compiled will not have to figure out where to insert returns.

Sync/Async

This part is something I'm not very clear on. Is it important for the async stuff to be implicit? Is it possible to instead use sync versions of the primitives that Shen requires, and then provide a new construct to make async calls explicit? if yes, what would the drawbacks be?

I'm adding libuv support on Shen/Scheme, so I'm interested in this not just because of ShenScript, and it is something I haven't given much tought to.

Code size

The statement versions require more bytes than the expressions, but I don't think the difference is THAT big, and if it results in considerably faster code, I think it is worth it.

Many calls that are repeated use functions that are properties of $, I think that with some analysis some can be removed (at least when typeckecking and the optimise flag are enabled).

Also I think expressions like (+ A B) can be compiled into $.add(A, B) instead of $.asNumber(A) + $.asNumber(B).

For reference, the previous attempt to use statement-oriented syntax is document here. The implementation used a similar approach but the result was far from perfect - there were a lot of unnecessary intermediate variables and block scopes. I'm not sure how of a performance impact that would have had.

Also I think expressions like (+ A B) can be compiled into $.add(A, B) instead of $.asNumber(A) + $.asNumber(B).

The build functions also use type-tracking so if A and B were known to evaluate to numeric types, the rendered code would just be A + B. This is described in the docs.

Previous experiments showed calling the as* functions had trivial impact on performance, so I would remove the type-tracking and just put as* calls everywhere before removing the type checks.

Is it important for the async stuff to be implicit? Is it possible to instead use sync versions of the primitives that Shen requires, and then provide a new construct to make async calls explicit? if yes, what would the drawbacks be?

If you had to use special syntax, function, macro when using I/O in ShenScript, that would mean that any Shen code using I/O wouldn't be portable. That's quite a lot of the code since f_error and other stuff in the kernel calls I/O functions and f_error gets inserted into any unexhaustive functions.

There's some attempt to cut down on the amount of async-ing and await-ing with this fixed list of known non-async functions, but it's limited and hard to extend since functions and be re-defined to start using I/O even if they originally didn't.

https://github.com/rkoeninger/ShenScript/blob/0e990504139b08d2a44816ead38796d49619b3a7/lib/backend.js#L184-L196

There's a docs section about this too.

If you had to use special syntax, function, macro when using I/O in ShenScript, that would mean that any Shen code using I/O wouldn't be portable.

But can't those Shen functions just use the blocking I/O versions of the underlying platform? this wouldn't require any special syntax, just the regular one, and those calls would behave as expected without incurring any overhead.

This would not prevent the use of the underlying async functionality, it would just not be done through Shen's primitives. If you wanted to for example write a Web server in Shen, you would not use those primitives anyway, or even worry about portability much, because the way it will be written to perform and behave well will depend a lot on the characteristics of the underlying platform.

Will go through those links latter, just wanted to ask about this but this is the part I'm a bit confused about.

For reference, the previous attempt to use statement-oriented syntax is document here. The implementation used a similar approach but the result was far from perfect - there were a lot of unnecessary intermediate variables and block scopes. I'm not sure how of a performance impact that would have had.

Ok, from the description, Fabrs look very much like what I have in mind, so back then you were already way ahead of where I am now.

I understand that the resulting code will be bigger because statements require more bytes than expressions.

What I don't understand is what kind of runtime overhead statements could have. Lets say the Javascript compiler is really good at optimizing IIFEs, what it will produce is equivalent to the statements version, right? it will not be faster.

Did those benchmarks include compilation time? Thats where I can see some overhead happening.

The build functions also use type-tracking so if A and B were known to evaluate to numeric types, the rendered code would just be A + B. This is described in the docs.

Ok, thats great. I saw that there was type tracking code, but for the functions I tried the wrapping functions were still present, but now that I think about it it was obvious why (the variables where parameters to the functions, not results of any operation).

But can't those Shen functions just use the blocking I/O versions of the underlying platform?

I didn't want to design around special cases where you happen to be able to do sync I/O for a particular operation on a particular platform. async/await allows for a transparent merging of styles on all modern JS engines.

The idea was that a function like load would behave ostensibly the same way in ShenScript in Node and in the browser as it behaves in CL or wherever else.

This would not prevent the use of the underlying async functionality, it would just not be done through Shen's primitives. If you wanted to for example write a Web server in Shen, you would not use those primitives anyway, or even worry about portability much...

...eh, we seem to be of two minds here. I remember the main original Shen website saying "write once, run everywhere" and every platform-specific thing makes that less true. This goes back to comments I've made before like when we say Shen, is supported on platforms X, Y and Z what's "Shen" in each of those cases? Is it just the kernel? Which version of the kernel? According to the color-coded chart on the open source site, "Shen" is a little different per platform so which "Shen" are we talking about? And to a developer, "Shen" might mean the kernel plus common libraries, which might be different per-platform... it's kind of a mess and I don't want to make it any worse.

Did those benchmarks include compilation time? Thats where I can see some overhead happening.

They did - the only AOT compilation in ShenScript is when you build it - only the kernel is pre-rendered. All other code gets read and compiled JIT, just like JavaScript.

What I don't understand is what kind of runtime overhead statements could have. Lets say the Javascript compiler is really good at optimizing IIFEs, what it will produce is equivalent to the statements version, right?

That's what I was thinking, but the results spoke for themselves. Maybe the code getting generated was excessively noisy in some way, but I'm not sure how. There were a lot of intermediate temp let variables but I don't know why that would matter so much.

...eh, we seem to be of two minds here. I remember the main original Shen website saying "write once, run everywhere" and every platform-specific thing makes that less true

Ideally, yes, but once you start building anything useful that goes beyond manipulation of data, you are going to have to depend on functionality provided by the platform.

This is already a problem in Scheme, which is far more defined than Shen is, has been around for much longer and is less varied when it comes to platforms on which it runs on.

When you want to write code to solve a more complicated problem that involves interaction with the outside world, it is far more likely for it to be non-portable than portable.

To make Scheme code portable, what you do is write implementation-specific code for all implementations you want to support, and use cond-expand so that different code is evaluated on different implementations.

But then again, the kind of stuff you see that can be made portably, is not stuff that talks to the operating system, it is stuff like XML parsers, pattern matching macros, etc, not HTTP servers or libraries to talk to databases or anything like that.

All the above applies to Common Lisp too, but it is just not as bad, because the Common Lisp spec includes faaaaaar more than Scheme's or Shen, so there is more common ground covered. But even to a lesser degree the problem exists there too.

There are only a few I/O primitives in Shen, if you want to write something more complicated, those will be enough, and in that case you will have to depend on Nodejs stuff (same if you were to write something that runs on the browser and manipulates the DOM).

That's what I was thinking, but the results spoke for themselves. Maybe the code getting generated was excessively noisy in some way, but I'm not sure how. There were a lot of intermediate temp let variables but I don't know why that would matter so much.

Ok, I'm going to do the following. I'm going to compile a bunch of Shen functions, then try to manually translate them to a statement oriented version, then run then and compare (using a profile if I can figure out how to do it in node).

Just a quick update on this. I tried the following functions:

(define do-times
  0 F -> done
  N F -> (do (F void)
             (do-times (- N 1) F)))

(define simplify
  [Op A B] -> (s Op (simplify A) (simplify B))
  A -> A)

(define s
  + M N -> (+ M N)      where (and (number? M) (number? N))
  + 0 F -> F
  + F 0 -> F
  + A [+ B C] -> (simplify [+ [+ A B] C])
  * M N -> (* M N)    where (and (number? M) (number? N))
  * 0 F -> 0
  * F 0 -> 0
  * F 1 -> F
  * 1 F -> F
  * A [* B C] -> (simplify [* [* A B] C])
  Op A B -> [Op A B])

(let Exp [* x [+ [+ [* 12 0] [+ 23 8]] y]]
  (time (do-times 200000 (/. _ (simplify Exp)))))

I ran this loop, then a modified version without all the async/await, and then one where the code was translated into the statements equivalent, which looks like this:

 ((s$c, simplify$c) =>
    $.d('simplify', $.l(V2 => {
        if ($.isCons(V2) && ($.isCons(V2.tail) && ($.isCons($.asCons(V2.tail).tail) && null === $.asCons($.asCons(V2.tail).tail).tail))) {
            return $.b(s$c.f, $.asCons(V2).head, $.t(simplify$c.f($.asCons($.asCons(V2).tail).head)), $.t(simplify$c.f($.asCons($.asCons($.asCons(V2).tail).tail).head)));
        } else {
            return V2;
        }
    })))($.c('s'), $.c('simplify'));

 (($2b$s, simplify$c, $2a$s) =>
    $.d('s', $.l((V6, V7, V8) => {
        if ($2b$s === V6 && ($.isNumber(V7) && $.isNumber(V8))) {
            return $.asNumber(V7) + $.asNumber(V8);
        } else if ($2b$s === V6 && 0 === V7) {
            return V8;
        } else if ($2b$s === V6 && 0 === V8) {
            return V7;
        } else if ($2b$s === V6 && ($.isCons(V8) && ($2b$s === V8.head && ($.isCons(V8.tail) && ($.isCons($.asCons(V8.tail).tail) && null === $.asCons($.asCons(V8.tail).tail).tail))))) {
            return $.b(simplify$c.f, $.r([$2b$s, $.r([$2b$s, V7, $.asCons($.asCons(V8).tail).head])], $.asCons($.asCons(V8).tail).tail));
        } else if ($2a$s === V6 && ($.isNumber(V7) && $.isNumber(V8))) {
            return $.asNumber(V7) * $.asNumber(V8);
        } else if ($2a$s === V6 && 0 === V7) {
            return 0;
        } else if ($2a$s === V6 && 0 === V8) {
            return 0;
        } else if ($2a$s === V6 && 1 === V8) {
            return V7;
        } else if ($2a$s === V6 && 1 === V7) {
            return V8;
        } else if ($2a$s === V6 && ($.isCons(V8) && ($2a$s === V8.head && ($.isCons(V8.tail) && ($.isCons($.asCons(V8.tail).tail) && null === $.asCons($.asCons(V8.tail).tail).tail))))) {
            return $.b(simplify$c.f, $.r([$2a$s, $.r([$2a$s, V7, $.asCons($.asCons(V8).tail).head])], $.asCons($.asCons(V8).tail).tail));
        } else {
            return $.r([V6, V7, V8]);
        }
    })))($.s `+`, $.c('simplify'), $.s `*`);

Times:

Original: run time: 1.378000020980835 secs
Without async/await: run time: 0.4609999656677246 secs
With statements: run time: 0.46000003814697266 secs

It seems async/await adds considerable overhead, not sure about functions yet as this code I tried happens to not introduce intermediary functions, so the runtime for both non-async versions is the same.

Next time I will try something with a bunch of let expressions to see what happens.

On another topic, based on the code I see on backend.js I thought the asX wrappers would not be generated for this case:

* M N -> (* M N)    where (and (number? M) (number? N))

but I got this:

$2b$s === V6 && ($.isNumber(V7) && $.isNumber(V8)) ?
         $.asNumber(V7) + $.asNumber(V8)

Just wanted to point that out.

Using const vs IIFEs:

(define let-test
  N -> (let N+1 (+ N 1)
            N+2 (+ N+1 1)
            N+3 (+ N+2 1)
            N+4 (+ N+3 1)
            N+5 (+ N+4 1)
            N+6 (+ N+5 1)
            N+7 (+ N+6 1)
            N+8 (+ N+7 1)
            N+9 (+ N+8 1)
          N+9))

(time (do-times 2000000 (/. _ (let-test 100))))

Statements version:

$.d('let-test', $.l(V22 => {
    const N$2b1 = $.asNumber(V22) + 1; {
        const N$2b2 = $.asNumber(N$2b1) + 1; {
            const N$2b3 = $.asNumber(N$2b2) + 1; {
                const N$2b4 = $.asNumber(N$2b3) + 1; {
                    const N$2b5 = $.asNumber(N$2b4) + 1; {
                        const N$2b6 = $.asNumber(N$2b5) + 1; {
                            const N$2b7 = $.asNumber(N$2b6) + 1; {
                                const N$2b8 = $.asNumber(N$2b7) + 1; {
                                    const N$2b9 = $.asNumber(N$2b8) + 1;
                                    return N$2b9;
                                }}}}}}}}}))

Times:

Original: run time: 1.9229998588562012 secs
Statements: run time: 1.7259998321533203 secs

So, the intermediary functions add some overhead, but it is much less than async/await.

What commit/tag do you recommend for me to check the "fabrs" produced code? I want to try more complicated code examples, but translating them by hand gets tedious.

Just out of curiosity, I tried the version with the asNumber wrappers removed, and as you said, there is almost no runtime overhead (or not at all, because of measuring error).

Btw on this example I also got those wrappers, but it was not what I was expecting.

But can't those Shen functions just use the blocking I/O versions of the underlying platform?

Just wanted to update on this, turns out I was very wrong here, it seems Node doesn't provide what I thought it provided, and a C extension would be required, which is not ideal. I will keep thinking about this and let you know if I figure out anything.

rkoeninger / ShenScript