Clang chokes on generated code for Data.Either.Nested

felixSchl commented 5 years ago

Clang appears to choke on compiling the generated code for Data.Either.Nested. It uses all of my system's memory until I have to manually interrupt it to give up. The generated c code is about 3692 loc but that's pre macro expansion. I am not sure how to diagnose this further at this point.

felixSchl commented 5 years ago

This might be the end of the road of using the Blocks extension to Clang. I see a number of ways forward from here, each with their own tradeoffs:

Use C11 lambdas

We could give up on targeting C and use C++ for it's lambdas only (the rest of the code remains unchanged and simple). However, what made this project appealing to me was to combine the simplicity of C with the simplicity of PureScript. By simplicity I mean that on the one hand I can directly work with the system's resources in C without any further abstractions, while on the other hand the source language affords the simplicity of pure FP code.

Diagnose and fix the issue with blocks

Hang on to the Blocks extension for now, diagnosing what's going wrong and attempting to fix it. From looking into it a bit, it seems the problem only occurs when using Block_copy which copies the block onto the heap.

Handle scopes manually

Derive from the CoreFn the captured context of each function and emit code based on that. This is the approach most aligned with the project's goals, albeit being more work. In particular, PURS_FFI_FUNC_ and friends, including PURS_LAMBDA will become history. I think this might be a fair trade off. We are targeting C after all - using the blocks extension was a nice point to get started, but it already removed from the simplicity factor of plain C. This approach should also play more nicely with garbage collection since we construct the context pointers ourselves (no implicit capturing from foreign-allocated memory like the blocks runtime).

Here's how it could look:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

typedef struct typ_s typ;

typedef void * v;
typedef struct { void * ctx; typ*(*fn)(void*,typ*); } f;

typedef enum { TYP_VAL = 0, TYP_FN } tag;
typedef union { v v; f f; } val;
typedef struct typ_s { tag tag; val val; } typ;

// Example: write 'const'
// const :: ∀ a. a -> b -> a

struct const$_ctx_1 { typ * a; };
struct const$_ctx_2 { typ * a; typ * b; };

typ * const$_2 (void * parent_ctx, typ * b) {
    struct const$_ctx_2 * ctx = malloc(sizeof(struct const$_ctx_2));
    memcpy(ctx, parent_ctx, sizeof(struct const$_ctx_1));
    ctx->b = b;
    return ctx->a;
}

typ * const$_1 (void * __unused__ /* global */, typ * a) {
    struct const$_ctx_1 * ctx = malloc(sizeof(struct const$_ctx_1));
    ctx->a = a;
    typ * typ = malloc(sizeof(typ));
    typ->tag = TYP_FN;
    typ->val.f.fn = const$_2;
    typ->val.f.ctx = (void *) ctx;
    return typ;
}

int main () {
    typ a = { .tag = TYP_VAL, .val = { .v = "foobar" } };
    typ * x = const$_1(NULL, &a);
    typ b = { .tag = TYP_VAL, .val = { .v = "raboof" } };
    typ * r = x->val.f.fn(x->val.f.ctx, &b);
    printf("et voila: %s\n", (char*) r->val.v);
    return 0;
}

The hardest part would be traversing the AST and fishing out the context structs, forward declaring them appropriately, making sure they are unique, and so on.

felixSchl commented 5 years ago

I chose to explore handling scopes manually as it best aligned with the project's goals. Since my primary concern was a poor FFI experience, I decided to try and establish the PURS_FFI_FUNC_X family of macros under the new design. The initial results are promising, I think:

https://github.com/pure-c/pure-c/blob/c9cde0cfc334ad2bcd5a42a6564d80577034d32b/runtime/purescript.h#L92-L149

The downstream usage of this particular family of macros remains unchanged, despite it's large underlying change. As far as PURS_LAMBDA is concerned, it is still possible, but more explicit. Instead of relying on the Blocks extension machinery, we must manually create a GC allocated scope structure carries along any of the state we'd like to capture. The great thing is that we can type these closures so that accessing the values in scope become a mere pointer offset (same holds for PURS_FFI_FUNC_X of course).

Example of replacing PURS_LAMBDA in FFI usage:

#include <purescript.h>

PURS_SCOPE_T(foo_scope_t, {
    int foo;
});

PURS_FFI_FUNC_3(foo, a, b, c, {
    foo_scope_t * scope = purs_new(foo_scope_t);
    scope->foo = 100;
    return purs_any_int_new(100);
});

int main () {
    const ANY * x = APP(foo, NULL);
    const ANY * y = APP(x, NULL);
    const ANY * z = APP(y, NULL);
    printf("hi: %i\n", z->value.i);
}

Now it's onto having the compiler emit these scope structures automatically and filling them in. It's conceptually not a difficult task, I think, but I am having trouble visualising the required AST traversals in my head. I am thinking of keeping the AST.Lambda type https://github.com/pure-c/pure-c/blob/4874a992a2e1b8a8d6b5359a828377bb50e6b5a8/src/Language/PureScript/CodeGen/C/AST.purs#L193-L197 and erase it in a second required pass. The pretty printer would throw an InvalidStateError if it received an AST.Lambda. So, while AST.Lambda might not be a valid AST node per se, it is useful to keep it in the AST adt in order to split the logic across two traversals, thereby simplifying implementation. Given the overall partiality and incompleteness of AST related types (here and in PureScript proper) it seems like a small and fair trade off to make.

So the traversal I am currently struggling with implementing would essentially analyze each AST.Lambda to see what variables it references that were not introduced in it's own scope (therefore captured). Given those variables, we can create a structure type suitable for the use by that lambda/function, then fill it with what's currently in scope and create a continuation via purs_new_cont.

felixSchl commented 5 years ago

Alright, blocks have been dropped and manual closure management is working a treat:

We can now compile using clang or gcc
Code is easier to debug as generated closures have a location in the emitted sources
Memory management has massively simplified
Example2 appears to run about 4 times faster than the optimized node.js equivalent (anecdotal)

A few more things to iron out then merging into master! There's still plenty of optimizations that could be run as a separate pass on the generated ast -- but that's for later.

felixSchl commented 5 years ago

Merged! This issue is no longer relevant :)

pure-c / purec