udem-dlteam / pnut

🥜 A Self-Compiling C Transpiler Targeting Human-Readable POSIX Shell
https://pnut.sh
BSD 2-Clause "Simplified" License

Support structures in exe and shell backends #21

Closed laurenthuberdeau closed 5 months ago

laurenthuberdeau commented 5 months ago

Context

Structures are used throughout TCC and are essential to compile any sufficiently complex program. This PR extends the parser to parse struct definitions, and implements the sizeof, . (dot, exe only) and -> operators in the shell and exe backends.

Shell backend

The shell backend supports structures by mapping each member to a readonly variable containing the member offset. Struct member access is then implemented by adding the structure address to the member variable. This choice restricts the use of structures with overlapping names. For the offset, all fields occupy 1 word, as arrays (which would occupy 1 word per element) are not supported.
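One reading of the naming restriction, sketched in POSIX shell: if two structs declare a member with the same name at different offsets, both would need the same readonly variable with different values. The structs A and B below are hypothetical, and what pnut actually does on such a clash is not stated here; the sketch only shows that a readonly shell variable cannot hold two offsets.

```shell
# Hypothetical structs (not from the PR):
#   struct A { int tag; int val; };   /* wants __tag=0, __val=1 */
#   struct B { int val; int tag; };   /* wants __val=0, __tag=1 */

# Run in a subshell so that the failed redefinition cannot abort the script.
if ( readonly __tag=0 __val=1; readonly __val=0 __tag=1 ) 2>/dev/null; then
  collision=no
else
  collision=yes
fi
echo "collision: $collision"
```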

Because variables in the shell backend are mapped to shell variables and not memory locations, anything that returns an object that's not word-sized cannot be supported, as shell variables can only contain 1 number (unless we break that rule, but I'm not sure the resulting code would be particularly readable). This excludes variables with struct types that aren't pointers, nested structures (again, that aren't pointers), and passing and returning structures as values to and from functions. With some more type information, we could lift this limitation for some cases; maybe something for later.

This also means that the . operator isn't too useful in the shell backend, as any handle on a struct is through a reference. The . operator was thus left unimplemented there.

// Struct member access is implemented like array indexing. Each member is mapped to a readonly variable containing the member's offset, so accessing s->a is equivalent to *(s + a).
// For example, for the struct:
    struct Point {
      int x;
      int y;
    };
    struct Point *p = malloc(sizeof(struct Point));
    p->y = 42;

// The following code is generated:

    readonly __x=0
    readonly __y=1
    readonly __sizeof__Point=2

    _malloc p $((__sizeof__Point))
    : $(( _$((p + __y)) = 42 ))
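
// A self-contained sketch of the same idea that can be run in a POSIX shell. It assumes that memory is modelled as shell variables named _0, _1, _2, ... and uses a simplified stand-in for _malloc (a bump allocator written for this example, not pnut's actual runtime).

```shell
# Assumption: word N of memory lives in the shell variable _N.
__ALLOC=1   # next free address; 0 is kept free to act as NULL

# Simplified stand-in allocator: _malloc VAR SIZE stores the address of a
# fresh block of SIZE words in the variable named VAR.
_malloc() {
  : $(( $1 = __ALLOC ))
  : $(( __ALLOC += $2 ))
}

readonly __x=0
readonly __y=1
readonly __sizeof__Point=2

_malloc p $((__sizeof__Point))
: $(( _$((p + __x)) = 7 ))    # p->x = 7
: $(( _$((p + __y)) = 42 ))   # p->y = 42

# Reading a member uses the same address arithmetic as writing it.
echo "p->x = $(( _$((p + __x)) )), p->y = $(( _$((p + __y)) ))"
```

// The inner $((p + __y)) is expanded first and yields the address, so the outer arithmetic expression becomes an ordinary assignment to the variable at that address.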

Exe backend

Almost everything should be working. This includes:

Note that returning structures from functions is not supported.

Use of structures in Pnut

With struct support in the shell backend, we could start using structs in Pnut. This could improve performance and generate shorter code, as well as improve safety by not having everything be an int. Until now, without structures, we've been emulating them using a statically allocated heap in which objects are simply indices into that array, which comes with some downsides.

Each field access is done through a function. Each function call must call save_vars/unsave_vars, which comes at a significant cost (see the OPTIMIZE_CONSTANT_PARAM optimization, which brings a ~40% execution time reduction), and each function call must be made on its own line (using temporary variables) instead of inline, which can inflate a simple function call into multiple lines of shell.
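
To make the difference concrete, here is a rough sketch comparing the two styles. The accessor _get_field, the temporaries __t1 and __t2, and the member names left and right are hypothetical, and the save_vars/unsave_vars bookkeeping is left out, so the accessor version is, if anything, understated.

```shell
# Assumption: word N of memory lives in the shell variable _N.
: $(( _10 = 5 ))   # hypothetical object at address 10, first field
: $(( _11 = 8 ))   # second field

# Accessor style: each access is a call that writes into a result variable,
# so every access needs its own line and a temporary.
_get_field() { : $(( $1 = _$(($2 + $3)) )); }
_get_field __t1 10 0
_get_field __t2 10 1
sum_calls=$(( __t1 + __t2 ))

# Struct style: offsets are readonly constants, so both accesses fit
# inline in a single arithmetic expression.
readonly __left=0
readonly __right=1
n=10
sum_inline=$(( _$((n + __left)) + _$((n + __right)) ))

echo "$sum_calls $sum_inline"
```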

Also, because everything is an int, it is very easy to pass the wrong object to a function without any feedback from the compiler. A common example is passing an identifier probe object instead of an AST identifier to a function that expects the latter.

TODO

A few things are still left to do:

monnier commented 5 months ago

> Structures are used throughout TCC and are essential to compile any sufficiently complex program. This PR extends the parser to parse struct definitions, and implements the sizeof, . (dot, exe only) and -> operators in the shell and exe backends.

Finally!

> The shell backend supports structures by mapping each member to a readonly variable containing the member offset. Struct member access is then implemented by adding the structure address to the member variable. This choice restricts the use of structures with overlapping names. For the offset, all fields occupy 1 word, with the exception of arrays that occupy 1 word per element.

OK.

Have you/we looked at other compilation strategies?

I see that you "Support passing struct as value to functions". Do you remember where you've needed that? [ Is this written down somewhere and if so where? ]

> Because variables in the shell backend are mapped to shell variables and not memory locations, anything that returns an object that's not word-sized cannot be supported, as shell variables can only contain 1 number (unless we break that rule, but I'm not sure the resulting code would be particularly readable). This excludes variables with struct types that aren't pointers, nested structures (again, that aren't pointers), and passing and returning structures as values to and from functions.
>
> This also means that the . operator isn't too useful, as any handle on a struct is through a reference. The . operator was thus left unimplemented.

Sounds good.

> Use of structures in Pnut
>
> With struct support in the shell backend, we could start using structs in Pnut. This could improve performance and generate shorter code, as well as improve safety by not having everything be an int. Until now, without structures, we've been emulating them using a statically allocated heap and objects are simply indices in that array, which comes with some downsides.

I think this will be an interesting part of the paper, indeed. We can compare the generated code size and execution time, trying to quantify the loss due to supporting the struct feature and the gain due to being able to make use of structs.