Closed laurenthuberdeau closed 5 months ago
Structures are used throughout TCC and are essential to compile any sufficiently complex program. This PR extends the parser to parse struct definitions, and implements the `sizeof`, `.` (dot, exe only) and `->` operators in the shell and exe backends.
Finally!
The shell backend supports structures by mapping each member to a readonly variable containing the member offset. Struct member access is then implemented by adding the structure address to the member variable. This choice restricts the use of structures with overlapping names. For the offset, all fields occupy 1 word, with the exception of arrays that occupy 1 word per elem.
OK.
Have you/we looked at other compilation strategies?
I see that you "Support passing struct as value to functions". Do you remember where you've needed that? [ Is this written down somewhere and if so where? ]
Because variables in the shell backend are mapped to shell variables and not memory locations, anything that returns an object that's not word-sized cannot be supported as shell variables can only contain 1 number (unless we break that rule, but I'm not sure the resulting code would be particularly readable). This excludes variables with struct types that aren't pointers, nested structures (again, that aren't pointers), passing and returning structures as values to and from functions.
This also means that the '.' operator isn't too useful, as any handle on a struct is through a reference. `.` was thus left unimplemented.
Sounds good.
Use of structures in Pnut
With struct support in the shell backend, we could start using structs in Pnut. This could improve performance and generate shorter code, as well as improve safety by not having everything be an int. Until now, without structures, we've been emulating them using a statically allocated heap and objects are simply indices in that array, which comes with some downsides.
I think this will be an interesting part of the paper, indeed. We can compare the generated code size and execution time, trying to quantify the loss due to supporting the struct feature and the gain due to being able to make use of structs.
Context
Structures are used throughout TCC and are essential to compile any sufficiently complex program. This PR extends the parser to parse struct definitions, and implements the `sizeof`, `.` (dot, exe only) and `->` operators in the shell and exe backends.
Shell backend
The shell backend supports structures by mapping each member to a readonly variable containing the member offset. Struct member access is then implemented by adding the structure address to the member variable. This choice restricts the use of structures with overlapping names. For the offset, all fields occupy 1 word, as arrays (which would occupy 1 word per element) are not supported.
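As a rough illustration of the scheme described above, the following C sketch models memory as an array of words and gives each member a constant offset. The struct `point`, the `mem` array, and all helper names are hypothetical; this is a model of the idea, not the shell code Pnut actually emits.

```c
#include <assert.h>

/* Hypothetical source-level struct being modelled:
 *   struct point { int x; int y; int z; };
 * Every field occupies one word, so an offset is just the field's index.
 * The constants are keyed by member name only, which is why two structs
 * that place a same-named member at different offsets would clash. */
enum { OFF_X = 0, OFF_Y = 1, OFF_Z = 2, SIZEOF_POINT = 3 };

/* Word-addressed memory standing in for the backend's heap (assumption). */
static int mem[64];
static int mem_top = 0;

/* Bump allocator: returns the address (index) of a fresh block of n words. */
int alloc_words(int n) {
  int addr = mem_top;
  mem_top += n;
  return addr;
}

/* p->member becomes "structure address + member offset". */
int load(int addr, int offset) { return mem[addr + offset]; }
void store(int addr, int offset, int value) { mem[addr + offset] = value; }
```

In this model, `p->y = 42` corresponds to `store(p, OFF_Y, 42)`, and `sizeof(struct point)` corresponds to `SIZEOF_POINT`.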
Because variables in the shell backend are mapped to shell variables and not memory locations, anything that returns an object that's not word-sized cannot be supported, as shell variables can only contain 1 number (unless we break that rule, but I'm not sure the resulting code would be particularly readable). This excludes variables with struct types that aren't pointers, nested structures (again, that aren't pointers), and passing and returning structures as values to and from functions. With some more type information, we could lift this limitation in some cases; maybe something for later.
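To make the restriction concrete, here is a small C sketch of which forms fall on each side of it, going only by the description above. The type and function names are made up for illustration, the use of `malloc` is an assumption, and whether Pnut accepts each line exactly as written has not been checked.

```c
#include <assert.h>
#include <stdlib.h>

struct point { int x; int y; };

/* Nested structures reached through pointers: each member is word-sized. */
struct segment {
  struct point *start;
  struct point *end;
};

/* Struct passed by pointer: the argument fits in one shell variable. */
int point_sum(struct point *p) { return p->x + p->y; }

/* Struct returned by pointer rather than by value. */
struct point *new_point(int x, int y) {
  struct point *p = malloc(sizeof(struct point));
  if (p == NULL) return NULL;
  p->x = x;
  p->y = y;
  return p;
}

/* Forms excluded by the description above, since they are not word-sized:
 *   struct point origin;                    struct-typed variable
 *   struct rect { struct point corner; };   nested struct by value
 *   int f(struct point p);                  struct passed by value
 *   struct point g(void);                   struct returned by value
 */
```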
This also means that the '.' operator isn't too useful, as any handle on a struct is through a reference. `.` was thus left unimplemented.
Exe backend
Almost everything should be working. This includes:
Note that returning structures from functions is not supported.
Use of structures in Pnut
With struct support in the shell backend, we could start using structs in Pnut. This could improve performance and generate shorter code, as well as improve safety by not having everything be an int. Until now, without structures, we've been emulating them using a statically allocated heap and objects are simply indices in that array, which comes with some downsides.
Each field access is done using a function. Each function call must call `save_vars`/`unsave_vars`, which comes at a significant cost (see the `OPTIMIZE_CONSTANT_PARAM` optimization, which brings a ~40% execution time reduction), and each function call must be done on its own line (using temporary variables) instead of inline, which can inflate a simple function call to multiple lines of shell.
And then, by nature of having everything be an int, it is very easy to pass the wrong object to a function without any feedback from the compiler. A common example is passing an identifier probe object instead of an AST identifier to a function that expects the latter.
TODO
A few things are still left to do:
`&` operator