jserv / amacc

Small C Compiler generating ELF executable Arm architecture, supporting JIT execution
Other
1.01k stars 159 forks source link

Squint -- a peephole optimizer for amacc #87

Closed HPCguy closed 2 years ago

HPCguy commented 2 years ago

This pull request is a starting point for discussion on how to integrate Squint most effectively with amacc. It may best be done with separate repos that are tied through modules, but I don't know.

I added a -p option to amacc.c that must currently be used with the amacc -o option to produce an ELF file that can be optimized with scripts/peep. In order for peep to work, a version of squint must be compiled, e.g. gcc -o squint squint.c .

Example:

% gcc -o amacc-gcc amacc.c -ldl % ./amacc-gcc -p -o amacc amacc.c % scripts/disasm amacc > amacc.s % gcc -o squint squint.c % scripts/peep amacc executable file amacc was optimized. % scripts/disasm amacc > amacc.s.opt % sdiff -W -d -w 180 amacc.s amacc.s.opt

I have more advanced versions of this that has even beat gcc -O1 for fib.c and possibly other things, but I suspect you will not like my coding style, so we will have to discuss.

HPCguy commented 2 years ago

Hi, I do not have a cross compile or QEMU environment, so I likely can't test or modify the make scripts without breaking something. Can you add that?

HPCguy commented 2 years ago

I will review all comments tomorrow before making changes. Have a good day. It is 3:30AM here so I need some sleep.

HPCguy commented 2 years ago

One more thing -- now that any library call is supported, it is possible to let the user use the -p option in place of the -o option, and then have a system call at the end of main() in amacc.c : if (peep) system("squint "); Also, squint will work on jit() directly in amacc if you put squint.c in amacc.c and remove the squint() manin() function. I doubt you will want to do that, but it works.

HPCguy commented 2 years ago

This version of squint versus baseline:

feature amacc.s amacc.s.opt
ldr 8091 4717
str 6732 3043
reg-reg 9654 5673
sub 1878 693
add 1567 1170
mul 60 16
jserv commented 2 years ago

Hi, I do not have a cross compile or QEMU environment, so I likely can't test or modify the make scripts without breaking something. Can you add that?

No problem at all. You can modify Makefile to reflect squint specific changes. Then, once make check includes the necessary checks, the CI can validate itself.

HPCguy commented 2 years ago

I created a new benchmarking test (Sieve of Eratosthenes), and I am considering pulling minimal register renaming from my non-published version of squint, hence the delay. I'll have finished that decision process by tonight, and will commit by tomorrow. It would be nice to be at least as fast as gcc -O0 with a minimal set of peephole changes.

jserv commented 2 years ago

@lecopzer, Please review this effort on introducing a peephole optimizer to AMaCC.

lecopzer commented 2 years ago

After a quick review, I'd like to see a doc like https://github.com/jserv/amacc/blob/master/docs/IR.md because it seems a complex algorithm.

Does this peephole algorithm made by yourself or any referenced paper or exist algorithm?

It looks like Squint inserts some hint (nop) into codegen and resolve in squint.c. But we don't have a brief concept of how the algorithm does and make it hard to review.

for example, in squint.c, how the map was constructed and the usage, what the meaning of ambiguous function naming such as simplifyBranchX apply_peepholesX X=1~4, what the NOP use for...etc.

HPCguy commented 2 years ago

Everything in here was created by me except for popcount, which is a widely published algorithm, so not really attributable to anyone. popcount is "open literature".

AMaCC is not documented, not even subsets such as the ELF driver, the stack based AST, the type system, or anything other than the IR. I can write documentation when I get around to it, I just don't think it is fair to require it for this commit.

There are comments at the top of every function desribing what that function covers.

HPCguy commented 2 years ago

Final comment -- AMaCC itself is extremely compilcated. The fact that amacc itself can be optimized by Squintis a testament to its robustness. Also, you can compare the assembly language of any program produced with the original AMaCC (cf scripts/disasm) compared with the peephole optimized version of that executable as a testament to the quality of the optimizations.

HPCguy commented 2 years ago

Eating some crow (idiom) now, I found a problem in the frame-register optimizer I just added since yesterday. If I can't quickly resolve it, I will remove the frame-register optimizer.

HPCguy commented 2 years ago

OK. I simplified the frame-register optimization and re-commited.

Best time of sieve benchmark for 5 runs: amacc (no squint): 0m7.643s amacc (squint): 0m2.312s Gcc -O0: 0m2.191s

I will commit the Makefile tomorrow.

HPCguy commented 2 years ago

@lecopzer, looking forward to ELF documentation.

jserv commented 2 years ago

Best time of sieve benchmark for 5 runs:

compilers execution time
amacc (no squint) 0m7.643s
amacc (squint) 0m2.312s
gcc -O0 0m2.191s

I am curious about the conditions when AMaCC + Squint can be approaching gcc -O1.

HPCguy commented 2 years ago

"or example, in squint.c, how the map was constructed and the usage, what the meaning of ambiguous function naming such as simplifyBranchX apply_peepholesX X=1~4, what the NOP use for...etc."

As I explain in the documentation, the order that functions are applied or how many times they are applied improves optimization. The generic names mean that anything could be placed in the function blocks in the future, and each function just provides a scope that can be put it a different order or repeated multiple times.

HPCguy commented 2 years ago

I am curious about the conditions when AMaCC + Squint can be approaching gcc -O1.

So far, just fib is faster than gcc -01. I have a function called simplify_frame() that gets rid of fp, so that only lr gets pushed with a function call, and the stack related logic gets modified/eliminated. Fib is all data movement on the stack frame via function calls, so it ends up being faster than gcc -O1 with those modifications.

That said, there is also a ton of use-def analysis that can be done to cut at least 1/3 of the instructions in many loops out, and to better balance pipelines. For instance, there are two load pipelines and two integer pipelines. Keeping them all going at full tilt can be done by balancing LEA location (explicit integer or CISC addressing) and balancing immediate constants vs grabbing them from pc-relative addresses.

Getting performance is not hard. What is hard is balancing it against code size when the philosophy of AMaCC is "small but powerful."

HPCguy commented 2 years ago

OK. I tried to create testing for the peephole capability. I can't really do anything other than modify text (I can't run amacc make in my environment), so I've probably missed some detail. I could really, really use some help on this.

jserv commented 2 years ago

OK. I tried to create testing for the peephole capability. I can't really do anything other than modify text (I can't run amacc make in my environment), so I've probably missed some detail. I could really, really use some help on this.

CI reported:

  SelfCC    elf/amacc
  SelfCC    elf/amacc-opt
make: *** [Makefile:59: elf/amacc-opt] Error 1
HPCguy commented 2 years ago

On my development system, I locally changed the QEMU and CROSS_COMPILE macros to work with my environment. When I do so, using the Makefile I have currently committed, everything passes as shown below. I don't understand why it is not working when I push to your repository. I have done everything I can on my end. I cannot debug the QEMU.

$:~/amacc $ make check CC+LD squint SelfCC elf/amacc SelfCC elf/amacc-opt Please remember the amacc '-p' compiler flag is a prerequisite. executable file elf/amacc-opt was optimized. [* verify tests/fib.c *****] [* verify tests/fib.c *****] [* verify tests/fib.c *] tests/fib.c Passed [ verify tests/hello.c ****] [ verify tests/hello.c ****] [ verify tests/hello.c ] tests/hello.c Passed [* verify tests/func_param.c ***] [* verify tests/func_param.c *****] [* verify tests/func_param.c *] tests/func_param.c Passed [ verify tests/assign.c ****] [ verify tests/assign.c ****] [ verify tests/assign.c ] tests/assign.c Passed [* verify tests/maze.c ***] [* verify tests/maze.c *****] [* verify tests/maze.c *] tests/maze.c Passed [ verify tests/struct.c ****] [ verify tests/struct.c ****] [ verify tests/struct.c ] tests/struct.c Passed [* verify tests/comments.c ***] [* verify tests/comments.c *****] [* verify tests/comments.c *] tests/comments.c Passed [ verify tests/jit.c ****] [ verify tests/jit.c ****] [ verify tests/jit.c ] tests/jit.c Passed [* verify tests/literal.c ***] [* verify tests/literal.c *****] [* verify tests/literal.c *] tests/literal.c Passed [ verify tests/read.c ****] [ verify tests/read.c ****] [ verify tests/read.c ] tests/read.c Passed [* verify tests/while.c ***] [* verify tests/while.c *****] [* verify tests/while.c *] tests/while.c Passed [ verify tests/arginc.c ****] [ verify tests/arginc.c ****] [ verify tests/arginc.c ] tests/arginc.c Passed [* verify tests/goto.c ***] [* verify tests/goto.c *****] [* verify tests/goto.c *] tests/goto.c Passed [ verify tests/inc.c ****] [ verify tests/inc.c ****] [ verify tests/inc.c ] tests/inc.c Passed [* verify tests/func_call.c ***] [* verify tests/func_call.c *****] [* verify tests/func_call.c *] tests/func_call.c Passed [ verify tests/cond.c ****] [ verify tests/cond.c ****] [ verify tests/cond.c ] tests/cond.c Passed [* verify tests/shift.c ***] [* verify tests/shift.c *****] [* verify tests/shift.c *] tests/shift.c Passed [ verify tests/char.c ****] [ verify tests/char.c ****] [ verify tests/char.c ] tests/char.c Passed [* verify tests/duff.c ***] [* verify tests/duff.c *****] [* verify tests/duff.c *] tests/duff.c Passed [ verify tests/ptr.c ****] [ verify tests/ptr.c ****] [ verify tests/ptr.c ] tests/ptr.c Passed [* verify tests/eq.c ***] [* verify tests/eq.c *****] [* verify tests/eq.c *] tests/eq.c Passed [ verify tests/enum.c ****] [ verify tests/enum.c ****] [ verify tests/enum.c ] tests/enum.c Passed [* verify tests/local.c ***] [* verify tests/local.c *****] [* verify tests/local.c *] tests/local.c Passed [ verify tests/for.c ****] [ verify tests/for.c ****] [ verify tests/for.c ] tests/for.c Passed [* verify tests/sieve.c ***] [* verify tests/sieve.c *****] [* verify tests/sieve.c *] tests/sieve.c Passed [ verify tests/printf.c ****] [ verify tests/printf.c ****] [ verify tests/printf.c ] tests/printf.c Passed [* verify tests/switch.c ***] [* verify tests/switch.c *****] [* verify tests/switch.c ] tests/switch.c Passed [ C to IR translation ] Passed [ JIT compilation + execution ] Passed [ ELF generation ] Passed [ nested/self compilation ] Passed

jserv commented 2 years ago

On my development system, I locally changed the QEMU and CROSS_COMPILE macros to work with my environment. When I do so, using the Makefile I have currently committed, everything passes as shown below. I don't understand why it is not working when I push to your repository. I have done everything I can on my end. I cannot debug the QEMU.

One-line change makes it work!

--- a/Makefile
+++ b/Makefile
@@ -56,7 +56,7 @@ $(OBJ_DIR)/$(BIN): $(BIN)
 $(OBJ_DIR)/$(BIN)-opt: $(BIN) $(PEEP)
        $(VECHO) "  SelfCC\t$@\n"
        $(Q)$(ARM_EXEC) ./$< -p -o $@ $(BIN).c
-       $(Q)$(ARM_EXEC) scripts/peep $@
+       $(Q)scripts/peep $@

 SHELL_HACK := $(shell mkdir -p $(OBJ_DIR))
 $(TEST_DIR)/%.o: $(TEST_DIR)/%.c $(BIN) $(OBJ_DIR)/$(BIN) $(OBJ_DIR)/$(BIN)-opt

And

--- a/scripts/peep
+++ b/scripts/peep
@@ -8,5 +8,5 @@ fi
 PPATH="`dirname \"$0\"`"
 if [[ ! -x $PPATH/../squint ]] ;
    then echo "A 'squint' executable must exist.";
-   else $PPATH/../squint $1 `readelf -a $1 2>/dev/null | awk '/\.text.*PROGBITS/ { print $6, $7 }'`;
+   else $ARM_EXEC $PPATH/../squint $1 `readelf -a $1 2>/dev/null | awk '/\.text.*PROGBITS/ { print $6, $7 }'`;
 fi
HPCguy commented 2 years ago

Thank you so much, guys! I will make these changes, improve the testsing a tiny bit more, and check this in. I will also examine the documentation and squint.c one more time for spacing and style consistency after making your suggested changes. I will commit some time tomorrow morning, my time here in the US. Thank you!

HPCguy commented 2 years ago

PS I found a subtle logic error in simplifyBranch4 that I will fix. It usually works as written, but can fail for certain amacc generated sequences. It was working as written only because I got lucky.

lecopzer commented 2 years ago

Everything in here was created by me except for popcount, which is a widely published algorithm, so not really attributable to anyone. popcount is "open literature".

AMaCC is not documented, not even subsets such as the ELF driver, the stack based AST, the type system, or anything other than the IR. I can write documentation when I get around to it, I just don't think it is fair to require it for this commit.

There are comments at the top of every function desribing what that function covers.

I think that is the problem AMaCC has to be improved. We have some materials for analyzing AMaCC in Chinese, but they never push back to docs/XXX.md. @jserv do we plan to add such docs for each subsystem?

Also, actually I only want a brief concept and key variables usage or steps for the Squint, which could be added in commit message instead of the result only. It's my bad that I didn't describe clearly. It'd be better to add a .md but I think it's not necessary right now because as you said, AST/ELF don't have such file and it's not fair.

and thanks for your docs, I'll start to review it.

HPCguy commented 2 years ago

@jserv I tried the makefile change, but something seems to have gone wrong. I have to go to a party with my wife right now, but will check here when I get back.

Thanks again guys, for all your help.

jserv commented 2 years ago

We have some materials for analyzing AMaCC in Chinese, but they never push back to docs/XXX.md. @jserv do we plan to add such docs for each subsystem?

Yes, I did plan to improve the documentation while preparing the lecture materials. Feel free to submit the pull requests for demystifying AMaCC.

jserv commented 2 years ago

@jserv I tried the makefile change, but something seems to have gone wrong. I have to go to a party with my wife right now, but will check here when I get back.

At present, make check is successful. Can you address the issues you found?

HPCguy commented 2 years ago

As far as I know, I have resolved all issues I found. The simplify_branch4 fix I had mentioned was actually an issue with create_pushpop_map, which I have fixed on squint.c line 1388 of the current commit. There is a very specific 4 line sequence of instructions that AMaCC generates (on lines 1564, 1657-1665) that this fix targets, that I believe can't occur otherwise in AMaCC generated code. The logic in simplify_branch4 is tricky, and I had fooled myself into thinking there was a problem there, when there was not.

I would also like to improve the testing of amacc-opt, but I would like to do that in an isolated commit unrelated to this pull request, where I have more time to get the tests right without breaking anything, if that is ok. The current test is pretty good, just not perfect.

HPCguy commented 2 years ago

And most importantly, thank you for fixing the make check error.

HPCguy commented 2 years ago

Final note for this pull request -- Please do not feel obligated to take squint if it is not sensible to include with AMaCC. As I mentioned in the first post in this conversation, I do not want you to put squint here unless you feel it would make a sensible contribution in the context of this repo. I am happy enough that AMaCC is available on the web to work with.

HPCguy commented 2 years ago

@loganchien Thank you for sharing your review timeframe. When I see a message from you that your review is complete, I will aggregate all requested changes in one commit.

jserv commented 2 years ago

I am thinking of the feasibility to change option -p to -O which indicates the generation of peephole optimization aware code. It would be a bit similar to Link-Time Optimization. Later, when we add more optimizers, the option -O can be acting the role to classify and pass through.

HPCguy commented 2 years ago

@jserv To clarify, do you want 'amacc -O -o foo foo.c'? I am worried that people will forget the squint step, and believe that the '-O' is applying the optimization directly on the executable, since that is standard compiler convention. I have found that people do not pay close attention when trying to use tools. Using -Op from the start may add clarity.

jserv commented 2 years ago

@jserv To clarify, do you want 'amacc -O -o foo foo.c'? I am worried that people will forget the squint step, and believe that the '-O' is applying the optimization directly on the executable, since that is standard compiler convention. I have found that people do not pay close attention when trying to use tools. Using -Op from the start may add clarity.

I agree with the -Op option. It would be more consistent with peephole optimizer.

HPCguy commented 2 years ago

FYI I just modified squint for fun to run as a shared object library. Here is a jit() run:

$ gcc -o amacc amacc.c -Wl,-rpath=/home/pi/amacc -ldl libsquint.so

$ time ./amacc tests/sieve.c 
real    0m7.786s
user    0m7.774s
sys 0m0.011s

$ time ./amacc -Op tests/sieve.c 
real    0m2.384s
user    0m2.354s
sys 0m0.022s
diff --git a/amacc.c b/amacc.c
index ee21887..5e9185c 100644
--- a/amacc.c
+++ b/amacc.c
@@ -22,6 +22,7 @@
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <dlfcn.h>
+#include "squint.h"

 /* 64-bit host support */
 #if defined(__x86_64__) || defined(__aarch64__)
@@ -1831,6 +1832,8 @@ int jit(int poolsz, int *main, int argc, char **argv)
     if (je >= jitmap) die("jitmem too small");
     *tje = reloc_bl(jitmap[((int) main - (int) text) >> 2] - (int) tje);

+    if (peephole) peephole_opt((int *)jitmem, je-1);
+
     // hack to jump into specific function pointer
     __clear_cache(jitmem, je);
     int *res = bsearch(&sym, sym, 1, 1, (void *) _start);

I can configure the makefile to build as standalone or shared library if you are interested. If -Op is enabled I can add libsquint.so to the ELF .so list. (just changed everything to -Op for what it will look like on next commit).

jserv commented 2 years ago

FYI I just modified squint for fun to run as a shared object library. Here is a jit() run: [...] I can configure the makefile to build as standalone or shared library if you are interested. If -Op is enabled I can add libsquint.so to the ELF .so list. (just changed everything to -Op for what it will look like on next commit).

I like the idea to run Squint as a shared library, which means that AMaCC can compile itself while Squint can be built with optimizing compilers such as gcc and clang. Can you use dlopen and dlsym to load libsquint.so?

After you resolve the issues found by @loganchien, I am about to merge this effort.

HPCguy commented 2 years ago

I have resolved all outstanding issues I am aware of.

HPCguy commented 2 years ago

Let me be blunt. It has been 14 days since this pull request has been essentially ready to go, and 20 days total in a pull request state. At several points, I asked if Squint was too much for AMaCC to accept, and you replied that you thought it was appropriate. The one bug that @loganchien found is one which will never appear in actual code, whereas AMaCC itself is riddled with bugs when trying to write the simplest of code fragments. I have fixed many such bugs since I began contributing to this project six months ago that are now a part of AMaCC. I find it disrespectful that I have had this pull request open for 20 days. You need to accept this pull request by tomorrow or I close it, and continue with AMaCC in my own repository on Github. While I am upset in this comment, I think all my points are both true and reasonable. I think AMaCC is a great project, and I would like to continue contributing here rather than on my own, but I have to trade that against being held back from being able to do good work in a timely fashion.

loganchien commented 2 years ago

I understand your frustration for the long review time and I feel sorry about it. But I must clarify that I do not intend to be disrespectful. I am happy to see the change and willing to see it to be merged eventually.

The prolonged review time is the direct result of two reasons: (1) the magnitude of squint.cc and (2) the limited spare time I have. To review the code properly, I have to think thoroughly on (1) what invariants are you trying to maintain in the program, (2) are all boundary cases properly handled, (3) what can go wrong, and (4) sometimes I have to check the ARM reference manual to ensure the assumption is correct (even if they are just two line of code like Line 688-689). All of these are time consuming work and unfortunately I only have roughly 8 hours per week for this work (and might be even shorter if some unexpected over-time work or house keeping work is taking away my time).

I think I won't be able to review it at a faster speed. I leave it to jserv@ to decide whether he wants to take this PR as-is and maybe fix any bugs found later. I am only trying to be helpful here. Thank you for your understanding.

HPCguy commented 2 years ago

I would feel it as a responsibility to fix any bugs as they arise. Since this is an auxillary program and not a part of AMaCC proper, the risks of impacting AMaCC are very low. Indeed Squint could be easily jettisoned with less than a minute of makefile and repo changes at any time without impact to AMaCC.

HPCguy commented 2 years ago

Hi. I have decided to create my own repo, since the turnaround time for review here is too long for my development needs. That said, I will leave Squint as a pull request in case you want to merge it into AMaCC. Feel free to cancel if you don't want it. As I do AMaCC bug fixes on my own repo, I will submit some of them as pull requests to this repo. What you've done with this compiler is really phenomenal, and I chose it vs Github competitors for that reason. Thank you for such an amazing compiler to work with, based mostly on C4 but going further just the right amount. Now it is my turn to split off as you did, and hopefully continue improving this amazing educational compiler. It will be in a new repo called Squint under HPCguy.

HPCguy commented 2 years ago

PS @loganchien Thank you for the time you took to review this. I know you sacrificed personal time, and I appreciate your careful analysis to make sure AMaCC would maintain its reputation before allowing the merge.

jserv commented 2 years ago

I wrote AMaCC in my spare time as one of volunteer projects. I appreciate the contributions from @HPCguy, @loganchien, and @lecopzer. Based on c4, AMaCC is now moving forward to be more practical and usable -- I could not imagine this when I started this project in 2016. Per the consideration of @loganchien, I think we can leave some messages in git log and merge the effort for experimental purpose. For future commits, more enhancements can be landed and targeting stronger Squint.

HPCguy commented 2 years ago

And thank you for your incredible work! I have started a new repo here: https://github.com/HPCguy/Squint . I encourage you to Read the README there. My mind does not deal well with outstanding pull requests, and I want to plow ahead rapidly on some new development. I will back-patch most bugfixes from the new repo into AMaCC, along with select features that I think are appropriate for AMaCC. I need the new repo for my mental health, so I can rapidly expand the capabilities of AMaCC without some constraints I have been putting on myself to stay true to the "small but powerful" AMaCC model. I think AMaCC provides a wonderful educational code base, and I don't want to mess that up.

HPCguy commented 2 years ago

I don't mind the suggested peephole checks being changed after this pull request, or any other changes anyone may want to add. I just won't be the one to do it, on philisophical grounds.

You have the option to discard this pull request with no ill feelings on my part. I suspected this might be too big for AMaCC when I started, which is why I was hesitant in my first post for this pull request.

In any event, I will continue to submit small bug fix pull requests to AMaCC that touch 20 or less lines. I will return to fixing my amacc "issue" submissions around March 14, after I have completed float data type support for this code base on my personal repo @ https://github.com/HPCguy/Squint.

HPCguy commented 2 years ago

The compiler is now beating gcc -O3 for the Sieve of Eratosthenes problem. Floating point is decently implemented and runs a physics benchmark in about 2/3 the time of Gcc -O0.

The https://github.com/HPCguy/Squint/blob/main/README.md page now has tables for executable size, compile time, and runtime comparisons.

After a few more performance enhanchements (I want floating point to beat gcc -O1), I will likely start fixing AMaCC bugs, and backporting those changes here.

HPCguy commented 2 years ago

The floating point execution performance of my extended version of amacc in the repository https://github.com/HPCguy/Squint is now beating gcc -O3 floating point performance.

HPCguy commented 2 years ago

I'm going to close this pull request, since Github HPCguy/Squint supersedes this work. AMaCC is mentioned liberally within that project, and it never would have been completed without the excellent development in the early AMaCC compiler.