Closed HPCguy closed 2 years ago
Hi, I do not have a cross compile or QEMU environment, so I likely can't test or modify the make scripts without breaking something. Can you add that?
I will review all comments tomorrow before making changes. Have a good day. It is 3:30AM here so I need some sleep.
One more thing -- now that any library call is supported, it is possible to let the user use the -p option in place of the -o option, and then have a system call at the end of main() in amacc.c : if (peep) system("squint
This version of squint versus baseline:
feature | amacc.s | amacc.s.opt |
---|---|---|
ldr | 8091 | 4717 |
str | 6732 | 3043 |
reg-reg | 9654 | 5673 |
sub | 1878 | 693 |
add | 1567 | 1170 |
mul | 60 | 16 |
Hi, I do not have a cross compile or QEMU environment, so I likely can't test or modify the make scripts without breaking something. Can you add that?
No problem at all. You can modify Makefile
to reflect squint
specific changes. Then, once make check
includes the necessary checks, the CI can validate itself.
I created a new benchmarking test (Sieve of Eratosthenes), and I am considering pulling minimal register renaming from my non-published version of squint, hence the delay. I'll have finished that decision process by tonight, and will commit by tomorrow. It would be nice to be at least as fast as gcc -O0 with a minimal set of peephole changes.
@lecopzer, Please review this effort on introducing a peephole optimizer to AMaCC.
After a quick review, I'd like to see a doc like https://github.com/jserv/amacc/blob/master/docs/IR.md because it seems a complex algorithm.
Does this peephole algorithm made by yourself or any referenced paper or exist algorithm?
It looks like Squint inserts some hint (nop
) into codegen and resolve in squint.c.
But we don't have a brief concept of how the algorithm does and make it hard to review.
for example, in squint.c, how the map was constructed and the usage, what the meaning of ambiguous function naming such as simplifyBranchX
apply_peepholesX
X=1~4, what the NOP
use for...etc.
Everything in here was created by me except for popcount, which is a widely published algorithm, so not really attributable to anyone. popcount is "open literature".
AMaCC is not documented, not even subsets such as the ELF driver, the stack based AST, the type system, or anything other than the IR. I can write documentation when I get around to it, I just don't think it is fair to require it for this commit.
There are comments at the top of every function desribing what that function covers.
Final comment -- AMaCC itself is extremely compilcated. The fact that amacc itself can be optimized by Squintis a testament to its robustness. Also, you can compare the assembly language of any program produced with the original AMaCC (cf scripts/disasm) compared with the peephole optimized version of that executable as a testament to the quality of the optimizations.
Eating some crow (idiom) now, I found a problem in the frame-register optimizer I just added since yesterday. If I can't quickly resolve it, I will remove the frame-register optimizer.
OK. I simplified the frame-register optimization and re-commited.
Best time of sieve benchmark for 5 runs: amacc (no squint): 0m7.643s amacc (squint): 0m2.312s Gcc -O0: 0m2.191s
I will commit the Makefile tomorrow.
@lecopzer, looking forward to ELF documentation.
Best time of sieve benchmark for 5 runs:
compilers | execution time |
---|---|
amacc (no squint) | 0m7.643s |
amacc (squint) | 0m2.312s |
gcc -O0 | 0m2.191s |
I am curious about the conditions when AMaCC + Squint can be approaching gcc -O1
.
"or example, in squint.c, how the map was constructed and the usage, what the meaning of ambiguous function naming such as simplifyBranchX apply_peepholesX X=1~4, what the NOP use for...etc."
As I explain in the documentation, the order that functions are applied or how many times they are applied improves optimization. The generic names mean that anything could be placed in the function blocks in the future, and each function just provides a scope that can be put it a different order or repeated multiple times.
I am curious about the conditions when AMaCC + Squint can be approaching gcc -O1.
So far, just fib is faster than gcc -01. I have a function called simplify_frame() that gets rid of fp, so that only lr gets pushed with a function call, and the stack related logic gets modified/eliminated. Fib is all data movement on the stack frame via function calls, so it ends up being faster than gcc -O1 with those modifications.
That said, there is also a ton of use-def analysis that can be done to cut at least 1/3 of the instructions in many loops out, and to better balance pipelines. For instance, there are two load pipelines and two integer pipelines. Keeping them all going at full tilt can be done by balancing LEA location (explicit integer or CISC addressing) and balancing immediate constants vs grabbing them from pc-relative addresses.
Getting performance is not hard. What is hard is balancing it against code size when the philosophy of AMaCC is "small but powerful."
OK. I tried to create testing for the peephole capability. I can't really do anything other than modify text (I can't run amacc make in my environment), so I've probably missed some detail. I could really, really use some help on this.
OK. I tried to create testing for the peephole capability. I can't really do anything other than modify text (I can't run amacc make in my environment), so I've probably missed some detail. I could really, really use some help on this.
CI reported:
SelfCC elf/amacc
SelfCC elf/amacc-opt
make: *** [Makefile:59: elf/amacc-opt] Error 1
On my development system, I locally changed the QEMU and CROSS_COMPILE macros to work with my environment. When I do so, using the Makefile I have currently committed, everything passes as shown below. I don't understand why it is not working when I push to your repository. I have done everything I can on my end. I cannot debug the QEMU.
$:~/amacc $ make check
CC+LD squint
SelfCC elf/amacc
SelfCC elf/amacc-opt
Please remember the amacc '-p' compiler flag is a prerequisite.
executable file elf/amacc-opt was optimized.
[* verify tests/fib.c
On my development system, I locally changed the QEMU and CROSS_COMPILE macros to work with my environment. When I do so, using the Makefile I have currently committed, everything passes as shown below. I don't understand why it is not working when I push to your repository. I have done everything I can on my end. I cannot debug the QEMU.
One-line change makes it work!
--- a/Makefile
+++ b/Makefile
@@ -56,7 +56,7 @@ $(OBJ_DIR)/$(BIN): $(BIN)
$(OBJ_DIR)/$(BIN)-opt: $(BIN) $(PEEP)
$(VECHO) " SelfCC\t$@\n"
$(Q)$(ARM_EXEC) ./$< -p -o $@ $(BIN).c
- $(Q)$(ARM_EXEC) scripts/peep $@
+ $(Q)scripts/peep $@
SHELL_HACK := $(shell mkdir -p $(OBJ_DIR))
$(TEST_DIR)/%.o: $(TEST_DIR)/%.c $(BIN) $(OBJ_DIR)/$(BIN) $(OBJ_DIR)/$(BIN)-opt
And
--- a/scripts/peep
+++ b/scripts/peep
@@ -8,5 +8,5 @@ fi
PPATH="`dirname \"$0\"`"
if [[ ! -x $PPATH/../squint ]] ;
then echo "A 'squint' executable must exist.";
- else $PPATH/../squint $1 `readelf -a $1 2>/dev/null | awk '/\.text.*PROGBITS/ { print $6, $7 }'`;
+ else $ARM_EXEC $PPATH/../squint $1 `readelf -a $1 2>/dev/null | awk '/\.text.*PROGBITS/ { print $6, $7 }'`;
fi
Thank you so much, guys! I will make these changes, improve the testsing a tiny bit more, and check this in. I will also examine the documentation and squint.c one more time for spacing and style consistency after making your suggested changes. I will commit some time tomorrow morning, my time here in the US. Thank you!
PS I found a subtle logic error in simplifyBranch4 that I will fix. It usually works as written, but can fail for certain amacc generated sequences. It was working as written only because I got lucky.
Everything in here was created by me except for popcount, which is a widely published algorithm, so not really attributable to anyone. popcount is "open literature".
AMaCC is not documented, not even subsets such as the ELF driver, the stack based AST, the type system, or anything other than the IR. I can write documentation when I get around to it, I just don't think it is fair to require it for this commit.
There are comments at the top of every function desribing what that function covers.
I think that is the problem AMaCC has to be improved. We have some materials for analyzing AMaCC in Chinese, but they never push back to docs/XXX.md. @jserv do we plan to add such docs for each subsystem?
Also, actually I only want a brief concept and key variables usage or steps for the Squint, which could be added in commit message instead of the result only. It's my bad that I didn't describe clearly. It'd be better to add a .md but I think it's not necessary right now because as you said, AST/ELF don't have such file and it's not fair.
and thanks for your docs, I'll start to review it.
@jserv I tried the makefile change, but something seems to have gone wrong. I have to go to a party with my wife right now, but will check here when I get back.
Thanks again guys, for all your help.
We have some materials for analyzing AMaCC in Chinese, but they never push back to docs/XXX.md. @jserv do we plan to add such docs for each subsystem?
Yes, I did plan to improve the documentation while preparing the lecture materials. Feel free to submit the pull requests for demystifying AMaCC.
@jserv I tried the makefile change, but something seems to have gone wrong. I have to go to a party with my wife right now, but will check here when I get back.
At present, make check
is successful. Can you address the issues you found?
As far as I know, I have resolved all issues I found. The simplify_branch4 fix I had mentioned was actually an issue with create_pushpop_map, which I have fixed on squint.c line 1388 of the current commit. There is a very specific 4 line sequence of instructions that AMaCC generates (on lines 1564, 1657-1665) that this fix targets, that I believe can't occur otherwise in AMaCC generated code. The logic in simplify_branch4 is tricky, and I had fooled myself into thinking there was a problem there, when there was not.
I would also like to improve the testing of amacc-opt, but I would like to do that in an isolated commit unrelated to this pull request, where I have more time to get the tests right without breaking anything, if that is ok. The current test is pretty good, just not perfect.
And most importantly, thank you for fixing the make check error.
Final note for this pull request -- Please do not feel obligated to take squint if it is not sensible to include with AMaCC. As I mentioned in the first post in this conversation, I do not want you to put squint here unless you feel it would make a sensible contribution in the context of this repo. I am happy enough that AMaCC is available on the web to work with.
@loganchien Thank you for sharing your review timeframe. When I see a message from you that your review is complete, I will aggregate all requested changes in one commit.
I am thinking of the feasibility to change option -p
to -O
which indicates the generation of peephole optimization aware code. It would be a bit similar to Link-Time Optimization. Later, when we add more optimizers, the option -O
can be acting the role to classify and pass through.
@jserv To clarify, do you want 'amacc -O -o foo foo.c'? I am worried that people will forget the squint step, and believe that the '-O' is applying the optimization directly on the executable, since that is standard compiler convention. I have found that people do not pay close attention when trying to use tools. Using -Op from the start may add clarity.
@jserv To clarify, do you want 'amacc -O -o foo foo.c'? I am worried that people will forget the squint step, and believe that the '-O' is applying the optimization directly on the executable, since that is standard compiler convention. I have found that people do not pay close attention when trying to use tools. Using -Op from the start may add clarity.
I agree with the -Op
option. It would be more consistent with peephole optimizer.
FYI I just modified squint for fun to run as a shared object library. Here is a jit() run:
$ gcc -o amacc amacc.c -Wl,-rpath=/home/pi/amacc -ldl libsquint.so
$ time ./amacc tests/sieve.c
real 0m7.786s
user 0m7.774s
sys 0m0.011s
$ time ./amacc -Op tests/sieve.c
real 0m2.384s
user 0m2.354s
sys 0m0.022s
diff --git a/amacc.c b/amacc.c
index ee21887..5e9185c 100644
--- a/amacc.c
+++ b/amacc.c
@@ -22,6 +22,7 @@
#include <sys/stat.h>
#include <fcntl.h>
#include <dlfcn.h>
+#include "squint.h"
/* 64-bit host support */
#if defined(__x86_64__) || defined(__aarch64__)
@@ -1831,6 +1832,8 @@ int jit(int poolsz, int *main, int argc, char **argv)
if (je >= jitmap) die("jitmem too small");
*tje = reloc_bl(jitmap[((int) main - (int) text) >> 2] - (int) tje);
+ if (peephole) peephole_opt((int *)jitmem, je-1);
+
// hack to jump into specific function pointer
__clear_cache(jitmem, je);
int *res = bsearch(&sym, sym, 1, 1, (void *) _start);
I can configure the makefile to build as standalone or shared library if you are interested. If -Op
is enabled I can add libsquint.so
to the ELF .so
list. (just changed everything to -Op
for what it will look like on next commit).
FYI I just modified squint for fun to run as a shared object library. Here is a jit() run: [...] I can configure the makefile to build as standalone or shared library if you are interested. If
-Op
is enabled I can addlibsquint.so
to the ELF.so
list. (just changed everything to-Op
for what it will look like on next commit).
I like the idea to run Squint as a shared library, which means that AMaCC can compile itself while Squint can be built with optimizing compilers such as gcc and clang. Can you use dlopen
and dlsym
to load libsquint.so
?
After you resolve the issues found by @loganchien, I am about to merge this effort.
I have resolved all outstanding issues I am aware of.
Let me be blunt. It has been 14 days since this pull request has been essentially ready to go, and 20 days total in a pull request state. At several points, I asked if Squint was too much for AMaCC to accept, and you replied that you thought it was appropriate. The one bug that @loganchien found is one which will never appear in actual code, whereas AMaCC itself is riddled with bugs when trying to write the simplest of code fragments. I have fixed many such bugs since I began contributing to this project six months ago that are now a part of AMaCC. I find it disrespectful that I have had this pull request open for 20 days. You need to accept this pull request by tomorrow or I close it, and continue with AMaCC in my own repository on Github. While I am upset in this comment, I think all my points are both true and reasonable. I think AMaCC is a great project, and I would like to continue contributing here rather than on my own, but I have to trade that against being held back from being able to do good work in a timely fashion.
I understand your frustration for the long review time and I feel sorry about it. But I must clarify that I do not intend to be disrespectful. I am happy to see the change and willing to see it to be merged eventually.
The prolonged review time is the direct result of two reasons: (1) the magnitude of squint.cc
and (2) the limited spare time I have. To review the code properly, I have to think thoroughly on (1) what invariants are you trying to maintain in the program, (2) are all boundary cases properly handled, (3) what can go wrong, and (4) sometimes I have to check the ARM reference manual to ensure the assumption is correct (even if they are just two line of code like Line 688-689). All of these are time consuming work and unfortunately I only have roughly 8 hours per week for this work (and might be even shorter if some unexpected over-time work or house keeping work is taking away my time).
I think I won't be able to review it at a faster speed. I leave it to jserv@ to decide whether he wants to take this PR as-is and maybe fix any bugs found later. I am only trying to be helpful here. Thank you for your understanding.
I would feel it as a responsibility to fix any bugs as they arise. Since this is an auxillary program and not a part of AMaCC proper, the risks of impacting AMaCC are very low. Indeed Squint could be easily jettisoned with less than a minute of makefile and repo changes at any time without impact to AMaCC.
Hi. I have decided to create my own repo, since the turnaround time for review here is too long for my development needs. That said, I will leave Squint as a pull request in case you want to merge it into AMaCC. Feel free to cancel if you don't want it. As I do AMaCC bug fixes on my own repo, I will submit some of them as pull requests to this repo. What you've done with this compiler is really phenomenal, and I chose it vs Github competitors for that reason. Thank you for such an amazing compiler to work with, based mostly on C4 but going further just the right amount. Now it is my turn to split off as you did, and hopefully continue improving this amazing educational compiler. It will be in a new repo called Squint under HPCguy.
PS @loganchien Thank you for the time you took to review this. I know you sacrificed personal time, and I appreciate your careful analysis to make sure AMaCC would maintain its reputation before allowing the merge.
I wrote AMaCC in my spare time as one of volunteer projects. I appreciate the contributions from @HPCguy, @loganchien, and @lecopzer. Based on c4, AMaCC is now moving forward to be more practical and usable -- I could not imagine this when I started this project in 2016. Per the consideration of @loganchien, I think we can leave some messages in git log and merge the effort for experimental purpose. For future commits, more enhancements can be landed and targeting stronger Squint.
And thank you for your incredible work! I have started a new repo here: https://github.com/HPCguy/Squint . I encourage you to Read the README there. My mind does not deal well with outstanding pull requests, and I want to plow ahead rapidly on some new development. I will back-patch most bugfixes from the new repo into AMaCC, along with select features that I think are appropriate for AMaCC. I need the new repo for my mental health, so I can rapidly expand the capabilities of AMaCC without some constraints I have been putting on myself to stay true to the "small but powerful" AMaCC model. I think AMaCC provides a wonderful educational code base, and I don't want to mess that up.
I don't mind the suggested peephole checks being changed after this pull request, or any other changes anyone may want to add. I just won't be the one to do it, on philisophical grounds.
You have the option to discard this pull request with no ill feelings on my part. I suspected this might be too big for AMaCC when I started, which is why I was hesitant in my first post for this pull request.
In any event, I will continue to submit small bug fix pull requests to AMaCC that touch 20 or less lines. I will return to fixing my amacc "issue" submissions around March 14, after I have completed float data type support for this code base on my personal repo @ https://github.com/HPCguy/Squint.
The compiler is now beating gcc -O3 for the Sieve of Eratosthenes problem. Floating point is decently implemented and runs a physics benchmark in about 2/3 the time of Gcc -O0.
The https://github.com/HPCguy/Squint/blob/main/README.md page now has tables for executable size, compile time, and runtime comparisons.
After a few more performance enhanchements (I want floating point to beat gcc -O1), I will likely start fixing AMaCC bugs, and backporting those changes here.
The floating point execution performance of my extended version of amacc in the repository https://github.com/HPCguy/Squint is now beating gcc -O3 floating point performance.
I'm going to close this pull request, since Github HPCguy/Squint supersedes this work. AMaCC is mentioned liberally within that project, and it never would have been completed without the excellent development in the early AMaCC compiler.
This pull request is a starting point for discussion on how to integrate Squint most effectively with amacc. It may best be done with separate repos that are tied through modules, but I don't know.
I added a -p option to amacc.c that must currently be used with the amacc -o option to produce an ELF file that can be optimized with scripts/peep. In order for peep to work, a version of squint must be compiled, e.g. gcc -o squint squint.c .
Example:
% gcc -o amacc-gcc amacc.c -ldl % ./amacc-gcc -p -o amacc amacc.c % scripts/disasm amacc > amacc.s % gcc -o squint squint.c % scripts/peep amacc executable file amacc was optimized. % scripts/disasm amacc > amacc.s.opt % sdiff -W -d -w 180 amacc.s amacc.s.opt
I have more advanced versions of this that has even beat gcc -O1 for fib.c and possibly other things, but I suspect you will not like my coding style, so we will have to discuss.