vnmakarov / mir

A lightweight JIT compiler based on MIR (Medium Internal Representation) and C11 JIT compiler and interpreter based on MIR
MIT License
2.24k stars, 147 forks

The MIR compiler fails to compile the benchmarks #282

Closed (rempas closed this 8 months ago)

rempas commented 1 year ago

Ok, first of all! Great work with the benchmark script! It indeed finds and uses all of my installed C compilers! Good job thinking of that and implementing it! However, unfortunately, it seems that the MIR compiler doesn't work. At least on the "bbv" branch and for the first 3 tests I tried (because I stopped it after that to post this issue).

Also, could you please add some info about what each column represents? The first one is the compiler, but what about the other two? The second is a time, but of what? The time it took to run the test or the time to compile? If it is runtime, could you please also measure compile time and add another column (or two)? Also, in what unit? Seconds, milliseconds, microseconds? And the third is a multiplier, but compared to what? As "GCC -O2" is always 1.00x (at least for the 3 tests that I ran), I suppose that you take the GCC time as the baseline, so the other entries show how much of its performance they reach (e.g. 0.50x is 50% of the GCC performance).

But all these are just theories so please, have the first line explain the columns. Thank you!

rempas commented 1 year ago

Also, I think you should remove "GCC -O0" because I don't think that it really helps. Sure, it can be used to compare the least optimized version against the most optimized one, but people always care about and compare the most optimized version, so I think that "GCC -O0" just wastes space. But it's just a suggestion, I don't really mind either way!

rempas commented 1 year ago

REEE, wait! I wanted to read the script file and try to fix it myself, and I found out that you use "./c2m" to run the MIR benchmarks. Why is that? There will be no local "c2m" executable in the "c-benchmarks" directory. The script should expect that the user has installed it in the PATH and that it works.

So I "fixed it" (not really, because it wasn't broken, lol). I have also changed the tests a little bit to some others that make a little more sense. Do you want me to upload it? However, I haven't changed the columns and haven't added compile-time benchmarks yet (because I don't know shell scripting and I couldn't understand a lot of the file...).

And finally... WHY (and) IS MIR FASTER THAN GCC AND CLANG (not always of course but still)???

I would wait for your reply first before I close this issue.

EDIT:

Well, ok. Classic reaction. Running the whole test suite, it seems that c2m -eg has on average 73% to 80% of the performance of GCC or Clang with the following flags: -std=c11 -Ofast -finline -finline-functions -march=native -mtune=native -pipe. However, if I'm not wrong, "-Ofast" is considered unstable and unreliable, so people don't use it in big and complex projects, and there is no reason to beat or even reach its performance (unless we can do it slowly and stably, of course). Clang with -O2 (I forgot to uncomment the GCC version, but it will have the same performance as Clang on average) has 91% to 95% of the performance of the best compiler settings. If I calculate correctly (probably not, because I SUCK at math), this means that MIR has about 82% to 85% of the "wanted" runtime performance. That's not bad! Not bad at all!!!! Of course, more tests (especially heavy ones where MIR doesn't perform so well) would be nice, so we have more things to compare in the future!
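A quick way to sanity-check the estimate in the comment above is to divide the two quoted ranges. This is only a sketch of the arithmetic: it assumes both ranges are fractions of the same "fastest" baseline, and the variable names are made up for illustration.

```python
# Ranges quoted in the comment, as fractions of the fastest build's speed.
mir_low, mir_high = 0.73, 0.80   # c2m -eg
o2_low, o2_high = 0.91, 0.95     # Clang -O2

# Pairing low with low and high with high.
paired = (mir_low / o2_low, mir_high / o2_high)
# Widest possible spread: worst MIR vs best -O2, and best MIR vs worst -O2.
widest = (mir_low / o2_high, mir_high / o2_low)

print(f"paired ends:  {paired[0]:.0%} to {paired[1]:.0%}")   # 80% to 84%
print(f"widest range: {widest[0]:.0%} to {widest[1]:.0%}")   # 77% to 88%
```

Under these assumptions the quoted "about 82% to 85%" sits inside the widest range, so the rough estimate is plausible even though the exact endpoints depend on how the ranges are paired.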

And I guess that MIR still has a lot of room for improvement, as it is still a (relatively) new project! At least in some cases, it can be greatly improved (for example in the code generated for tests 0, 9, 11, 12 and 13). But in general, your work is phenomenal!!!!

vnmakarov commented 1 year ago

Thank you for your feedback. The script prints the wall time of running the benchmarks compiled by different compilers. The second column is generated code performance relative to GCC -O2. I've just checked and found that the benchmarks are working for me on the bbv branch on an x86-64 machine. But I should acknowledge that the bbv branch can be unstable sometimes. Currently I use it myself to experiment on my own JIT. I will start stabilizing the branch after September, to release MIR based on this branch at the end of the year.
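A minimal sketch of how such a relative column can be derived from wall times, assuming the ratio is the baseline time divided by the measured time (so a slower binary scores below 1.00x). The function name and the timings are hypothetical and are not taken from the benchmark script.

```python
def relative_performance(baseline_seconds: float, measured_seconds: float) -> float:
    """Speed relative to the baseline: 1.0 means equal, 0.5 means half as fast."""
    if baseline_seconds <= 0 or measured_seconds <= 0:
        raise ValueError("wall times must be positive")
    return baseline_seconds / measured_seconds


# Hypothetical wall times in seconds for a single benchmark.
timings = {"gcc -O2": 2.0, "gcc -O0": 4.0, "c2m -eg": 2.5}
baseline = timings["gcc -O2"]

for compiler, seconds in timings.items():
    ratio = relative_performance(baseline, seconds)
    print(f"{compiler:<10} {seconds:6.2f}s {ratio:5.2f}x")
```

With this convention, a binary that takes twice as long as the GCC -O2 build is reported as 0.50x, which matches the reading suggested earlier in the thread.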

Benchmarking GCC -O0 is important for me too, as I want to keep better compilation speed and generated code performance than GCC -O0. To be honest, I am not satisfied with the speed of c2m and the MIR generator. Recently I sped up RA (register allocation), but the MIR generator is still too slow for my taste. The problem is that MIR as an internal representation is not compact. Originally it was designed mostly to simplify the API and the implementation of code generation.

The 9th test (calls) is slow because all calls in MIR are currently done through thunks. The next release will have an option to call generated functions directly. It will improve the 9th benchmark.
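As a language-neutral illustration of the trade-off, the sketch below shows why a call through a thunk costs an extra hop on every call, and why the indirection is still useful: the target can be swapped after callers are already wired up. This is not MIR's implementation; the `Thunk` class and both `add_*` functions are hypothetical stand-ins.

```python
class Thunk:
    """A tiny trampoline: callers hold the thunk, the thunk forwards to its current target."""

    def __init__(self, target):
        self.target = target

    def __call__(self, *args):
        # Extra hop on every call: look up the current target, then call it.
        return self.target(*args)


def add_slow(a, b):
    return sum([a, b])   # stand-in for an early, unoptimized version


def add_fast(a, b):
    return a + b         # stand-in for freshly generated code


add = Thunk(add_slow)
caller_ref = add                 # a caller captures the thunk once
first_result = caller_ref(2, 3)

add.target = add_fast            # retarget without touching any caller
second_result = caller_ref(2, 3)

direct_ref = add_fast            # a direct call skips the hop but is bound to this one target
direct_result = direct_ref(2, 3)
```

A direct call removes the per-call indirection, which matters most in a benchmark dominated by calls, but the caller can no longer be redirected simply by updating one slot.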

Tests 0 (array), 11 (sieve), 12 (nbody), and 13 (spectral norm) require loop optimizations, better alias analysis, and/or better handling of global variable addressing to be improved. I guess MIR-generated code will look better if the benchmarks are more memory bound. For example, I can change sieve to another variant (with a bigger array) and MIR-generated code performance will reach 90% of GCC -O2.

In any case, optimization work is very time consuming and I currently cannot spend much time on it, although this work is very interesting for me, especially finding a good balance of generation speed, generated code performance, and simplicity of optimization implementation (which affects generator reliability: the more code you have, the more bugs are in the code).

rempas commented 1 year ago

Thanks for the fast reply!

> I've just checked and found that the benchmarks are working for me on the bbv branch on an x86-64 machine

But how? Is there a local "c2m" executable in the "c-benchmarks" directory? There isn't in my case. Is there something I'm missing?

> But I should acknowledge that the bbv branch can be unstable sometimes. Currently I use it myself to experiment on my own JIT

Yeah, I remember that you said that somewhere else. However, I want to ask something very important. Is bbv unstable and buggy even in things and features that also exist in the master branch or do they have the same stability for the same features? I'm asking that because I would like to use bbv as nothing is fully stable anyway and as you're going to merge it. So when I find bugs in bbv, should I create issues?

> Benchmarking GCC -O0 is important for me too, as I want to keep better compilation speed and generated code performance than GCC -O0.

Oh, I see! I ran the benchmarks and the results were the following:

============AVERAGE:=========
c2m -eb:                                              0.82x
c2m -eg:                                              0.82x
gcc -O0:                                              0.50x
gcc (fastest):                                        1.00x
============GEOMEAN:=========
c2m -eb:                                              0.74x
c2m -eg:                                              0.74x
gcc -O0:                                              0.38x
gcc (fastest):                                        1.00x

My CPU is a "Ryzen 5 2400G" and the flags for gcc (fastest) are the ones I mentioned in my issue (-Ofast, -finline, etc.). So, amazing results!
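For readers comparing the AVERAGE and GEOMEAN rows, here is a minimal sketch of the two summaries, assuming each is computed over per-benchmark ratios relative to the baseline. The ratios below are made up for illustration and are not the measurements from this run.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    # exp of the mean of the logs; every value must be strictly positive.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-benchmark speeds relative to the baseline (1.0 = same speed).
ratios = [1.10, 0.95, 0.90, 0.25]

print(f"average: {arithmetic_mean(ratios):.2f}x")
print(f"geomean: {geometric_mean(ratios):.2f}x")
```

The geometric mean can never exceed the arithmetic mean, and a single very slow benchmark pulls it down more strongly. That is consistent with the GEOMEAN rows above being lower than the AVERAGE rows.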

> To be honest, I am not satisfied with the speed of c2m and the MIR generator. Recently I sped up RA (register allocation), but the MIR generator is still too slow for my taste.

On one hand, I LOVE the way you say that and the fact that you have higher expectations, but I want to ask another thing. Logically speaking, how feasible is this? Like I said, in my benchmarks, MIR is already at least 83% of the runtime performance of "GCC -O2". So, I really wonder! Do you just have higher expectations anyway, or do you think that you can practically get a good runtime performance boost without sacrificing a lot of compilation speed (so the trade is worth it)? That's an interesting topic for me, as I don't know how either MIR or the MIR binary format works internally, so I don't know what's possible and what the limitations are. But you sure make me excited!

> The problem is that MIR as an internal representation is not compact. Originally it was designed mostly to simplify the API and the implementation of code generation.

I don't understand: what do you mean by "compact", and how would that help make things better? Also, have you thought about changing the design to something you prefer more? As MIR hasn't hit version 1.0 yet, I think breaking compatibility is not something you should worry about.

> The 9th test (calls) is slow because all calls in MIR are currently done through thunks. The next release will have an option to call generated functions directly. It will improve the 9th benchmark.

Why will it be an option rather than a change that applies to all calls? Are there any drawbacks?

> Tests 0 (array), 11 (sieve), 12 (nbody), and 13 (spectral norm) require loop optimizations, better alias analysis, and/or better handling of global variable addressing to be improved. I guess MIR-generated code will look better if the benchmarks are more memory bound.

The way you make it seem is that if you implement these, MIR will reach 100% of GCC's performance, and that's just crazy! Well, there are cases where you can solve problems without recursion; however, there are cases where recursion is either necessary (is that really true? That's what I heard. I never found any in my experience) or it makes things much, much easier. So I suppose "loop optimizations" are something you may be interested in. I'm not sure about the other two and whether the trade-off between compile times and runtime performance will be worth it, but in the end, you are the creator and you know better than me! And anyway, MIR now compiles at least 5 times faster than GCC -O2 in a project that I tested, so it's amazingly fast and losing some speed won't hurt! Tbh, even if it were only 2 times faster I would still be happy!

In the end, I'm saying again that I find your work phenomenal! I like that you have high expectations for your project (that's how I think about mine as well, so I love that way of thinking) and that you are interested in improving it. In any case, have fun and don't overdo it (working, I mean)! You shouldn't be in a hurry. I think, if you want to set priorities, try to fix as many bugs as possible and then try to implement inline assembly (with a system that allows us to use variables and isn't annoying like GCC's) or at least an insn that allows us to do system calls. The second one will probably be much, much easier, and it's more important though. But in any case, optimizations can wait, in my humble opinion! But again, you know better in the end!