leaningtech / cheerp-meta

Cheerp - a C/C++ compiler for Web applications - compiles to WebAssembly and JavaScript
https://labs.leaningtech.com/cheerp

Cheerp / Emscripten size comparison #76

Closed kripken closed 5 years ago

kripken commented 6 years ago

Hi Cheerp devs :)

I see your website says "30% smaller than Emscripten" so I was curious to measure that. Running emscripten's tests/test_benchmark.py script, I see the following results which are very different:

That's for size. Speed-wise, the results are a mix (some a little faster, some a little slower, some about the same).

Several of the tests hit problems when running the Cheerp output:

I also couldn't get some tests to build in Cheerp: test_linpack, test_bullet, test_lua_binarytrees, test_lua_scimark, test_zlib. For example test_zlib says

Intrinsic name not mangled correctly for type arguments! Should be: llvm.cheerp.cast.user.p0struct._Z14internal_state.1.p0i8
%struct._Z14internal_state.1* (i8*)* @llvm.cheerp.cast.user.p0struct._Z14internal_state.p0i8

Maybe the script doesn't build them right? It calls Cheerp's llvm-ar etc., and the commands work with emscripten and native builds, but maybe something more needs to be done for Cheerp? Or am I hitting Cheerp limitations - would zlib, Lua, etc. need to be ported to Cheerp first?

This is on Cheerp 1516806243-1~xenial (which is the latest nightly build I see for Xenial, from Jan 24) and emscripten 1.37.36 (last tagged release, from Mar 13).

These results are very different from the ones you reported, perhaps we are not measuring the same thing somehow? (All the details of how I got the measurements mentioned above are in the linked script that runs those benchmarks, I basically just ran that script as-is except for uncommenting the line to enable running Cheerp.) Or maybe our results are about different versions?

alexp-sssup commented 6 years ago

Hello Alon. Sorry for the delay, we needed some time to reproduce results and gather all required data.

We have been pleasantly surprised that Cheerp is now integrated in emscripten's benchmarks. We have been maintaining our own branch of emscripten to run tests, which we have now rebased on 1.37.36 and released. You can find it here.

About size results, the differences arise from the test cases being slightly different. I will focus on test_primes and test_memops in this discussion; other benchmarks are available in the branch linked above.

To reproduce our results you simply need to remove the printf statements from the test source. In both test_primes and test_memops, printf is used to report an invalid test size and to print the test results. In our modified version we remove the printf calls and directly return the numerical result of the test. As an example, here is the source of memops with our changes:

      #include <stdio.h>
      #include <string.h>
      #include <stdlib.h>
      int main(int argc, char **argv) {
        int N, M;
        int arg = argc > 1 ? argv[1][0] - '0' : 3;
        switch(arg) {
          case 0: return 0; break;
          case 1: N = 1024*1024; M = 55; break;
          case 2: N = 1024*1024; M = 400; break;
          case 3: N = 1024*1024; M = 800; break;
          case 4: N = 1024*1024; M = 4000; break;
          case 5: N = 1024*1024; M = 8000; break;
          default: /*printf("error: %d\\n", arg);*/ return -1;
        }

        int final = 0;
        char *buf = (char*)malloc(N);
        for (int t = 0; t < M; t++) {
          for (int i = 0; i < N; i++)
            buf[i] = (i + final)%256;
          for (int i = 0; i < N; i++)
            final += buf[i] & 1;
          final = final % 1000;
        }
        //printf("final: %d.\\n", final);
        return final;
      }

Of course, we use the same modified sources when building with Emscripten and with Cheerp.

About larger scale tests, llvm-ar is not supported; we link libraries using llvm-link as documented here.

Various test cases need some patching to run with Cheerp. The branch we published contains all required patches. It should be noted that the tests were originally ported to be compiled to plain JS (Cheerp genericjs mode), so the patches could be heavily reduced if only the wasm target is of interest.

alexp-sssup commented 6 years ago

As a side note, the webMain code currently generated is invalid. We recommend changing it to something like this: https://bitbucket.org/apignotti/emscripten/commits/60620e0860099f20b8fe5855d7e6272ef4b14b6a?at=cheerp-fixes-2018mar-rebased

kripken commented 6 years ago

Thanks @alexp-sssup !

To reproduce our results you need to simply remove printf statements from the test source

I see, so we are indeed measuring something different.

Yes, good point, when using printf in a benchmark that is just a few lines of code like primes, probably most of the output code size is due to printf itself. printf is still interesting in a way (it is real-world C code), but maybe not that interesting in general.

However, without printf, output from tiny benchmarks like primes ends up being dominated by the runtime overhead, which is also not that interesting in general (since people compiling just a few lines of code is pretty rare - still, I added a primes_nocheck benchmark to our suite to measure that).

Overall, I'm more interested in moderate or large code size projects (the common case that I see among users), like say Box2D. Do you see the same as what I reported on that one (emscripten being 23% smaller than cheerp)?

About larger scale tests, llvm-ar is not supported, we link libraries using llvm-link

I see. So the configure/make flow that benchmarks like zlib, lua, etc. require won't work on cheerp, and I'd need to write a makefile manually if I want those tests to run?

As a side note, the webMain code currently generated is invalid. We recommend changing it to something like this: https://bitbucket.org/apignotti/emscripten/commits/60620e0860099f20b8fe5855d7e6272ef4b14b6a?at=cheerp-fixes-2018mar-rebased

That link requires me to log in, so I can't view it.

alexp-sssup commented 6 years ago

Like primes and memops our version of Box2D is patched. Many of the patches are there for genericjs type safety, but printf is also disabled. From my tests it seems that most of the size difference you measure comes from printf indeed. https://github.com/alexp-sssup/emscripten/commit/8b686069720902aa50ea5b14b69d3af7b4fd393e#diff-d6aa119c750aae03c61eea396a82d07b

With this patch, in my tests, the size between emscripten and cheerp becomes roughly the same. There is a ~2% difference either up or down depending on whether you choose the compressed or uncompressed version. As usual, the patched version is used when compiling both the emscripten and cheerp builds.

configure/make should actually work, by using a wrapper script. This is documented here.

About the link, I pasted the one from our private repos instead of the public one. I apologize. Here is the correct one: https://github.com/alexp-sssup/emscripten/commit/60620e0860099f20b8fe5855d7e6272ef4b14b6a

kripken commented 6 years ago

Like primes and memops our version of Box2D is patched. Many of the patches are there for genericjs type safety, but printf is also disabled. From my tests it seems that most of the size difference you measure comes from printf indeed. alexp-sssup/emscripten@8b68606#diff-d6aa119c750aae03c61eea396a82d07b

Interesting, yes, I see that when I just remove the printf from box2d (using your patch

// Disable printing to stdout for Cheerp and Emscripten.
#define printf(fmt, ...) (0)

) then the cheerp and emscripten sizes become close.

But this seems odd. Why does cheerp go from 150K to 122K just by removing printf - is that expected?

Also, I'm not sure the benchmark is valid without the printing. Without printf, the LLVM optimizer may be able to remove code that we want to execute, but now has no side effects.

If printf is a problem for cheerp, is there some other way to print stuff, that is efficient for you?
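
One common way to address the dead-code concern without pulling in printf is to publish the result through a volatile store, or to return it from main as the patched memops source above does. The following is a minimal sketch of the volatile variant, not something either test suite is known to do: `benchmark_sink`, `memops_checksum`, and `run_and_publish` are hypothetical names, and the kernel is a scaled-down copy of the memops loops with the buffer size and round count turned into parameters.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sink, not part of either test suite. A volatile store is an
   observable side effect, so the stored value has to be produced. */
static volatile uint32_t benchmark_sink;

/* Scaled-down memops kernel: same loop structure as the benchmark source,
   with the buffer size and round count passed in as parameters. */
static uint32_t memops_checksum(size_t n, int rounds) {
  uint32_t final = 0;
  unsigned char *buf = malloc(n);
  if (buf == NULL) return UINT32_MAX; /* allocation failure marker */
  for (int t = 0; t < rounds; t++) {
    for (size_t i = 0; i < n; i++)
      buf[i] = (unsigned char)((i + final) % 256);
    for (size_t i = 0; i < n; i++)
      final += buf[i] & 1u;
    final %= 1000;
  }
  free(buf);
  return final;
}

/* Run the kernel and publish the result without any printf. */
static uint32_t run_and_publish(size_t n, int rounds) {
  uint32_t result = memops_checksum(n, rounds);
  benchmark_sink = result;
  return result;
}
```

A benchmark's main could then end with `return (int)run_and_publish(N, M);`. This keeps the result observable, but it does not stop a compiler from evaluating the kernel at compile time when the inputs are constants, so taking the size from argv (as memops does) still matters.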

configure/make should actually work, by using a wrapper script. This is documented here.

Thanks, but I still can't get it to work. First, --host=cheerp-unknown-none is a specific flag that I guess some projects support? But e.g. zlib (the first I tried) does not. Second, even after removing that flag, cheerpwrap doesn't help with the problem of the configure script emitting stuff that uses llvm-ar and other things that don't work in cheerp. (Does cheerpwrap do anything more than the emscripten benchmark runner already does, which is to point CC, CXX to the various cheerp binaries?)

Anyhow, maybe I missed or misunderstood something there. In general, it would be great to have a shared script for these comparisons so we know and agree they are fair - perhaps you want to upstream some of the changes in your fork?

kripken commented 6 years ago

I found some time this weekend to dive into the box2d differences here in more detail.

A large source of differences is in system library code:

Overall, system lib differences account for a lot of the size differences between the compilers, but even though that's interesting to know, it's always going to be a tradeoff between compiling for size or speed - if one compiler started to build system libs with -Oz it might emit smaller code but eventually users would notice it isn't as fast, etc. So maybe this isn't that important.

Because of that I also did a dive into the wasm binaries themselves, looking function by function. I focused on the largest functions in box2d, which are

Looking at their binary sizes, Emscripten is smaller on all of them, by 18%, 10%, and 23% respectively.

Another way to look at that is to run the Binaryen optimizer on Cheerp output, and it shrinks it by 15%. That's pretty close to the per-function results, which makes sense if the two compilers' output is mostly similar, except that emscripten also runs the Binaryen optimizer.

To summarize,

alexp-sssup commented 6 years ago

Hello Alon, keep in mind that there have been significant changes in our Wasm backend since my last comment, so you will need to use updated packages to reproduce our exact results.

I will try to answer all the issues you raised.

Compiler      With printf   Without printf   With console_log
Emscripten       48328          48076              N/A
Cheerp           57851          46270             46380

Box2D without -frtti   Box2D with -frtti
       45226                 46380

kripken commented 6 years ago

Thanks for the detailed response!

the script has been changed to enable -frtti

Was that a typo perhaps, and you meant -fno-rtti? (Box2D doesn't need rtti or exceptions, so that's really how it should be built, and how game engines use it in practice. I updated the makefile in emscripten and opened an issue to update box2d.js as well.)

Updated box2d WASM size with and without printf:

Which emscripten version was that with? On the latest of both (Cheerp 1523865001-1, emscripten 1.37.37), here is what I see:

Compiler    With printf   Without printf
Emscripten     47089          40454
Cheerp         54965          46084

The Cheerp results are similar to yours, except a little better - maybe since I tested on a newer version. But your emscripten results without printf are surprisingly poor - maybe also part of the difference is I'm using a newer version, but I don't think we landed any major optimizations recently, so that is strange.
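
To make the gaps in the two sets of numbers easier to compare, the relative differences can be worked out directly from the byte counts quoted in this thread. This is only an illustrative sketch; `percent_larger` and `printf_cost` are hypothetical helper names, and the constants are copied from the tables above.

```c
/* Percent by which `other` exceeds `base`; negative means `other` is smaller. */
static double percent_larger(double base, double other) {
  return (other - base) / base * 100.0;
}

/* Bytes attributable to printf: size with printf minus size without it. */
static long printf_cost(long with_printf, long without_printf) {
  return with_printf - without_printf;
}

/* Box2D wasm sizes in bytes, as reported in this thread. */
enum {
  /* kripken: Cheerp 1523865001-1, emscripten 1.37.37 */
  K_EMCC_PRINTF = 47089, K_EMCC_NOPRINTF = 40454,
  K_CHEERP_PRINTF = 54965, K_CHEERP_NOPRINTF = 46084,
  /* alexp-sssup: earlier table */
  A_EMCC_PRINTF = 48328, A_EMCC_NOPRINTF = 48076,
  A_CHEERP_PRINTF = 57851, A_CHEERP_NOPRINTF = 46270
};
```

With these inputs, `percent_larger(K_EMCC_PRINTF, K_CHEERP_PRINTF)` is about 16.7 and `percent_larger(K_EMCC_NOPRINTF, K_CHEERP_NOPRINTF)` is about 13.9, while the earlier table gives about 19.7 with printf and about -3.8 without it. The disagreement between the two tables is therefore almost entirely in the Emscripten "without printf" figure: `printf_cost` is 6635 bytes for Emscripten in one measurement and 252 bytes in the other.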

Aside from measuring size in bytes, I also gave more in-depth details above that I don't think you responded to. I'm curious to get your perspective on them, and to check if I got something wrong:

Is the implementation of printf shipped with Emscripten complete or is it simplified?

It's the musl libc printf implementation - should be complete AFAIK.

In our branch we have fixed zlib building with this patch

Thanks for the link. I'm conflicted on testing with patches like these, though: on one hand, more comparisons are good, but on the other, I want to test on real-world code, without special porting to emscripten or cheerp.

About the type safety patches that are needed for plain JS generation, are you open to discuss integrating them as well?

Continuing my last response, I am open to code to run Cheerp with the right flags etc., and maybe minimal benchmark changes make sense (like removing printf), but I'd rather not modify zlib, bullet, box2d etc. significantly, since emscripten's goal is to run them well without porting (and the version in the test suite is used both for benchmarking and for testing).

Perhaps, instead, we could create a separate repo for cross-compiler comparisons?

Cheerp disables RTTI support by default, that is a deliberate choice.

I see, thanks. Makes sense now.

Cheerp uses newlib for its C library. The implementation of malloc is also dlmalloc

Interesting. Perhaps we use different versions of dlmalloc then, or build it differently - we use -O2, which flag do you use?

Additional test cases: We will investigate why Havlak is not working.

I see that fasta_float has been fixed in Cheerp recently, nice! Aside from Havlak, though, I still see base64 fail as mentioned above, and also Box2D without special changes, i.e. it fails in emscripten's box2d which is unmodified from upstream, but works in yours - is it expected that Cheerp's wasm support needs code to be ported for it to work?

alexp-sssup commented 6 years ago

I apologize for taking so long to reply. To answer with the appropriate level of detail and precision I needed to dedicate significant time, which I could not find until now.

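#ifdef __CHEERP__
#include <cheerp/client.h>
// Cheerp-only replacement for clock(), built on the browser's wall clock.
inline clock_t clock()
{
        // The division by 1000 below treats t as a millisecond timestamp.
        double t = cheerp::date_now();
        return (long long)(t*CLOCKS_PER_SEC/1000);
}
#endif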
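
The arithmetic in that shim can be checked on its own, outside Cheerp. Below is a small portable sketch of the same millisecond-to-tick conversion and its inverse; `ms_to_clock_ticks` and `ticks_to_seconds` are hypothetical helper names, and nothing here depends on the Cheerp headers.

```c
#include <time.h>

/* Convert a millisecond timestamp to clock() ticks, using the same
   scaling as the shim: ms * CLOCKS_PER_SEC / 1000. */
static clock_t ms_to_clock_ticks(double ms) {
  return (clock_t)(ms * CLOCKS_PER_SEC / 1000.0);
}

/* Elapsed seconds between two clock() readings. */
static double ticks_to_seconds(clock_t start, clock_t end) {
  return (double)(end - start) / CLOCKS_PER_SEC;
}
```

A benchmark that times itself with `clock()` before and after the workload and divides the difference by `CLOCKS_PER_SEC` then reports wall-clock seconds under such a shim, rather than the processor time that `clock()` normally measures.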

kripken commented 5 years ago

Oh sorry, I missed that there was a reply here...

I do still think these comparisons are useful, but as you said too, it's hard to find time given all the other priorities we have I guess.