windelbouwman / ppci

A compiler for ARM, X86, MSP430, xtensa and more implemented in pure Python
https://ppci.readthedocs.io/en/latest/
BSD 2-Clause "Simplified" License
336 stars, 36 forks

Testing strategies #31

Open windelbouwman opened 4 years ago

windelbouwman commented 4 years ago

This post lists some interesting ideas about compiler testing: https://old.reddit.com/r/Python/comments/eieuld/c_compiler_written_in_python/

Csmith is an example: https://embed.cs.utah.edu/csmith/using.html

The other idea is hypothesis testing: https://hypothesis.works/

Work out some ideas about testing the compiler, and document the different options for fuzzing / stress testing.
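The core idea behind Csmith can be tried without Csmith itself: generate random programs whose correct result is known in advance, so a miscompilation shows up as a wrong exit code and no reference compiler is needed. Below is a minimal sketch in Python (not Csmith, and not part of ppci; `gen_expr` and `gen_program` are hypothetical names). It only uses unsigned arithmetic with `ul` literals, so every operation is at least 32 bits wide and wraps, because signed overflow is undefined behaviour in C and would make the expected value meaningless.

```python
import random

MASK32 = 0xFFFFFFFF

# Only unsigned operators: in C these wrap, so the generated programs contain
# no undefined behaviour. Results are reduced modulo 2**32 (the uint32_t store).
OPS = {
    "+": lambda a, b: (a + b) & MASK32,
    "-": lambda a, b: (a - b) & MASK32,
    "*": lambda a, b: (a * b) & MASK32,
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
    "^": lambda a, b: a ^ b,
}


def gen_expr(rng, depth):
    """Return (C source text, expected uint32 value) for a random expression tree."""
    if depth == 0 or rng.random() < 0.3:
        value = rng.randrange(1 << 32)
        return f"{value}ul", value  # unsigned long is at least 32 bits everywhere
    op = rng.choice(sorted(OPS))
    lhs, lval = gen_expr(rng, depth - 1)
    rhs, rval = gen_expr(rng, depth - 1)
    return f"({lhs} {op} {rhs})", OPS[op](lval, rval)


def gen_program(seed, depth=4):
    """Emit a self-checking C program: main returns 0 iff the result matches."""
    rng = random.Random(seed)
    expr, expected = gen_expr(rng, depth)
    return (
        "#include <stdint.h>\n"
        "int main(void) {\n"
        f"    uint32_t r = {expr};\n"
        f"    return r == {expected}ul ? 0 : 1;\n"
        "}\n"
    )
```

Each generated file is then compiled with the compiler under test and run; a non-zero exit status points at a miscompiled expression. A property-based tool such as hypothesis could take over the generation and add automatic shrinking of failing cases.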

pfalcon commented 4 years ago

It should be noted that the existing approach to testing is described in the docs: https://ppci.readthedocs.io/en/latest/testing.html

And I'm +1 that testing should start like that: selecting real-world, cute, useful projects, then trying to build, then to run them. For that to be truly useful, such projects should have their own testsuite, so their running could be tested in automated manner. But for starters, even manual/trivial testing should be ok.

One thing I don't agree is things like:

To compile libmad, use the script tools/compile_libmad.py.

I appreciate that everything can't be done at once, and there could be temporary shortcuts. But that's all a "below ground" approach. PPCI really should get to the surface: testing starts when one can do make CC=ppci-cc.

windelbouwman commented 4 years ago

And I'm +1 that testing should start like that: selecting real-world, cute, useful projects, then trying to build, then to run them. For that to be truly useful, such projects should have their own testsuite, so their running could be tested in automated manner. But for starters, even manual/trivial testing should be ok.

Do you have specific projects in mind? I selected libmad since it is small and simple enough, but still, for running it, one would require the C stdlib, which might become troublesome.

One thing I don't agree is things like:

To compile libmad, use the script tools/compile_libmad.py.

I appreciate that everything can't be done at once, and there could be temporary shortcuts. But that's all a "below ground" approach. PPCI really should get to the surface: testing starts when one can do make CC=ppci-cc.

Please keep in mind that this is indeed an intermediate step. I wrote some scripts to automate some manual attempts at compiling those projects. I suggest keeping the scripts and updating the documentation with make CC=ppci-cc once that command works.

pfalcon commented 4 years ago

Do you have specific projects in mind? I selected libmad since it is small and simple enough, but still, for running it, one would require the C stdlib, which might become troublesome.

I don't. I think that libmad is a good choice, unless even it turns out to be "too big for now", which you seem to imply.

Then I wonder what the state of the C semantics and the backend is in general. If they are known/suspected to contain bugs, then I'd say starting with unit tests for trivial functions like strlen(), strcat(), strchr(), etc. might be a good idea (of course, better to find such tests than to write them from scratch).

If there's no worry about trivial things not working, then the next step up may be less trivial, but still standalone, algorithms like CRC32, MD5, etc.
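For standalone algorithms such as CRC32, the expected values can come from a trusted host implementation instead of being written by hand. A minimal sketch, assuming Python's `zlib.crc32` as the oracle and a hypothetical C function `crc32(const char *, size_t)` as the code under test: `crc32_bitwise` is the plain bit-at-a-time algorithm one would port to C, and `c_checks` emits checks to paste into a test's `main`.

```python
import zlib


def crc32_bitwise(data: bytes, crc: int = 0) -> int:
    """Reflected CRC-32 (polynomial 0xEDB88320), processed one bit at a time.

    Passing a previous result as `crc` continues the checksum, like zlib.crc32.
    """
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xEDB88320 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def c_checks(samples):
    """Emit C checks for a crc32(const char *, size_t) function under test.

    Expected values come from zlib, not from the code being tested.
    Samples must be plain ASCII without quotes or backslashes.
    """
    return "\n".join(
        f'    if (crc32("{s.decode("ascii")}", {len(s)}) != 0x{zlib.crc32(s):08x}u) return 1;'
        for s in samples
    )
```

The string "123456789" is the usual check input; its CRC-32 is 0xCBF43926.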

(And I'm writing all this based on the note in https://ppci.readthedocs.io/en/latest/contributing/testing.html#cc : "PPCI can also compile it, running it remains a challenge." I'm not sure what kind of challenge there is: is it again libc, or code generation bugs?)

I suggest keeping the scripts and updating the documentation with make CC=ppci-cc once that command works.

Sounds good, thanks for explanation!

tstreiff commented 4 years ago

Hello, testing a compiler without a minimal working library is impossible, so I tried to build both at the same time, targeting x86_64 code. I first tried to compile a small portable C library, but I faced too many compilation problems/execution crashes, so I took a different approach. Most libraries indeed require that everything works at once (stdio, heap, etc.), whereas I needed to go step by step. I therefore started from a very simple library I wrote years ago for 16-bit processors with limited memory, and debugged it step by step. I logged each issue I faced and fixed the blocking issues to make progress.

The status is as follows:

Blocking issues I had to fix at once:

Serious issues I saw, but I could do without fixing by writing code in a different way:

Things I noticed but that are non functionally blocking

windelbouwman commented 4 years ago

Thanks for your detailed explanation! You're right that testing a compiler can be tricky.

Two approaches I have in mind:

windelbouwman commented 4 years ago

Refer here for hypothesis based testing: https://github.com/windelbouwman/ppci/blob/master/test/hypo/wasm.py

pfalcon commented 4 years ago

Testing a compiler without a minimal working library is impossible

Note that this is largely alleviated by the work in https://github.com/windelbouwman/ppci/pull/62. While the aim of PPCI being a standalone, self-contained compiler infrastructure remains, the different areas should now be largely decoupled. So, if you'd like to work on testing the C compiler, the lack of a C library shouldn't affect you as gravely as before, as you can link with the host C library (using gcc/ld/clang/whatever).

pfalcon commented 4 years ago

the compiler generates IR with many constant expressions, including multiplication by 1, etc., which are reflected at the target level. I know that this could be optimized in later passes, but most are very easy to avoid. Casts between the same type are also generated (mostly from "ptr" to "ptr").

Why do extra legwork to avoid it in a "compiler" (more specifically, the IR code generator) for each and every source language (PPCI supports many, and is intended to support only more), instead of handling it once in the optimizer?

I'd suggest that, vice versa, we should aim for simple, dumb, but correct IR code generation as the first stage, and independently from it, work on improving the IR optimizer.

tstreiff commented 4 years ago

So, if you'd like to work on testing the C compiler, the lack of a C library shouldn't affect you as gravely as before, as you can link with the host C library (using gcc/ld/clang/whatever).

You are right, being able to link with modules compiled by another compiler is big progress.

But the ABI used by both compilers must be the same. For x86_64, there are currently some differences. Differences I have seen in function calling conventions: 8-byte ints in PPCI instead of 4-byte ones in the x86_64 ABI (but everything is 8-byte-aligned on the stack), and different methods for handling varargs.

The varargs issue is subtle because PPCI does it in a portable way (allocating a block on the stack and filling it with the ... argument values), which should work on most architectures, but the x86_64 ABI imposes a very specific (incompatible) way.
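For reference, the System V AMD64 calling convention passes the first six integer/pointer arguments in rdi, rsi, rdx, rcx, r8, r9 and the first eight float/double arguments in xmm0 to xmm7, with the remaining ones on the stack; for a variadic call the caller also sets %al to an upper bound on the number of vector registers used. A simplified sketch of that assignment (scalars only: structs, long double and __int128 are deliberately left out, and `assign_args` is a hypothetical helper, not ppci code):

```python
# System V AMD64 argument registers, in order.
INT_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
SSE_REGS = [f"xmm{i}" for i in range(8)]


def assign_args(kinds):
    """Place scalar arguments: 'int' for integers/pointers, 'float' for float/double.

    Returns (locations, al): for a variadic call the caller must also load %al
    with (an upper bound of) the number of vector registers used.
    """
    n_int = n_sse = n_stack = 0
    locations = []
    for kind in kinds:
        if kind not in ("int", "float"):
            raise ValueError(f"unsupported argument kind {kind!r}")
        if kind == "int" and n_int < len(INT_REGS):
            locations.append(INT_REGS[n_int])
            n_int += 1
        elif kind == "float" and n_sse < len(SSE_REGS):
            locations.append(SSE_REGS[n_sse])
            n_sse += 1
        else:
            # Overflow arguments go on the stack, one 8-byte slot each, left to right.
            locations.append(f"stack+{8 * n_stack}")
            n_stack += 1
    return locations, n_sse
```

For example, `printf(fmt, 42, 3.14)` puts fmt in rdi, 42 in rsi, 3.14 in xmm0 and sets al = 1. A printf compiled by gcc or clang reads its arguments from those registers, so it cannot find values that were placed in a block on the stack.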

tstreiff commented 4 years ago

I'd suggest that, vice versa, we should aim for simple, dumb, but correct IR code generation as the first stage, and independently from it, work on improving the IR optimizer.

I agree. I had to avoid using anything other than -O0 because one optimization pass (mem2reg, I think) tends to produce invalid modules (with "undefined" values). With -O0, there is no IR transformation at all (which is better for debugging a language front-end anyway).

windelbouwman commented 4 years ago
I nevertheless saw a few wrong extensions from 32-bit to 64-bit quantities, and wrong pointer tests in a few cases

I'm curious about the issues you saw here. I'm currently bug hunting in the wasm to x86 code path; there is some issue there, which might be the one you observed.

tstreiff commented 4 years ago

First remember I changed int size to 32bit to be compatible with the x86_64 ABI, so I was compiling with 32bit ints and 64bit pointers.

1) About the pointer testing issue, here is what I wrote down in my logbook:

FILE *stream; ...if (!stream)... generates:

   mov rax, [rbp, -8]  
   mov ebx, 0
   cmp eax, ebx           ??? 32bit comparison
   jz main_block2

but I have just checked that the IR is also wrong:

ptr load = load stream_addr;
i32 typecast = cast load;        ???
i32 zero = 0;
cjmp typecast == zero ? .. : ..

Note that the problem does not occur if the pointer is explicitly compared to 0 or NULL.

Last minute: the cast bug comes from the semantic checking: in "on_if", the condition is coerced to int. This works if ints are 64-bit, but not if they are 32-bit, because the pointer gets truncated. The C standard does not state that a condition is of type int; rather, the condition is tested as if compared to 0, which also works for pointers. The same problem exists in on_while and on_do.
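A small model of why coercing the condition to `int` goes wrong with 32-bit ints and 64-bit pointers: a non-null pointer whose low 32 bits happen to be zero is truncated to 0 and tested as false. This is a Python sketch of the two lowerings, not ppci's semantic checker; both function names are hypothetical.

```python
def is_true_via_int_cast(value: int, int_bits: int = 32) -> bool:
    """Buggy lowering: cast the condition to `int` (truncate), then compare to 0."""
    return (value & ((1 << int_bits) - 1)) != 0


def is_true_via_compare(value: int) -> bool:
    """C semantics: `if (e)` behaves like `if (e != 0)`, compared in e's own type."""
    return value != 0
```

With 64-bit ints the truncation is a no-op, which is why the bug only appeared after switching int to 32 bits.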

2) About the wrong 32=>64bit extension: it occurs when an int quantity is added to a pointer.

char *buffer; int i; return buffer[i]; generates:

   mov rbx, [rbp, -8]
   mov ecx, [rbp, -12]    this is i (32bit)
   xor rax, rax
   mov eax, ecx           ??? i is extended to 64bit with 0-extension
   mov rdx, 1
   imul rax, rdx
   add rbx, rax
   mov al, [rbx]

A sign extension is required because if i is negative, the quantity added to buffer must also be negative. This bug stays hidden as long as i stays positive, but if i goes negative, a memory fault is very likely (for i = -1, instead of accessing 1 byte backward, we access buffer + 0xffffffff). Here the IR is correct:

ptr load = load ptr_addr;
i32 load_0 = load i_addr;
ptr element_size = 1;
ptr index = cast load_0;
ptr element_offset = index * element_size;
ptr element_address = load + element_offset;
i8 load_1 = load element_address;

It is the code generated for the i32=>ptr cast that is wrong. When coming from a signed integer type, we must sign-extend; when coming from an unsigned integer type, we must zero-extend.

Last minute: I quickly checked the x86_64 templates, and there are 2 for 32=>64bit extension. The 1st one generates zero-extension and is used for U32TOU64, I32TOU64 and U32TOI64; the 2nd one generates sign extension and is used for I32TOI64. Sign extension must be used for I32TOx64, and zero-extension must be used for U32TOx64. I moved one line and it works!
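The rule fits in a few lines: the signedness of the *source* type alone decides between sign and zero extension, whatever the signedness of the destination. A Python sketch of that rule (the helper `extend` and the `EXTENSION_KIND` table are hypothetical, the conversion names are the ones from the x86_64 templates), which also reproduces the `buffer[i]` case with i = -1:

```python
def extend(value: int, from_bits: int, to_bits: int, source_signed: bool) -> int:
    """Widen a from_bits-wide bit pattern to to_bits bits.

    Only the signedness of the source type matters: signed sources are
    sign-extended, unsigned sources are zero-extended.
    """
    value &= (1 << from_bits) - 1
    if source_signed and value >> (from_bits - 1):
        value -= 1 << from_bits  # interpret as a negative two's complement value
    return value & ((1 << to_bits) - 1)


# 32 -> 64 bit conversions and the extension each one needs.
EXTENSION_KIND = {
    "I32TOI64": "sign",
    "I32TOU64": "sign",
    "U32TOI64": "zero",
    "U32TOU64": "zero",
}
```

With buffer = 0x1000 and i = -1, sign extension yields address 0xfff as expected, whereas zero extension yields 0x100000fff, far outside the buffer.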

I hope this will help!

windelbouwman commented 4 years ago

Well, that's interesting! I always thought that only I32TOI64 required sign extension, and that the other 3 variants could be done with zero extension.

windelbouwman commented 4 years ago

My current attempt at coremark-wasi.wasm

Native version:

$ python -m ppci.cli.wabt run --target native coremark-wasi.wasm

...

2K performance run parameters for coremark.
[0]ERROR! list crc 0xbfe2 - should be 0xe714
[0]ERROR! state crc 0xf222 - should be 0x8e3a
CoreMark Size    : 666
Total ticks      : 13509
Total time (secs): 13.509000
Iterations/Sec   : 4441.483455
Iterations       : 60000
Compiler version : Clang 9.0.0 (https://github.com/llvm/llvm-project 0399d5a9682b3cef71c653373e38890c63c4c365)
Compiler flags   : -O3   -lrt
Memory location  : HEAP
seedcrc          : 0xe9f5
[0]crclist       : 0xbfe2
[0]crcmatrix     : 0x1fd7
[0]crcstate      : 0xf222
[0]crcfinal      : 0x1c7d
Errors detected

python version:

$ python -m ppci.cli.wabt run coremark-wasi.wasm

...

2K performance run parameters for coremark.
CoreMark Size    : 666
Total ticks      : 13263
Total time (secs): 13.263000
Iterations/Sec   : 1.507954
Iterations       : 20
Compiler version : Clang 9.0.0 (https://github.com/llvm/llvm-project 0399d5a9682b3cef71c653373e38890c63c4c365)
Compiler flags   : -O3   -lrt
Memory location  : HEAP
seedcrc          : 0xe9f5
[0]crclist       : 0xe714
[0]crcmatrix     : 0x1fd7
[0]crcstate      : 0x8e3a
[0]crcfinal      : 0x4983
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 1.507954 / Clang 9.0.0 (https://github.com/llvm/llvm-project 0399d5a9682b3cef71c653373e38890c63c4c365) -O3   -lrt / HEAP

I hoped the crc mismatch was due to this sign extension, but alas, this was not the case :(

windelbouwman commented 4 years ago

Okay, the coremark benchmark now passes. I do not know exactly why though, I modified the register allocator to be more optimistic. Onto the next benchmark!

tstreiff commented 4 years ago

Good! So you compiled with the C compiler and the wasm code generator? I have just quickly browsed the Coremark C code. It uses many 16-bit variables, so on x86_64 it must shake the integer promotion logic...

I have seen a couple of functions that could be impacted by a bug in integer promotion: in core_list_join.c there are 2 functions returning an s32 computed by subtracting 16-bit ints. In this case, the compiler does a 16-bit subtraction and then promotes to 32-bit, whereas it should promote to 32-bit first and then do the subtraction.
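A worked example of why the order matters, assuming 16-bit `short` (coremark's s16) and 32-bit `int`: with a = 30000 and b = -30000, promoting first gives 60000, while subtracting in 16 bits wraps to -5536 before the (now useless) promotion. The sketch models both orders in Python; the function names are hypothetical.

```python
def wrap(value: int, bits: int) -> int:
    """Reinterpret the low `bits` bits of value as a signed two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def sub_promoted(a: int, b: int) -> int:
    """Correct C: both s16 operands are promoted to 32-bit int before subtracting."""
    return wrap(wrap(a, 32) - wrap(b, 32), 32)


def sub_then_promote(a: int, b: int) -> int:
    """Buggy order: subtract in 16 bits, then sign-extend the 16-bit result."""
    return wrap(a - b, 16)
```

For small operands both orders agree, which is why such a bug can survive casual testing.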

tstreiff commented 4 years ago

The C coremark benchmark passes with the x86_64 target (32bit int, 64bit pointer). I had to fight a bit:

Generation:

The 1st run did not crash, but the results were not OK.

And after that success!

2K performance run parameters for coremark.
CoreMark Size    : 666
Total ticks      : 15296
Total time (secs): 15
Iterations/Sec   : 4000
Iterations       : 60000
Compiler version : PPCI CC v0.57
Compiler flags   : -O0
Memory location  : HEAP
seedcrc          : 0xe9f5
[0]crclist       : 0xe714
[0]crcmatrix     : 0x1fd7
[0]crcstate      : 0x8e3a
[0]crcfinal      : 0xbd59
Correct operation validated. See README.md for run and reporting rules.

windelbouwman commented 4 years ago

Very cool results! I will try to reproduce and compile coremark from C -> x86_64 as well. This is a useful benchmark to have both for correctness as well as for performance.

It is interesting that the route C -> wasm -> x86_64 yields faster results than C -> x86_64 with ppci. I compiled coremark to wasm with clang via the wasienv toolchain, as described in the wasm3 repository. I ran this wasm file with python -m ppci.cli.wabt run --target native.

windelbouwman commented 4 years ago

I'm making a script to compile coremark; could you share the changes you made to be able to compile it?

windelbouwman commented 4 years ago

The script to build coremark lives here: https://github.com/windelbouwman/ppci/blob/master/tools/compile_coremark.py

It is not yet fully working, since a call to clock_gettime is missing. Curious how you handled this dependency!

tstreiff commented 4 years ago

For clock_gettime() under Linux64, I just put this in my library:

#define SYS_CLK_GETTIME 228

int clock_gettime(int clockid, struct timespec *tp)
{
    long ret = syscall(SYS_CLK_GETTIME, clockid, tp, 0);
    if (ret < 0) {
        errno = -ret;
        return -1;
    }
    return 0;
}

In core_portme.h, I have:

#define HAS_FLOAT 0
#define HAS_TIME_H 1
#define USE_CLOCK 0
#define HAS_STDIO 1
#define HAS_PRINTF 1

I linked my clib, but I don't think the test makes heavy use of it (mainly printf and clock_gettime).

For the build, the main issue was that the linker refused to relocate an array of C strings: there was an assert failure in U64DataRelocation.calc(), reporting that the 64-bit address to be relocated must be 4-byte aligned. This assert is wrong, because when a C string is relocated, its address has no alignment constraint.

assert sym_value % 4 == 0

I commented it out, and the linker did its job.
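To illustrate why the alignment assertion cannot hold, here is a hypothetical sketch (not ppci's `U64DataRelocation`) of laying out an array of C strings: the strings are packed back to back in read-only data, so the second one already starts at an odd address, and a 64-bit absolute data relocation only has to store that address, little endian as on x86_64.

```python
import struct


def apply_u64_data_relocation(section: bytearray, offset: int, sym_value: int) -> None:
    """Patch the absolute 64-bit address of a symbol into section[offset:offset + 8].

    Deliberately no `sym_value % 4 == 0` check: a C string may start at any byte.
    """
    struct.pack_into("<Q", section, offset, sym_value)


def layout_string_table(strings, rodata_base):
    """Pack NUL-terminated strings back to back, plus a char *table[] pointing at them."""
    rodata = bytearray()
    addresses = []
    for s in strings:
        addresses.append(rodata_base + len(rodata))
        rodata += s.encode("ascii") + b"\0"
    table = bytearray(8 * len(strings))
    for i, address in enumerate(addresses):
        apply_u64_data_relocation(table, 8 * i, address)
    return bytes(rodata), bytes(table)
```

With ["ab", "cde"] placed at 0x402000, the second string lives at 0x402003, which the removed assertion would have rejected.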

This should be enough for getting an executable.

As mentioned previously, the x86 backend generated wrong code for >>= (the IR is OK) so you need to replace the 2 >>= in crcu8 by separate = and >>.

I am not sure that fixing the null-pointer comparison is necessary: the generated code compares only the lower 32 bits to 0 instead of the full 64 bits, but it should work most of the time.

Note also that I have fixed other compiler bugs previously (mainly pointer arithmetic), so our compilers are not exactly in the same state, but these fixes could be unnecessary for the coremark to run successfully.

windelbouwman commented 4 years ago

Alright, this is insightful! Thanks for the heads-up. I'm now at the point where I only miss malloc and free, since my home-brew libc does not have them. Next I will hit the linker issue you described.

Note that patches to ppci are welcome, so please feel free to submit any pull request with fixes you made.

tstreiff commented 4 years ago

If you want to avoid the malloc/free calls, you can change the allocation method in "core_portme.h". Just replace:

#define MEM_METHOD MEM_MALLOC

by

#define MEM_METHOD MEM_STATIC

I have just tried it, it works and the test is successful as well on my side!

Another (longer) method, which only works because there is only one malloc and one free:

#define SYS_BRK 12

// Set a new brk; the raw Linux syscall returns the new brk (or the current one on failure, e.g. for NULL)
char *sys_brk(void *newbrk) { return (char *)syscall(SYS_BRK, newbrk, 0, 0); }

// Increment data space by increment bytes and return the previous program break
void *sbrk(intptr_t increment)
{
    char *oldbrk = sys_brk(NULL);
    if (sys_brk(oldbrk + increment) != oldbrk + increment)
        return (void *)-1;
    return oldbrk;
}
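Why a single malloc/free pair is enough: with a brk-based bump allocator, malloc just moves the program break forward and free can be a no-op. A toy Python model of that scheme (the `BrkHeap` class is hypothetical and only simulates addresses, it does not touch real memory), assuming the 16-byte alignment that x86_64 malloc implementations commonly provide:

```python
class BrkHeap:
    """Toy model of a brk-based bump heap: allocated memory is [start, brk)."""

    def __init__(self, start: int):
        self.brk = start

    def sbrk(self, increment: int) -> int:
        old = self.brk
        self.brk += increment
        return old  # like sbrk(), return the *previous* break

    def malloc(self, size: int, align: int = 16) -> int:
        # Round the current break up to the alignment, then grow past the block.
        pad = -self.brk % align
        return self.sbrk(pad + size) + pad

    def free(self, ptr: int) -> None:
        pass  # a bump allocator never reuses memory
```

This breaks down as soon as a program frees and reallocates in a loop, because memory is never reclaimed, but for coremark's one allocation it behaves like a real allocator.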

windelbouwman commented 4 years ago

Ah, good hints!

Results so far:

ERROR! Please define ee_u32 to a 32b unsigned type!
2K performance run parameters for coremark.
[%uu]ERROR! list crc 0x%004x - should be 0x%004x
[%uu]ERROR! matrix crc 0x%004x - should be 0x%004x
[%uu]ERROR! state crc 0x%004x - should be 0x%004x
ERROR: ee_s32 is not a 32b datatype!
ERROR: ee_u32 is not a 32b datatype!
ERROR: Please modify the datatypes in core_portme.h!
CoreMark Size    : %llu
Total ticks      : %llu
Total time (secs): %ff
Iterations/Sec   : %ff
Iterations       : %llu
Compiler version : %ss
Compiler flags   : %ss
Memory location  : %ss
seedcrc          : 0x%004x
[-1215408949048770560]crclist       : 0x%004x
[432345568239550464]crcmatrix     : 0x%004x
[100663296]crcstate      : 0x%004x
[-1215408947951894528]crcfinal      : 0x%004x
Errors detected

Pretty 'good' :P

Update:

[windel@hoefnix tools]$ time ./coremark.elf 
ERROR! Please define ee_u32 to a 32b unsigned type!
2K performance run parameters for coremark.
[%uu]ERROR! list crc 0x%004x - should be 0x%004x
[%uu]ERROR! matrix crc 0x%004x - should be 0x%004x
[%uu]ERROR! state crc 0x%004x - should be 0x%004x
ERROR: ee_s32 is not a 32b datatype!
ERROR: ee_u32 is not a 32b datatype!
ERROR: Please modify the datatypes in core_portme.h!
CoreMark Size    : %llu
Total ticks      : %llu
Total time (secs): 10
Iterations/Sec   : 5938
Iterations       : %llu
Compiler version : ppci 0.5.8
Compiler flags   : -w0000t
Memory location  : Please put data memory location here
            (e.g. code in flash, data on heap etc)
seedcrc          : 0x%004x
[-1215408949048770560]crclist       : 0x%004x
[432345568239550464]crcmatrix     : 0x%004x
[100663296]crcstate      : 0x%004x
[-1215408947951894528]crcfinal      : 0x%004x
Errors detected

real    0m11.988s
user    0m11.981s
sys 0m0.000s
[windel@hoefnix tools]$ 
tstreiff commented 4 years ago

Coremark requires a 32-bit integral type (which may be plain "int", "long int" or whatever; this can be adapted in core_portme.h). With the ppci-cc default mapping of C types to IR types, shorts are 16-bit, int and long are 64-bit, and there is no 32-bit int. I had this issue very early when interfacing with the Linux64 kernel, because some system struct types contain 32-bit fields (uid, gid, etc.), and there is no easy way to fetch them.

The x86_64 ABI states that default ints are 32bit (even if this is most often passed in 64bit slots between functions) and by keeping with a 64bit int, I strongly suspect that interfacing with code compiled by other tools will be hard.

That's why I made a few changes to get 32-bit ints and 64-bit long ints (like gcc). Changing ints to 32-bit is easy ("int" is managed specially), but getting 64-bit longs is harder because this customization is not planned in the architecture (longs are hard-coded to 32-bit).

pfalcon commented 4 years ago

The x86_64 ABI states that default ints are 32bit (even if this is most often passed in 64bit slots between functions) and by keeping with a 64bit int, I strongly suspect that interfacing with code compiled by other tools will be hard.

More specifically, there're different "C native type models". For example, there's ILP32 model, where Integer, Long, and Pointer is 32-bit, and there's LP64 model, where Long and Pointers are 64-bit (and by exclusion, Ints are 32-bit).

These models then get mapped to architectures and their params (like ABIs). And of course, there're many more models, or more specifically, C types model should be (fully) parametrizable. E.g., there's nothing wrong with using ILP32 on x86_64. This saves on memory bloat, and gets all benefits of AMD64 (i.e. RISC-like architecture, except 2-address instead of 3-address). Indeed, that's known as x32.

@windelbouwman, that's one suggestion I'd like to make: please avoid hardcoding things, please make everything parametrizable, and please actually allow it to be parametrized. Like, literally, ppci-cc should accept a switch like -fdata-model= which takes a JSON file like:

{
"int": 32,
"long": 64,
"ptr": 32
}

And everything "just works" (in reasonable bounds of course).

That would be a way to make PPCI stand out - to show that there're enough, and easily accessible, knobs for experimentation and customization.
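A minimal sketch of how such a `-fdata-model=` file could be loaded and validated, assuming LP64 as the default and a JSON object mapping C type names to bit widths. The flag, the key names and `load_data_model` are hypothetical (this is the proposal, not an existing ppci option); the checks encode the C standard's minimum widths and the ordering short <= int <= long <= long long.

```python
import json

# Widely used C data models (sizes in bits).
PRESETS = {
    "ILP32": {"short": 16, "int": 32, "long": 32, "long long": 64, "ptr": 32},
    "LP64": {"short": 16, "int": 32, "long": 64, "long long": 64, "ptr": 64},
    "LLP64": {"short": 16, "int": 32, "long": 32, "long long": 64, "ptr": 64},
}


def load_data_model(text: str, base: str = "LP64") -> dict:
    """Overlay a JSON object like {"int": 32, "long": 64, "ptr": 32} on a preset."""
    model = dict(PRESETS[base])
    for key, bits in json.loads(text).items():
        if key not in model:
            raise ValueError(f"unknown C type {key!r}")
        model[key] = bits
    # Ordering and minimum widths required by the C standard.
    if not 16 <= model["short"] <= model["int"] <= model["long"] <= model["long long"]:
        raise ValueError("need 16 <= short <= int <= long <= long long")
    if model["long"] < 32 or model["long long"] < 64:
        raise ValueError("long must be >= 32 bits and long long >= 64 bits")
    return model
```

Under this scheme, x32 would simply be `base="ILP32"` on the x86_64 backend, and anything not mentioned in the file keeps its preset value.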

windelbouwman commented 4 years ago

Interesting stuff!

This wiki page gives a good overview of this https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models

Looks like the unix way (32 bits ints, 64 bits long) makes most sense?

pfalcon commented 4 years ago

Looks like the unix way (32 bits ints, 64 bits long) makes most sense?

LP64 seems to be the most popular model, and thus worth being default. My point is that default shouldn't mean hardcoded, and all that stuff should be (easily) configurable.

windelbouwman commented 4 years ago

Okay, yet another update:

[windel@hoefnix tools]$ time ./coremark.elf 
2K performance run parameters for coremark.
CoreMark Size    : %llu
Total ticks      : %llu
Total time (secs): 10
Iterations/Sec   : 3869
Iterations       : %llu
Compiler version : ppci 0.5.8
Compiler flags   : -w0000t
Memory location  : Please put data memory location here
            (e.g. code in flash, data on heap etc)
seedcrc          : 0x%004x
[-418119680]crclist       : 0x%004x
[534183936]crcmatrix     : 0x%004x
[-1908801536]crcstate      : 0x%004x
[632619008]crcfinal      : 0x%004x
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 3869 / ppci 0.5.8 -w0000t / Static

real    0m13.223s
user    0m13.126s
sys 0m0.017s

The 32 bits stuff appears to work since commit e639b20f6a2229d029fc5271bd34059dceddfb37 The code for long and int was / is a bit cumbersome / crappy. I made it a bit better so those sizes can be specified by the backend.

@pfalcon fully agree that selecting the data model should be parameterizable, with sensible defaults. To be continued! Maybe add an arch-specific flag?

Strangely I did not face any issue with the >>=, since the correct operation was validated.. Maybe a bug was fixed in this code?

windelbouwman commented 4 years ago

Added hex support to home cooked printf for results:

[windel@hoefnix tools]$ time ./coremark.elf 
2K performance run parameters for coremark.
CoreMark Size    : 666
Total ticks      : 14825
Total time (secs): 15
Iterations/Sec   : 4047
Iterations       : 60000
Compiler version : ppci 0.5.8
Compiler flags   : -w0000t
Memory location  : Please put data memory location here
            (e.g. code in flash, data on heap etc)
seedcrc          : 0xe9f5
[0]crclist       : 0xe714
[0]crcmatrix     : 0x1fd7
[0]crcstate      : 0x8e3a
[0]crcfinal      : 0xbd59
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 4047 / ppci 0.5.8 -w0000t / Static

real    0m17.577s
user    0m17.552s
sys 0m0.000s

For reference, the webassembly version is faster:

2K performance run parameters for coremark.
CoreMark Size    : 666
Total ticks      : 12897
Total time (secs): 12.897000
Iterations/Sec   : 4652.244708
Iterations       : 60000
Compiler version : GCCClang 9.0.0 (https://github.com/llvm/llvm-project 0399d5a9682b3cef71c653373e38890c63c4c365)
Compiler flags   : -O2   -lrt
Memory location  : Please put data memory location here
            (e.g. code in flash, data on heap etc)
seedcrc          : 0xe9f5
[0]crclist       : 0xe714
[0]crcmatrix     : 0x1fd7
[0]crcstate      : 0x8e3a
[0]crcfinal      : 0xbd59
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 4652.244708 / GCCClang 9.0.0 (https://github.com/llvm/llvm-project 0399d5a9682b3cef71c653373e38890c63c4c365) -O2   -lrt / Heap

This is kind of weird, but I guess it is due to clang really creating a fast wasm program!

Update: baseline gcc run with coremark:

2K performance run parameters for coremark.
CoreMark Size    : 666
Total ticks      : 14744
Total time (secs): 14.744000
Iterations/Sec   : 20347.259902
Iterations       : 300000
Compiler version : GCC10.1.0
Compiler flags   : -O2 -DPERFORMANCE_RUN=1  -lrt
Memory location  : Please put data memory location here
            (e.g. code in flash, data on heap etc)
seedcrc          : 0xe9f5
[0]crclist       : 0xe714
[0]crcmatrix     : 0x1fd7
[0]crcstate      : 0x8e3a
[0]crcfinal      : 0xcc42
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 20347.259902 / GCC10.1.0 -O2 -DPERFORMANCE_RUN=1  -lrt / Heap

We have a long way to go :)

Update: ppci with O2 optimizations:

2K performance run parameters for coremark.
CoreMark Size    : 666
Total ticks      : 12856
Total time (secs): 13
Iterations/Sec   : 4667
Iterations       : 60000
Compiler version : ppci 0.5.8
Compiler flags   : -w0000t
Memory location  : Please put data memory location here
            (e.g. code in flash, data on heap etc)
seedcrc          : 0xe9f5
[0]crclist       : 0xe714
[0]crcmatrix     : 0x1fd7
[0]crcstate      : 0x8e3a
[0]crcfinal      : 0xbd59
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 4667 / ppci 0.5.8 -w0000t / Static
pfalcon commented 4 years ago

This is kind of weird, but I guess it is due to clang really creating a fast wasm program!

Surely, clang (or rather, LLVM) applies a lot of optimizations to wasm bytecode, the same way as it does to x86 "bytecode"?

tstreiff commented 4 years ago

Good!

The code for long and int was / is a bit cumbersome / crappy. I made it a bit better so those sizes can be specified by the backend.

In fact it was easy to choose a size for int and ptr, but there was no way to specify the size for other types.

Strangely I did not face any issue with the >>=, since the correct operation was validated. Maybe a bug was fixed in this code?

That's possible, shifts are tricky operations to generate right (especially right shifts)

About performance, comparison with GCC:
ppci-cc 0.5.7 (-O0) on my system: 15.388 sec
gcc (default options) on my system: 12.911 sec

windelbouwman commented 4 years ago

About performance, comparison with GCC:
ppci-cc 0.5.7 (-O0) on my system: 15.388 sec
gcc (default options) on my system: 12.911 sec

Note that you should compare the iter/seconds value, not the execution time of coremark. I added the result of coremark above.
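The reason iterations/sec is the right metric: coremark normally chooses the iteration count itself so that the run lasts long enough (about 10 seconds or more), so total time differs little between compilers while the amount of work can differ several-fold. Assuming ticks are milliseconds (which matches Total ticks vs Total time in the outputs posted in this thread), the reported value is just iterations divided by seconds; `iterations_per_sec` is a hypothetical helper.

```python
def iterations_per_sec(iterations: int, total_ticks: int, ticks_per_sec: int = 1000) -> float:
    """CoreMark score: iterations divided by elapsed seconds."""
    return iterations / (total_ticks / ticks_per_sec)
```

With the posted numbers, ppci at 60000 iterations in 14825 ticks gives about 4047 iterations/sec and gcc -O2 at 300000 iterations in 14744 ticks about 20347, roughly a factor of 5, even though both runs took about 15 seconds.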

tstreiff commented 4 years ago

I have just adapted my makefiles for the "upgraded" ppci-ld. I will be able to put my "crt0.o" in the library like other modules since the entry point symbol can now be specified.

Note that building a library and therefore using ppci-archive requires Python 3.7 (because of the "required" option in argparse)
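Assuming the 3.7 requirement comes from `add_subparsers(required=True)` (that keyword argument was added in Python 3.7), the usual workaround for older versions is to set the attribute after creating the subparsers object, together with `dest` so the error message has a name to report. A sketch with a hypothetical `create` subcommand, not necessarily ppci-archive's actual interface:

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="archive-example")
    # add_subparsers(required=True) needs Python 3.7+; setting the attribute
    # afterwards works on older versions too. `dest` names the missing argument.
    sub = parser.add_subparsers(dest="command")
    sub.required = True
    create = sub.add_parser("create", help="create an archive from object files")
    create.add_argument("archive")
    create.add_argument("objects", nargs="*")
    return parser
```

Calling the tool without a subcommand then exits with a usage error instead of silently doing nothing.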

tstreiff commented 4 years ago

I ran yesterday the small C test suite https://github.com/c-testsuite/c-testsuite

Current (10-jun-20) state (among the 220 tests):

windelbouwman commented 4 years ago

Cool stuff! That is a handy test suite.

Update: I made a test_c_test_suite.py to be able to run the 220 snippets. Not sure what you did to run it?

tstreiff commented 4 years ago

To run the whole test suite, I wrote a small shell script to compile/link/run/test all tests (linked with my clib, but most of them do not require more than "printf"). The tests are written so that "main" returns 0 if execution is correct, so it is easy to validate them: no output parsing is required.
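A Python version of such a runner might look like this. It is a sketch: the test directory layout (c-testsuite's `tests/single-exec/*.c`) is an assumption, and `cc_cmd` stands for a hypothetical command that compiles and links one file against the chosen C library and accepts `-o`. Only the exit status of each test is checked, matching the convention described above.

```python
import subprocess
from collections import Counter
from pathlib import Path


def run_one(src: Path, cc_cmd: list, run=subprocess.run, timeout: float = 10.0) -> str:
    """Compile+link one test with cc_cmd, run it, and classify the outcome."""
    exe = src.with_suffix(".elf")
    if run(cc_cmd + [str(src), "-o", str(exe)]).returncode != 0:
        return "compile/link failure"
    try:
        # Absolute path, so the executable is not looked up on PATH.
        result = run([str(exe.resolve())], timeout=timeout)
    except subprocess.TimeoutExpired:
        return "execution failure"
    # A crash shows up as a non-zero (negative on POSIX) return code.
    return "passed" if result.returncode == 0 else "execution failure"


def run_suite(test_dir: Path, cc_cmd: list) -> Counter:
    """Tally outcomes for every *.c file in test_dir."""
    return Counter(run_one(src, cc_cmd) for src in sorted(test_dir.glob("*.c")))
```

For example, `run_suite(Path("tests/single-exec"), ["./my-cc-wrapper.sh"])` (wrapper name hypothetical) returns a Counter of outcomes. The injectable `run` parameter makes the classification testable without a compiler.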

I made progress on the 9 tests with execution failures:

About the 18 tests that cannot be compiled:

Is there a place where the current C front-end restrictions are listed?

I noticed the following:

tstreiff commented 4 years ago

We are making progress in correctly executing the small C test suite (https://github.com/c-testsuite/c-testsuite), thanks to all the fixes we made last week!

Current (10-jun-20) state (among the 220 tests):
  • 186 passed (85%)
  • 7 tests cannot be currently run: unsupported standard features or GCC extensions
  • 18 compilation/link failures => working on these
  • 9 execution failures (I suspect that issues #66 and #77 cover most of them) => working on these

Today status:

  • 197 passed (89%)
  • 7 tests cannot be currently run: unsupported standard features or GCC extensions

Test cases to be investigated:
  • 13 compilation/link failures
  • 2 execution failures (1 issue to be opened, the other test to be investigated)