Open windelbouwman opened 4 years ago
It should be noted that the existing approach to testing is described in the docs: https://ppci.readthedocs.io/en/latest/testing.html
And I'm +1 that testing should start like that: selecting real-world, cute, useful projects, then trying to build, then to run them. For that to be truly useful, such projects should have their own testsuite, so their running could be tested in automated manner. But for starters, even manual/trivial testing should be ok.
One thing I don't agree is things like:
To compile libmad, use the script tools/compile_libmad.py.
I appreciate that everything can't be done at once, and there can be temporary shortcuts. But that's all a "below ground" approach. PPCI really should get to the surface: the testing starts when one can do make CC=ppci-cc.
And I'm +1 that testing should start like that: selecting real-world, cute, useful projects, then trying to build, then to run them. For that to be truly useful, such projects should have their own testsuite, so their running could be tested in automated manner. But for starters, even manual/trivial testing should be ok.
Do you have specific projects in mind? I selected libmad since it is small and simple enough, but still, for running it, one would require the C stdlib, which might become troublesome.
One thing I don't agree is things like:
To compile libmad, use the script tools/compile_libmad.py.
I appreciate that everything can't be done at once, and there can be temporary shortcuts. But that's all a "below ground" approach. PPCI really should get to the surface: the testing starts when one can do make CC=ppci-cc.
Please keep in mind that this is indeed an intermediate step. I wrote some scripts to automate some manual attempts at compiling those projects. I suggest to keep the scripts and update the documentation with make CC=ppci-cc when that command is working.
Do you have specific projects in mind? I selected libmad since it is small and simple enough, but still, for running it, one would require the C stdlib, which might become troublesome.
I don't. I think that libmad is a good choice, unless even it turns out to be "too big so far", which you seem to imply.
Then I wonder what's the state of the C semantics and backend in general. If it's known/suspected to contain bugs, then I'd say starting with unit tests for trivial functions like strlen(), strcat(), strchr(), etc. might be a good idea (of course, it's better to find such tests than to write them from scratch).
If there's no worry about trivial things not working, then the next step up may be less trivial, but still standalone algos like CRC32, MD5, etc.
(And I'm writing all this based on the note in https://ppci.readthedocs.io/en/latest/contributing/testing.html#cc : "PPCI can also compile it, running it remains a challenge." I'm not sure what kind of challenge there is, whether it is again libc, or code generation bugs.)
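To illustrate that first step, a minimal self-checking test for strlen() might look like the sketch below (my own example, not from this thread): it defines a local my_strlen() so it needs no libc, and main() returns 0 on success so a script can simply check the exit code.

static unsigned my_strlen(const char *s)
{
    /* count characters up to the terminating NUL */
    unsigned n = 0;
    while (s[n])
        n++;
    return n;
}

int main(void)
{
    if (my_strlen("") != 0)
        return 1;
    if (my_strlen("hello") != 5)
        return 2;
    return 0; /* exit code 0 == test passed */
}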
I suggest to keep the scripts and update the documentation with make CC=ppci-cc when that command is working.
Sounds good, thanks for explanation!
Hello,
Testing a compiler without a minimal working library is impossible, so I tried to make both at the same time, targeting x86_64 code. I first tried to compile a small portable C library, but I faced too many compilation problems and execution crashes, so I took a different approach. Most libraries indeed require that everything works at once: stdio, heap, etc., whereas I needed to do it step by step. I therefore started from a very simple library I wrote years ago for 16bit processors with limited memory and tried to debug step by step. I tried to log each issue I faced and to fix the blocking issues to make progress.
The status is as follows:
Blocking issues I had to fix at once:
- forcing the compiler to use 4byte ints as in the x86_64 ABI, otherwise there is no way to access 32bit quantities in the Linux structs (like in stat). So ints are 32bit and longs are 64bit. The choice of which hardware type to use for each language type is spread over several files in an unexpected way. "int" is easy, but other types are more or less hard-coded.
- this led to problems in varargs management (alignment, etc.), but printf and scanf (and company) are now working fine (varargs management is not compatible with the x86_64 ABI, but it should not be a problem)
- handling of void* (compiler crash when used)
- pointer arithmetic is incomplete: e.g. pointer - integer, pointer - pointer, pointer += integer, pointer -= integer are not processed correctly and therefore produce nice crashes when executing target code (see the sketch after this list)
- handling of the "extern" specifier
Serious issues I saw, but which I could work around by writing the code in a different way:
- failure to use static initialization of a complex struct: the compiler works but the linker reports alignment problems (I replaced it with dynamic initialization)
- most of the preprocessor macro expansion mechanism works, but the macros __LINE__ and __FILE__ report the location of the macro declaration instead of the macro expansion, which makes assert hard to use (I have understood where the problem is, but fixing it is a bit tricky; see the sketch after this list)
- using -O2 is more or less impossible, errors "use of undefined" are raised most of the time
- some (not too complex) functions failed to compile because the register allocator failed to find a proper allocation
- using a negative index for an array (useful with pointers) generates wrong target code
- using function pointers is hard, the compiler has problems distinguishing function types and pointer-to-function types
- using functions returning a struct is also difficult, the compiler adds an extra 1st argument to pass the address where to copy the returned value, but this added argument confuses it in later phases
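To make the __LINE__/__FILE__ item above concrete, here is a sketch (my own example, not from the thread) of the behaviour a correct preprocessor should give; assert() relies on the same mechanism internally:

#include <stdio.h>

/* CHECK is defined here, but failures must be reported at the call site */
#define CHECK(cond) \
    do { \
        if (!(cond)) \
            printf("check failed at %s:%d\n", __FILE__, __LINE__); \
    } while (0)

int main(void)
{
    CHECK(1 + 1 == 3); /* should print this line's number, not the line of the #define */
    return 0;
}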
Things I noticed but that are not functionally blocking:
- the type system of the compiler is too lax even for a C compiler: it should enforce at least the C standard rules and warn when it makes assumptions (e.g. float f; f |= 1; is accepted)
- there is some useless code generated in the IR, which implies useless code in the target code (I noticed code produced to process the result of functions whereas there is nothing like this in the source code)
- the produced x86_64 code is naive but works (it is at least easy to understand and therefore to debug). I nevertheless saw a few wrong extensions from 32 to 64 bit quantities, and wrong pointer tests in a few cases. The code is more or less "load/store", i.e. all memory accesses are made through "mov", whereas x86 allows reg <= reg op mem and even mem <= mem op reg
- the compiler generates IR for many constant expressions, including multiplication by 1, etc., which are reflected at the target level. I know that this could be optimized in later passes, but most are very easy to avoid. Casts between the same type are also generated (mostly from "ptr" to "ptr")
- I still do not like the IR much... x86_64 machine code is more readable, but it is probably a matter of taste.
Thanks for your detailed explanation! You're right that testing a compiler can be tricky.
Two approaches I have in mind:
Refer here for hypothesis based testing: https://github.com/windelbouwman/ppci/blob/master/test/hypo/wasm.py
Testing a compiler without a minimal working library is impossible
Note that this is largely alleviated after the work of https://github.com/windelbouwman/ppci/pull/62. While the aim of having PPCI be a standalone, self-contained compiler infrastructure remains, different areas should be largely decoupled now. So, if you'd like to work on testing the C compiler, you shouldn't be affected by the lack of a C library as gravely as before, as you can link with the host C library (using gcc/ld/clang/whatever).
the compiler generates IR for many constant expressions, including multiplication by 1, etc., which are reflected at the target level. I know that this could be optimized in later passes, but most are very easy to avoid. Casts between the same type are also generated (mostly from "ptr" to "ptr")
Why do extra legwork to avoid it in a "compiler" (more specifically, an IR code generator) for each and every source language (as PPCI supports many, and intends to support more), instead of handling it once in the optimizer?
I'd suggest that vice-versa, we should target for simple, dumb, but correct IR code generation as the first stage, and independently from it, work on improving IR optimizer.
So, if you'd like to work on testing the C compiler, you shouldn't be affected by the lack of a C library as gravely as before, as you can link with the host C library (using gcc/ld/clang/whatever).
You are right, being able to link with modules compiled by another compiler is a big step forward.
But the ABI used by both compilers must be the same. For x86_64, there are currently some differences. Differences I have seen in the function calling conventions: 8byte ints in PPCI instead of 4 in the x86_64 ABI (but everything is 8byte-aligned on the stack), and different methods for handling varargs.
The varargs issue is subtle because PPCI does it in a portable way (allocating a block on the stack and filling it with the "..." arg values), which should work on most architectures, but the x86_64 ABI imposes a very specific (incompatible) way.
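One way to exercise that mismatch (a sketch based on the cross-linking setup described above, not code from the thread) is to build the variadic callee with ppci-cc and the caller with the host compiler, then check the exit code:

#include <stdarg.h>

/* variadic callee: sums 'count' int arguments */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;
    va_start(ap, count);
    while (count--)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

int main(void)
{
    /* a wrong varargs convention shows up as a non-zero exit code */
    return (sum_ints(3, 1, 2, 3) == 6) ? 0 : 1;
}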
I'd suggest that vice-versa, we should target for simple, dumb, but correct IR code generation as the first stage, and independently from it, work on improving IR optimizer.
I agree. I had to avoid using anything other than -O0 because one optimization pass (mem2reg I think) tends to produce invalid modules (with "undefined" values). With -O0, there is no IR transformation at all (which is anyway better for debugging a language front-end).
* I nevertheless saw a few wrong extensions from 32 to 64 quantities, and wrong pointer testings in a few cases
I'm curious to the issues you saw here, I'm currently bug hunting the wasm to x86 code path, there is some issue there, which might be the issue you observed.
First remember I changed int size to 32bit to be compatible with the x86_64 ABI, so I was compiling with 32bit ints and 64bit pointers.
1) About the pointer testing issue, here is what I wrote down in my logbook:
FILE *stream; ...if (!stream)... generates:
mov rax, [rbp, -8]
mov ebx, 0
cmp eax, ebx    ; ??? 32bit comparison
jz main_block2
but I have just checked that the IR is also wrong:
ptr load = load stream_addr
i32 typecast = cast load    ; ???
i32 zero = 0
cjmp typecast == zero ? .. : ..
Note that the problem does not occur if the pointer is compared to 0 and NULL.
Last minute: the cast bug comes from the semantics checking; in "on_if", there is a coercion of the condition to int. This works if ints are 64bit but not if they are 32bit. The C standard does not state that a condition is of type int, but the condition is tested as if compared to 0, which also works for pointers. The same problem exists in on_while and on_do.
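For reference, a reproducer along these lines (my own sketch) exercises both code paths; the first condition went through the faulty i32 coercion, the second did not:

#include <stddef.h>

int main(void)
{
    char buf[4];
    char *p = buf;

    if (!p)            /* was lowered through a 32-bit compare */
        return 1;
    if (p == NULL)     /* full 64-bit pointer comparison, handled correctly */
        return 2;
    return 0;
}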
2) About the wrong 32=>64bit extension, it occurs when an int quantity is added to a pointer.
char *buffer; int i; return buffer[i];

generates:

mov rbx, [rbp, -8]
mov ecx, [rbp, -12]   ; this is i (32bit)
xor rax, rax
mov eax, ecx          ; ??? i is extended to 64bit with 0-extension
mov rdx, 1
imul rax, rdx
add rbx, rax
mov al, [rbx]
It is required to do a signed extension because if i is negative, the quantity to add to buffer must also be negative. This bug stays hidden as long as i stays positive, but if i goes negative, a memory fault is very likely (because instead of accessing 1 byte backward we will access buffer + 0xffffffff). Here the IR is correct:
ptr load = load ptr_addr
i32 load_0 = load i_addr
ptr element_size = 1
ptr index = cast load_0
ptr element_offset = index * element_size
ptr element_address = load + element_offset
i8 load_1 = load element_address
It is the code generated for the i32=>ptr cast which is wrong: when coming from a signed integer type, we must sign extend; when coming from an unsigned integer type, we must zero extend.
Last minute: I quickly checked the x86_64 templates and there are 2 for 32=>64bit extension. The 1st one generates zero extension and is used for U32TOU64, I32TOU64, U32TOI64; the 2nd one generates sign extension and is used for I32TOI64. Sign extension must be used for I32TOx64, and zero extension must be used for U32TOx64. I moved one line and it works!
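A small self-checking test for this case (my own sketch, not from the thread) only passes when the i32 => ptr cast sign-extends:

int main(void)
{
    char data[2] = { 42, 7 };
    char *p = &data[1];
    int i = -1;

    /* with wrong zero extension, the access lands far outside 'data' */
    return (p[i] == 42) ? 0 : 1;
}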
I hope this will help!
Well, that's interesting! I always thought that only I32TOI64 required the sign extension, and the other 3 variants could be done with zero extend.
My current attempt at coremark-wasi.wasm
Native version:
$ python -m ppci.cli.wabt run --target native coremark-wasi.wasm
...
2K performance run parameters for coremark.
[0]ERROR! list crc 0xbfe2 - should be 0xe714
[0]ERROR! state crc 0xf222 - should be 0x8e3a
CoreMark Size : 666
Total ticks : 13509
Total time (secs): 13.509000
Iterations/Sec : 4441.483455
Iterations : 60000
Compiler version : Clang 9.0.0 (https://github.com/llvm/llvm-project 0399d5a9682b3cef71c653373e38890c63c4c365)
Compiler flags : -O3 -lrt
Memory location : HEAP
seedcrc : 0xe9f5
[0]crclist : 0xbfe2
[0]crcmatrix : 0x1fd7
[0]crcstate : 0xf222
[0]crcfinal : 0x1c7d
Errors detected
Python version:
$ python -m ppci.cli.wabt run coremark-wasi.wasm
...
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 13263
Total time (secs): 13.263000
Iterations/Sec : 1.507954
Iterations : 20
Compiler version : Clang 9.0.0 (https://github.com/llvm/llvm-project 0399d5a9682b3cef71c653373e38890c63c4c365)
Compiler flags : -O3 -lrt
Memory location : HEAP
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0x4983
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 1.507954 / Clang 9.0.0 (https://github.com/llvm/llvm-project 0399d5a9682b3cef71c653373e38890c63c4c365) -O3 -lrt / HEAP
I hoped the crc mismatch was due to this sign extension, but alas, this was not the case :(
Okay, the coremark benchmark now passes. I do not know exactly why though, I modified the register allocator to be more optimistic. Onto the next benchmark!
Good! So you compiled with the C compiler and the wasm code generator? I have just browsed the Coremark C code quickly. It uses many 16bit variables, so on x86_64 it must shake the integer promotion logic...
I have seen a couple of functions that could be impacted by a bug in integer promotion: in core_list_join.c there are 2 functions returning an s32 computed by subtracting 16bit ints. In this case, the compiler makes a 16bit subtraction and promotes to 32bit, whereas it should promote to 32bit first and then do the subtraction.
The C coremark benchmark passes with the x86_64 target (32bit int, 64bit pointer). I had to fight a bit:
Generation:
1st run did not crash but the results were not OK.
And after that success!
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 15296
Total time (secs): 15
Iterations/Sec : 4000
Iterations : 60000
Compiler version : PPCI CC v0.57
Compiler flags : -O0
Memory location : HEAP
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0xbd59
Correct operation validated. See README.md for run and reporting rules.
Very cool results! I will try to reproduce and compile coremark from C -> x86_64 as well. This is a useful benchmark to have both for correctness as well as for performance.
It is interesting that the route C -> wasm -> x86_64 yields faster results than C -> x86_64 with ppci. I compiled the coremark to wasm with clang via the wasienv toolchain as described in the wasm3 repository. This wasm file I ran with python -m ppci.cli.wabt run --target native.
I'm making a script to compile coremark, could you share your changes you made to be able to compile it?
The script to build coremark lives here: https://github.com/windelbouwman/ppci/blob/master/tools/compile_coremark.py
It is not yet fully working, since a call to clock_gettime is missing. Curious how you handled this dependency!
For clock_gettime() under Linux64, I just put this in my library:
#define SYS_CLK_GETTIME 228

int clock_gettime(int clockid, struct timespec *tp)
{
    long ret = syscall(SYS_CLK_GETTIME, clockid, tp, 0);
    if (ret < 0) {
        errno = -ret;
        return -1;
    }
    return 0;
}
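The snippet above assumes a syscall() helper. For reference, on x86_64 Linux such a wrapper is commonly written as in the sketch below (an assumption on my part: it uses GCC-style inline assembly, so with ppci-cc it would more likely live in a separate assembly stub):

/* raw x86_64 Linux syscall: number in rax, args in rdi/rsi/rdx, result in rax;
   the syscall instruction clobbers rcx and r11 */
static long syscall(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                      : "rcx", "r11", "memory");
    return ret;
}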
In core_portme.h, I have:

HAS_FLOAT 0
HAS_TIME_H 1
USE_CLOCK 0
HAS_STDIO 1
HAS_PRINTF 1

I linked my clib, but I don't think the test makes heavy use of it (mainly printf and clock_gettime).
For the building, the main issue was that the linker refused to relocate an array of cstrings (there was an assert failure in U64DataRelocation.calc(), reporting that the 64bit address to be relocated must be 4-byte aligned). This assert is wrong because when a cstring is relocated, its address has no alignment constraint.
assert sym_value % 4 == 0
I commented it out, and the linker did its job.
This should be enough for getting an executable.
As mentioned previously, the x86 backend generated wrong code for >>= (the IR is OK), so you need to replace the 2 >>= in crcu8 with separate = and >>.
I am not sure that fixing the null-pointer comparison is necessary; the generated code compares only the lower 32 bits to 0 instead of the full 64 bits, but it should work most of the time.
Note also that I have fixed other compiler bugs previously (mainly pointer arithmetic), so our compilers are not exactly in the same state, but these fixes could be unnecessary for the coremark to run successfully.
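For clarity, the >>= workaround mentioned above amounts to rewriting the compound assignment, roughly like this (illustrative sketch, not the actual coremark source):

/* avoid the compound >>=, which was miscompiled by the x86_64 backend */
unsigned char shift_compound(unsigned char crc)
{
    crc >>= 1;       /* form that produced wrong target code */
    return crc;
}

unsigned char shift_rewritten(unsigned char crc)
{
    crc = crc >> 1;  /* equivalent rewrite that compiled correctly */
    return crc;
}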
Alright, this is insightful! Thanks for the heads up. I'm now at the point where I only miss malloc and free, since my home-brew libc does not have them. Next I will hit the linker issue you described.
Note that patches to ppci are welcome, so please feel free to submit any pull request with fixes you made.
If you want to avoid the malloc/free calls, you can change the allocation method in "core_portme.h" Just replace:
#define MEM_METHOD MEM_MALLOC
by
#define MEM_METHOD MEM_STATIC
I have just tried it, it works and the test is successful as well on my side!
Another (longer) method, which only works because there is only one malloc and one free:
#define SYS_BRK 12

// set the new brk and return the old one; if NULL is passed, return the current brk
char *sys_brk(void *newbrk)
{
    return (char *)syscall(SYS_BRK, newbrk, 0, 0);
}

// increment data space by 'increment' bytes and return the previous program break
void *sbrk(intptr_t increment)
{
    char *oldbrk = sys_brk(NULL);
    return sys_brk(oldbrk + increment);
}
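Building on that, a bump allocator is enough to satisfy coremark's single malloc/free pair (my own sketch, relying on the sbrk() above and never reusing memory):

#include <stddef.h>
#include <stdint.h>

void *sbrk(intptr_t increment); /* from the snippet above */

void *malloc(size_t size)
{
    /* round up to 8 bytes so returned pointers stay aligned */
    size = (size + 7u) & ~(size_t)7u;
    return sbrk((intptr_t)size);
}

void free(void *ptr)
{
    (void)ptr; /* memory is never returned to the system */
}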
Ah, good hints!
Results so far:
ERROR! Please define ee_u32 to a 32b unsigned type!
2K performance run parameters for coremark.
[%uu]ERROR! list crc 0x%004x - should be 0x%004x
[%uu]ERROR! matrix crc 0x%004x - should be 0x%004x
[%uu]ERROR! state crc 0x%004x - should be 0x%004x
ERROR: ee_s32 is not a 32b datatype!
ERROR: ee_u32 is not a 32b datatype!
ERROR: Please modify the datatypes in core_portme.h!
CoreMark Size : %llu
Total ticks : %llu
Total time (secs): %ff
Iterations/Sec : %ff
Iterations : %llu
Compiler version : %ss
Compiler flags : %ss
Memory location : %ss
seedcrc : 0x%004x
[-1215408949048770560]crclist : 0x%004x
[432345568239550464]crcmatrix : 0x%004x
[100663296]crcstate : 0x%004x
[-1215408947951894528]crcfinal : 0x%004x
Errors detected
Pretty 'good' :P
Update:
[windel@hoefnix tools]$ time ./coremark.elf
ERROR! Please define ee_u32 to a 32b unsigned type!
2K performance run parameters for coremark.
[%uu]ERROR! list crc 0x%004x - should be 0x%004x
[%uu]ERROR! matrix crc 0x%004x - should be 0x%004x
[%uu]ERROR! state crc 0x%004x - should be 0x%004x
ERROR: ee_s32 is not a 32b datatype!
ERROR: ee_u32 is not a 32b datatype!
ERROR: Please modify the datatypes in core_portme.h!
CoreMark Size : %llu
Total ticks : %llu
Total time (secs): 10
Iterations/Sec : 5938
Iterations : %llu
Compiler version : ppci 0.5.8
Compiler flags : -w0000t
Memory location : Please put data memory location here
(e.g. code in flash, data on heap etc)
seedcrc : 0x%004x
[-1215408949048770560]crclist : 0x%004x
[432345568239550464]crcmatrix : 0x%004x
[100663296]crcstate : 0x%004x
[-1215408947951894528]crcfinal : 0x%004x
Errors detected
real 0m11.988s
user 0m11.981s
sys 0m0.000s
[windel@hoefnix tools]$
Coremark requires a 32bit integral type (described as plain "int", "long int" or whatever; this may be adapted in core_portme.h). With the ppci-cc default mapping of C types to IR types, shorts are 16bit, int and long are 64bit, and there is no 32bit int. I had this issue very early when interfacing with the Linux64 kernel, because some system struct types contain 32bit fields (uid, gid, etc. are 32bit) and there is no easy way to fetch them.
The x86_64 ABI states that default ints are 32bit (even if they are most often passed in 64bit slots between functions), and by keeping a 64bit int, I strongly suspect that interfacing with code compiled by other tools will be hard.
That's why I made a few changes to get 32bit ints and 64bit long ints (like gcc). Changing ints to 32bit is easy ("int" is managed specially), but getting 64bit longs is harder because this customization is not planned for in the architecture (they are hard-coded to 32bit).
The x86_64 ABI states that default ints are 32bit (even if they are most often passed in 64bit slots between functions), and by keeping a 64bit int, I strongly suspect that interfacing with code compiled by other tools will be hard.
More specifically, there are different "C native type models". For example, there's the ILP32 model, where Integers, Longs, and Pointers are 32-bit, and there's the LP64 model, where Longs and Pointers are 64-bit (and by exclusion, Ints are 32-bit).
These models then get mapped to architectures and their params (like ABIs). And of course, there are many more models; more specifically, the C type model should be (fully) parametrizable. E.g., there's nothing wrong with using ILP32 on x86_64. This saves on memory bloat and gets all the benefits of AMD64 (i.e. a RISC-like architecture, except 2-address instead of 3-address). Indeed, that's known as x32.
@windelbouwman, that's one of the suggestions I'd like to make - please avoid hardcoding things, please make everything parametrizable, and please actually allow parametrizing it. Like, literally, ppci-cc should accept a switch like -fdata-model= which takes a JSON file like:
{
"int": 32,
"long": 64,
"ptr": 32
}
And everything "just works" (in reasonable bounds of course).
That would be a way to make PPCI stand out - to show that there are enough, and easily accessible, knobs for experimentation and customization.
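A trivial smoke test (my own sketch) makes it easy to verify which data model a given ppci-cc build or configuration actually uses:

#include <stdio.h>

int main(void)
{
    /* prints the sizes that define the active C data model */
    printf("short=%d int=%d long=%d ptr=%d\n",
           (int)sizeof(short), (int)sizeof(int),
           (int)sizeof(long), (int)sizeof(void *));
    return 0;
}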
Interesting stuff!
This wiki page gives a good overview of this https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models
Looks like the unix way (32-bit ints, 64-bit longs) makes the most sense?
Looks like the unix way (32-bit ints, 64-bit longs) makes the most sense?
LP64 seems to be the most popular model, and thus worth being default. My point is that default shouldn't mean hardcoded, and all that stuff should be (easily) configurable.
Okay, yet another update:
[windel@hoefnix tools]$ time ./coremark.elf
2K performance run parameters for coremark.
CoreMark Size : %llu
Total ticks : %llu
Total time (secs): 10
Iterations/Sec : 3869
Iterations : %llu
Compiler version : ppci 0.5.8
Compiler flags : -w0000t
Memory location : Please put data memory location here
(e.g. code in flash, data on heap etc)
seedcrc : 0x%004x
[-418119680]crclist : 0x%004x
[534183936]crcmatrix : 0x%004x
[-1908801536]crcstate : 0x%004x
[632619008]crcfinal : 0x%004x
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 3869 / ppci 0.5.8 -w0000t / Static
real 0m13.223s
user 0m13.126s
sys 0m0.017s
The 32 bits stuff appears to work since commit e639b20f6a2229d029fc5271bd34059dceddfb37
The code for long and int was / is a bit cumbersome / crappy. I made it a bit better so those sizes can be specified by the backend.
@pfalcon fully agree, selecting the data model should be parameterizable, with sensible defaults. To be continued! Maybe add an arch-specific flag?
Strangely I did not face any issue with the >>=, since the correct operation was validated. Maybe a bug was fixed in this code?
Added hex support to the home-cooked printf for results:
[windel@hoefnix tools]$ time ./coremark.elf
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 14825
Total time (secs): 15
Iterations/Sec : 4047
Iterations : 60000
Compiler version : ppci 0.5.8
Compiler flags : -w0000t
Memory location : Please put data memory location here
(e.g. code in flash, data on heap etc)
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0xbd59
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 4047 / ppci 0.5.8 -w0000t / Static
real 0m17.577s
user 0m17.552s
sys 0m0.000s
For reference, the webassembly version is faster:
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 12897
Total time (secs): 12.897000
Iterations/Sec : 4652.244708
Iterations : 60000
Compiler version : GCCClang 9.0.0 (https://github.com/llvm/llvm-project 0399d5a9682b3cef71c653373e38890c63c4c365)
Compiler flags : -O2 -lrt
Memory location : Please put data memory location here
(e.g. code in flash, data on heap etc)
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0xbd59
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 4652.244708 / GCCClang 9.0.0 (https://github.com/llvm/llvm-project 0399d5a9682b3cef71c653373e38890c63c4c365) -O2 -lrt / Heap
This is kind of weird, but I guess it is due to clang really creating a fast wasm program!
Update: baseline gcc run with coremark:
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 14744
Total time (secs): 14.744000
Iterations/Sec : 20347.259902
Iterations : 300000
Compiler version : GCC10.1.0
Compiler flags : -O2 -DPERFORMANCE_RUN=1 -lrt
Memory location : Please put data memory location here
(e.g. code in flash, data on heap etc)
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0xcc42
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 20347.259902 / GCC10.1.0 -O2 -DPERFORMANCE_RUN=1 -lrt / Heap
We have a long way to go :)
Update: ppci with O2 optimizations:
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 12856
Total time (secs): 13
Iterations/Sec : 4667
Iterations : 60000
Compiler version : ppci 0.5.8
Compiler flags : -w0000t
Memory location : Please put data memory location here
(e.g. code in flash, data on heap etc)
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0xbd59
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 4667 / ppci 0.5.8 -w0000t / Static
This is kind of weird, but I guess it is due to clang really creating a fast wasm program!
Surely, clang (or rather, LLVM) applies a lot of optimizations to wasm bytecode, the same way as it does to x86 "bytecode"?
Good!
The code for long and int was / is a bit cumbersome / crappy. I made it a bit better so those sizes can be specified by the backend.
In fact it was easy to choose a size for int and ptr, but there was no way to specify the size for other types.
Strangely I did not face any issue with the >>=, since the correct operation was validated. Maybe a bug was fixed in this code?
That's possible, shifts are tricky operations to generate right (especially right shifts).
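A quick self-checking case for that (my own sketch; on the gcc-compatible targets discussed here, signed right shift is arithmetic):

int main(void)
{
    int s = -8;
    unsigned int u = 0x80000000u;

    if ((s >> 1) != -4)          return 1; /* arithmetic (sign-preserving) shift */
    if ((u >> 1) != 0x40000000u) return 2; /* logical (zero-filling) shift */
    return 0;
}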
About performance, a comparison with GCC: ppci-cc 0.5.7 (-O0) on my system: 15.388 sec; gcc (default options) on my system: 12.911 sec.
About performance, a comparison with GCC: ppci-cc 0.5.7 (-O0) on my system: 15.388 sec; gcc (default options) on my system: 12.911 sec.
Note that you should compare the iterations/sec value, not the execution time of coremark. I added the result of coremark above.
I have just adapted my makefiles for the "upgraded" ppci-ld. I will be able to put my "crt0.o" in the library like other modules since the entry point symbol can now be specified.
Note that building a library and therefore using ppci-archive requires Python 3.7 (because of the "required" option in argparse)
I ran yesterday the small C test suite https://github.com/c-testsuite/c-testsuite
Current (10-jun-20) state (among the 220 tests):
Cool stuff! That is a handy test suite.
Update: I made a test_c_test_suite.py to be able to run the 220 snippets. Not sure what you did to run it?
To run the whole test suite, I wrote a small shell script to compile/link/run/test all tests (linked with my clib, but most of them do not require more than "printf"). The tests are written so that "main" returns 0 if execution is correct, so it is easy to validate: no output parsing is required.
I made progress on the 9 tests with execution failures:
About the 18 tests that cannot be compiled:
Is there a place where the current C front-end restrictions are listed?
I noticed the following:
We are making progress in correctly executing the small C test suite [https://github.com/c-testsuite/c-testsuite] thanks to all the fixes we made last week!
Current (10-jun-20) state (among the 220 tests):
- 186 passed (85%)
- 7 tests cannot be currently run: unsupported standard features or GCC extensions
- 18 compilation/link failures => working on these
- 9 execution failures (I suspect that issues #66 and #77 cover most of them) => working on these
Today status:
- 197 passed (89%)
- 7 tests cannot be currently run: unsupported standard features or GCC extensions

Test cases to be investigated:
- 13 compilation/link failures
- 2 execution failures (1 issue to be opened, the other test to be investigated)
This post lists some interesting ideas about compiler testing: https://old.reddit.com/r/Python/comments/eieuld/c_compiler_written_in_python/
Csmith is an example: https://embed.cs.utah.edu/csmith/using.html
The other idea is hypothesis testing: https://hypothesis.works/
Work out some ideas about testing the compiler and document the different options for fuzzing / stress testing.