Closed ChengCat closed 5 years ago
Thanks for this: I had noticed some of these problems when working on some recent updates to support newer versions of LLVM, but forgot to go back and fix them afterwards. The basic arithmetical/relational functions are now always inlined, and for the example program (a.dt), O1 or higher will cause it to be optimised into the printing of a constant.
There was an additional problem with those basic functions being retained in the output even when they weren't needed, which has now been fixed.
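As a rough illustration of why such a program can collapse into printing a constant, here is a minimal C sketch. It assumes the example is a summation loop like the function f that appears later in this thread; the names sum_loop and sum_closed are hypothetical, and 64-bit unsigned arithmetic is used only to keep the C well defined, not because Dale does the same.

```c
#include <stdint.h>

/* Hypothetical stand-in for the summation loop: adds 0, 1, ..., n-1. */
uint64_t sum_loop(uint64_t n) {
    uint64_t sum = 0;
    for (uint64_t i = 0; i < n; i++) {
        sum += i;
    }
    return sum;
}

/* Closed form of the same sum, n*(n-1)/2. An optimiser that sees the whole
   loop with a constant bound can reduce it to a single value like this.
   Valid here only while n*(n-1) fits in 64 bits. */
uint64_t sum_closed(uint64_t n) {
    return n == 0 ? 0 : n * (n - 1) / 2;
}
```

With the bound 500000000 used in this thread, the closed form gives 124999999750000000, so once every call inside the loop is inlined the compiler has nothing left to compute at run time.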
Hi, I have just tested the fixes. Dale now works beautifully with that example. Thanks!
I noticed two minor issues.
; in a.dt
(import cstdio)
(import macros)
(import b)

(def m (macro extern (void)
  (def sum (var auto \ 0))
  (for (i \ 0) (< i 500000000) (incv i)
    (setv sum (+ sum i)))
  (std.macros.mnfv mc sum)))

(def f (fn extern int (void)
  (def sum (var auto \ 0))
  (for (i \ 0) (< i 500000000) (incv i)
    (setv sum (+ sum i)))
  sum))

(def main (fn extern-c int (void)
  (printf "%d\n" (m))
  (printf "%d\n" (f))
  (printf "%d\n" (b.m))
  (printf "%d\n" (b.f))))
; in b.dt
(module b)
(import stdlib)
(import macros)

(namespace b
  (def m (macro extern (void)
    (def sum (var auto \ 0))
    (for (i \ 0) (< i 500000000) (incv i)
      (setv sum (+ sum i)))
    (std.macros.mnfv mc sum)))

  (def f (fn extern int (void)
    (def sum (var auto \ 0))
    (for (i \ 0) (< i 500000000) (incv i)
      (setv sum (+ sum i)))
    sum)))
The above two files are compiled as:
dalec -c b.dt -O4
dalec a.dt -O4 -o a
libb.bc.
Thanks for the extra comments. Issue 2 is now fixed, but I don't think there's any way to fix issue 1, since the slowness appears to be due to compilation of the macro.
I thought the slowness of 1 was mainly because JIT compilation takes a different code path, and inlining is not properly handled in that code path. But I am not sure.
I am closing this issue for now, since it is not important anyway.
I forgot to say: thank you!
Compiling the following program with
dalec a.dt -s ir -O4 --no-dale-stdlib
(Dale is compiled with LLVM 6.0.1), I get:
Note those _Z1$2bii calls: the primitive operator functions are not inlined, despite being marked as alwaysinline. This leads to very bad performance; on my machine it's 10x slower than an equivalent C program compiled with gcc.
After some investigation, I think this is caused by inefficient use of LLVM. As indicated by LLVM command line tools, LLVM can easily optimize the whole thing into printing a constant. With the following commands,
a file O1.ll is produced:
I also think --no-dale-stdlib should always be enabled. Those Dale run-time functions are very small, and without them being properly inlined, both the performance of compiled programs and the compilation time (to run the macros) are significantly degraded.
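For readers more familiar with C, the effect described above can be sketched with a tiny operator wrapper. This is only an analogy, not how Dale implements its primitives: add_u64 and sum_with_helper are hypothetical names, and __attribute__((always_inline)) is the GCC/Clang extension that Clang lowers to LLVM's alwaysinline function attribute.

```c
#include <stdint.h>

/* Hypothetical analogue of a primitive operator function: a tiny wrapper
   around +. The attribute asks the compiler to inline every call. */
static inline __attribute__((always_inline))
uint64_t add_u64(uint64_t a, uint64_t b) {
    return a + b;
}

/* Sums 0, 1, ..., n-1 through the wrapper. If the wrapper were left as a
   real call, each iteration would pay call overhead and the optimiser
   could not see through the loop body to simplify it further. */
uint64_t sum_with_helper(uint64_t n) {
    uint64_t sum = 0;
    for (uint64_t i = 0; i < n; i++) {
        sum = add_u64(sum, i);
    }
    return sum;
}
```

When the wrapper is inlined, the loop body is a plain addition and later passes are free to simplify it; when it stays an opaque call, as with the _Z1$2bii calls noted above, those passes have nothing to work with.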