To be fair, Julia incurs the compilation overhead once per session; depending on your workload, this matters for "real world use". I think the best solution would be to give Julia a "special" bar in the plot which splits the runtime up into compilation and runtime, just because Julia's paradigm is so different from that of all the other languages. At least this way, it would neither be hidden that Julia is very fast, nor that it does suffer from compilation latency (but, again, only once per session in usual workflows).
@jkrumbiegel Thank you very much for providing this information.
I see two problems with that: one is practical and the other is a personal viewpoint.
Practical challenge
It would mean a huge change in the `scbench` tool to make it possible to capture benchmarks from within programs. Currently, the tool just runs a command and measures how long it takes. If I understand your point correctly, to measure the time like you suggest, I would have to add code to the `leibniz.jl` program (`@time @eval <function()>`) and then get the output from my `scbench` tool and interpret the result.
That's just not feasible for one language.
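For illustration, a minimal sketch of what such in-program measurement might look like; this is hypothetical, `compute_pi` stands in for the real workload, and `scbench` would then have to parse the printed JSON:

```julia
# Hedged sketch: a script that reports its own compile vs. run time.
# `compute_pi` is a hypothetical stand-in for the actual workload.
compute_pi(n) = 4 * sum(k -> (isodd(k) ? -1.0 : 1.0) / (2k + 1), 0:n)

compile_s = @elapsed compute_pi(1)      # first call pays the JIT compilation
run_s     = @elapsed compute_pi(10^6)   # later calls reuse the compiled code
println("""{"compile_s": $compile_s, "run_s": $run_s}""")
```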
Personal viewpoint
I get that Julia can be very fast, especially if you "have it already running". But I'm measuring a complete script run, so for me that also includes the compilation overhead if that is how a script is invoked.
If Julia can output a file like C, C++, Rust or Go, then I would create an extra compile step and invoke the compiled program.
Thus, I will close this issue.
> If Julia can output a file like C, C++, Rust or Go, then I would create an extra compile step and invoke the compiled program.
It can, by using https://github.com/tshort/StaticCompiler.jl. Are you willing to take a PR that adds this as a compile step, @niklas-heer?
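For context, a rough sketch of what a StaticCompiler.jl build script could look like; this is an assumption-laden outline rather than a tested recipe: the exact `compile_executable` API varies between versions, StaticTools.jl is assumed for GC-free printing, and the hard-coded rounds value is hypothetical (a static binary can't easily read `rounds.txt` the way the script does):

```julia
# Hedged sketch of a StaticCompiler.jl build step; statically compiled Julia
# must avoid the GC and dynamic dispatch, hence the plain loop and StaticTools.
using StaticCompiler, StaticTools

function leibniz_main()
    rounds = 100_000_000            # hypothetical: hard-coded instead of rounds.txt
    s = 0.0
    for k in 0:rounds
        s += (isodd(k) ? -1.0 : 1.0) / (2k + 1)
    end
    printf(c"%f\n", 4s)             # StaticTools printf with a static C-string
    return 0
end

# Assumed API: emits a standalone ./leibniz_main executable.
compile_executable(leibniz_main, (), "./")
```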
@Moelf sounds interesting.
I would be willing to accept a PR which adds an alternative execution, similar to CPython and PyPy.
Something like this:
```Earthfile
julia-static-compiler:
    FROM julia:1.6.7-alpine3.16
    COPY ./src/rounds.txt ./
    COPY +build/scbench ./
    COPY ./src/leibniz.jl ./
    # probably here needs to be a special step
    RUN --no-cache ./scbench "julia leibniz.jl" -i $iterations -l "julia --version" --export json --lang "Julia (Static Compiler)"
    SAVE ARTIFACT ./scbench-summary.json AS LOCAL ./results/julia-static-compiler.json
```
It's not an alternative execution; you just have to decide WHAT you are actually measuring. If you don't want to measure compile time, then don't run Julia the way it's being run right now.
CPython and PyPy (and IronPython and Jython and MicroPython) are different implementations; there's only one implementation of Julia, and it's just a matter of how much of it is compiled ahead of time.
@Moelf I would like to measure "the standard way". If you save a Julia script to run something, that is what I want to test.
`pypy` is not the standard way then; also, nobody in Julia would say it's standard to "call a cheap script over and over just to pay the start-up time every time".
I think many Julia programs (such as Documenter.jl) are compiled and run as a script, and do pay the compilation cost at runtime, so having both methods of execution shown separately on the benchmark makes sense to me.
Probably the majority of Julia code that is run consists of packaged code, and packages precompile during installation, which eliminates the Julia inference time and leaves only the LLVM time (and there's a big effort to also cache the LLVM-compiled code in the next year or so). For this reason, direct comparisons of a script can vastly over-estimate the compile times involved in an actual run: you can put that code in a package and often get a few orders of magnitude less because of how the caching works (when snoop-precompiled).
I think that's the bigger issue: measuring the compile time of something run once in isolation is simply not indicative of the timings someone will see when the code is deployed (where it will then be used by thousands of other people, orders of magnitude more times).
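For readers unfamiliar with that workflow, here is a hedged sketch of a "snoop-precompiled" package using SnoopPrecompile.jl; the module name and workload are illustrative:

```julia
# Hedged sketch: a package whose workload is compiled at precompile time.
module LeibnizBench

using SnoopPrecompile

leibniz(n) = 4 * sum(k -> (isodd(k) ? -1.0 : 1.0) / (2k + 1), 0:n)

# This block runs while the package precompiles (i.e. at install time),
# so the inference results are cached and first use in a session is cheap.
@precompile_all_calls begin
    leibniz(10)
end

end # module
```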
I agree with @jariji that showing it separately would make sense.
@ChrisRackauckas @jariji How would you normally precompile Julia code? I mean the commands you would use to achieve that.
PackageCompiler.jl, as in https://github.com/niklas-heer/speed-comparison/pull/31, is a popular way. The next-generation StaticCompiler.jl is still under development.
Ah okay, and does it generate a compiled program output somewhere?
It produces a `.so` file that will reduce the startup time a bit. With the current number of `rounds`, Julia's startup time is >95 percent of the total.
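For reference, the PackageCompiler.jl route looks roughly like this; a sketch only, where the sysimage name is hypothetical and the actual PR may differ in details:

```julia
# Hedged sketch: bake leibniz.jl's compiled methods into a custom sysimage.
using PackageCompiler

create_sysimage(String[];                          # no extra packages needed
                sysimage_path = "leibniz.so",      # hypothetical output name
                precompile_execution_file = "src/leibniz.jl")
```

The benchmark would then run `julia --sysimage leibniz.so leibniz.jl`, paying only process startup instead of recompiling the script's methods.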
Running this for (say) 1000 and 10^9 rounds would separate number-crunching speed from startup/IO time. Quick tests put Julia with C in the long run (unsurprisingly), but seeing how PyPy, R, and JavaScript scale might also be interesting?
That would need a massive rework to do two separate runs with different `rounds` settings, so it isn't feasible, but I'm open to changing: https://github.com/niklas-heer/speed-comparison/blob/master/src/rounds.txt
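A hedged sketch of that scaling experiment (the harness is illustrative; file names follow the repo layout):

```julia
# Hedged sketch: time the whole process at two rounds settings; the small run
# is dominated by startup/IO, the large one by number-crunching.
for rounds in (10^3, 10^9)
    write("rounds.txt", string(rounds))
    t = @elapsed run(`julia leibniz.jl`)   # full process time, startup included
    println("rounds = $rounds: $(round(t; digits=3)) s")
end
```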
Closing this due to #31 now being merged.
For the `rounds.txt` change, you can either create another issue or open a pull request directly.
Here are the results of #31: https://niklas-heer.github.io/speed-comparison/pages/2022-10-15T223909.html
Although I must say, it added quite a bit of CI time. Before, it was taking ~5m; now it takes ~7-8m. So, 2 minutes of CI time for an ~80ms improvement 😅
@niklas-heer I would be fine with removing the PackageCompiler.jl time and simply bumping `nrounds` to be 1000× larger.
It would not change the time for Julia, because right now 99% is startup time.
@Moelf fair enough. Let's see the CI results of #35 first.
The FAQ clearly says that:
The only way I can get Julia to spend more than a few milliseconds is if I include the Julia launch time and the compilation time paid when you first run `f(rounds)`; we can see this compilation time by wrapping the file reading and the workload in a `main` function and running it twice, then:
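(The snippet that followed is not preserved here; below is a minimal sketch of the idea, with an illustrative workload body standing in for the repo's real one.)

```julia
# Hedged sketch: wrap reading rounds.txt and the workload in main(), then
# call it twice; the first @time includes compilation, the second does not.
f(n) = 4 * sum(k -> (isodd(k) ? -1.0 : 1.0) / (2k + 1), 0:n)

function main()
    rounds = parse(Int, readchomp("rounds.txt"))
    println(f(rounds))
end

@time main()   # first run: compilation + execution
@time main()   # second run: execution only
```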