To be fair, Julia incurs the compilation overhead once per session; depending on your workload, this matters for "real world use". I think the best solution would be to give Julia a "special" bar in the plot which splits the runtime up into compilation and runtime, just because Julia's paradigm is so different from that of all the other languages. At least this way, it would neither be hidden that Julia is very fast, nor that it does suffer from compilation latency (but, again, only once per session in usual workflows).
@jkrumbiegel Thank you very much for providing this information.
I see two problems with that: one is practical and the other is a personal viewpoint.
Practical challenge
It would mean a huge change in the `scbench` tool to make it possible to capture benchmarks from within programs. Currently, the tool just runs a command and measures how long it takes. If I understand your point correctly, to measure the time like you suggest, I would have to add code to the `leibniz.jl` program (`@time @eval <function()>`) and then get the output from my `scbench` tool and interpret the result.
That's just not feasible for one language.
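For illustration, a minimal sketch of what such in-program measurement might look like; this is hypothetical, `compute_pi` stands in for the real workload, and `scbench` would then have to parse the printed JSON:

```julia
# Hedged sketch: a script that reports its own compile vs. run time.
# `compute_pi` is a hypothetical stand-in for the actual workload.
compute_pi(n) = 4 * sum(k -> (isodd(k) ? -1.0 : 1.0) / (2k + 1), 0:n)

compile_s = @elapsed compute_pi(1)      # first call pays the JIT compilation
run_s     = @elapsed compute_pi(10^6)   # later calls reuse the compiled code
println("""{"compile_s": $compile_s, "run_s": $run_s}""")
```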
Personal viewpoint
I get that Julia can be very fast, especially if you "have it already running". But I'm measuring a complete script run, so for me that also includes the compilation overhead if that is how a script is invoked.
If Julia can output a file like C, C++, Rust or Go, then I would create an extra compile step and invoke the compiled program.
Thus, I will close this issue.
> If Julia can output a file like C, C++, Rust or Go, then I would create an extra compile step and invoke the compiled program.
It can, by using https://github.com/tshort/StaticCompiler.jl. Are you willing to take a PR that adds this as a compile step, @niklas-heer?
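For context, a rough sketch of what a StaticCompiler.jl build script could look like; this is an assumption-laden outline rather than a tested recipe: the exact `compile_executable` API varies between versions, StaticTools.jl is assumed for GC-free printing, and the hard-coded rounds value is hypothetical (a static binary can't easily read `rounds.txt` the way the script does):

```julia
# Hedged sketch of a StaticCompiler.jl build step; statically compiled Julia
# must avoid the GC and dynamic dispatch, hence the plain loop and StaticTools.
using StaticCompiler, StaticTools

function leibniz_main()
    rounds = 100_000_000            # hypothetical: hard-coded instead of rounds.txt
    s = 0.0
    for k in 0:rounds
        s += (isodd(k) ? -1.0 : 1.0) / (2k + 1)
    end
    printf(c"%f\n", 4s)             # StaticTools printf with a static C-string
    return 0
end

# Assumed API: emits a standalone ./leibniz_main executable.
compile_executable(leibniz_main, (), "./")
```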
@Moelf sounds interesting.
I would be willing to accept a PR which adds an alternative execution, similar to CPython and PyPy.
Something like this:
```Earthfile
julia-static-compiler:
    FROM julia:1.6.7-alpine3.16
    COPY ./src/rounds.txt ./
    COPY +build/scbench ./
    COPY ./src/leibniz.jl ./
    # probably here needs to be a special step
    RUN --no-cache ./scbench "julia leibniz.jl" -i $iterations -l "julia --version" --export json --lang "Julia (Static Compiler)"
    SAVE ARTIFACT ./scbench-summary.json AS LOCAL ./results/julia-static-compiler.json
```
It's not an alternative execution; you just have to decide WHAT you are actually measuring. If you don't want to measure compile time, then don't run Julia the way it's being run right now.
CPython and PyPy (and IronPython and Jython and MicroPython) are different implementations; there's only one implementation of Julia, and it's just a matter of how much of it is compiled ahead of time.
@Moelf I would like to measure "the standard way". If you save a Julia script to run something, that is what I want to test.
`pypy` is not the standard way then; also, nobody in Julia would say it's standard to "call a cheap script over and over just to pay the start-up time every time".
I think many Julia programs (such as Documenter.jl) are compiled and run as a script, and do pay the compilation cost at runtime, so having both methods of execution shown separately on the benchmark makes sense to me.
Probably the majority of Julia code that is run consists of packaged code, and packages precompile during installation, which eliminates the Julia inference time and leaves only the LLVM time (and there's a big effort to also cache the LLVM-compiled code in the next year or so). For this reason, direct comparisons of a script can vastly over-estimate the compile times involved in an actual run: you can put that code in a package and often get a few orders of magnitude less because of how the caching works (when snoop-precompiled).
I think that's the bigger issue: measuring the compile time of something run once in isolation is simply not indicative of the timings someone will see when the code is deployed (where it will then be used by thousands of other people, orders of magnitude more times).
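For readers unfamiliar with that workflow, here is a hedged sketch of a "snoop-precompiled" package using SnoopPrecompile.jl; the module name and workload are illustrative:

```julia
# Hedged sketch: a package whose workload is compiled at precompile time.
module LeibnizBench

using SnoopPrecompile

leibniz(n) = 4 * sum(k -> (isodd(k) ? -1.0 : 1.0) / (2k + 1), 0:n)

# This block runs while the package precompiles (i.e. at install time),
# so the inference results are cached and first use in a session is cheap.
@precompile_all_calls begin
    leibniz(10)
end

end # module
```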
I agree with @jariji that showing it separately would make sense.
@ChrisRackauckas @jariji How would you normally precompile Julia code? I mean the commands you would use to achieve that.
PackageCompiler.jl, as in https://github.com/niklas-heer/speed-comparison/pull/31, is a popular way. The next-generation StaticCompiler.jl is still under development.
Ah okay, and does it generate a compiled program output somewhere?
It produces a `.so` file that will reduce the startup time a bit. With the current number of `rounds`, Julia's startup time is >95 percent of the total.
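For reference, the PackageCompiler.jl route looks roughly like this; a sketch only, where the sysimage name is hypothetical and the actual PR may differ in details:

```julia
# Hedged sketch: bake leibniz.jl's compiled methods into a custom sysimage.
using PackageCompiler

create_sysimage(String[];                          # no extra packages needed
                sysimage_path = "leibniz.so",      # hypothetical output name
                precompile_execution_file = "src/leibniz.jl")
```

The benchmark would then run `julia --sysimage leibniz.so leibniz.jl`, paying only process startup instead of recompiling the script's methods.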
Running this for (say) 1000 and 10^9 rounds would separate number-crunching speed from startup/IO time. Quick tests put Julia with C in the long run (unsurprisingly), but seeing how PyPy, R, and JavaScript scale might also be interesting?
That would need a massive rework to do two separate runs with different `rounds` settings, so it isn't feasible, but I'm open to changing: https://github.com/niklas-heer/speed-comparison/blob/master/src/rounds.txt
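A hedged sketch of that scaling experiment (the harness is illustrative; file names follow the repo layout):

```julia
# Hedged sketch: time the whole process at two rounds settings; the small run
# is dominated by startup/IO, the large one by number-crunching.
for rounds in (10^3, 10^9)
    write("rounds.txt", string(rounds))
    t = @elapsed run(`julia leibniz.jl`)   # full process time, startup included
    println("rounds = $rounds: $(round(t; digits=3)) s")
end
```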
Closing this due to #31 now being merged.
For the `rounds.txt` change, you can either create another issue or open a pull request directly.
Here are the results of #31: https://niklas-heer.github.io/speed-comparison/pages/2022-10-15T223909.html
Although I must say, it added quite a bit of CI time. Before, it was taking ~5m; now it takes ~7-8m. So, 2 minutes of CI time for an ~80ms improvement 😅
@niklas-heer I would be fine with removing the PackageCompiler.jl time and simply bumping `nrounds` to be 1000× larger.
It would not change the time for Julia, because right now 99% is startup time.
@Moelf fair enough. Let's see the CI results of #35 first.
The FAQ clearly says that:
The only way I can get Julia to spend more than a few milliseconds is if I include the Julia launch time and the compilation time paid when you first run `f(rounds)`; we can see this compilation time by wrapping the file reading and the workload in a `main` function and running it twice, then:
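(The snippet that followed is not preserved here; below is a minimal sketch of the idea, with an illustrative workload body standing in for the repo's real one.)

```julia
# Hedged sketch: wrap reading rounds.txt and the workload in main(), then
# call it twice; the first @time includes compilation, the second does not.
f(n) = 4 * sum(k -> (isodd(k) ? -1.0 : 1.0) / (2k + 1), 0:n)

function main()
    rounds = parse(Int, readchomp("rounds.txt"))
    println(f(rounds))
end

@time main()   # first run: compilation + execution
@time main()   # second run: execution only
```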