titzer / virgil

A fast and lightweight native programming language

Why is Virgil so fast? #80

Closed srackham closed 1 year ago

srackham commented 2 years ago

I ran a Hello World + Fibonacci benchmark comparing Virgil with Rust and TinyGo (the two most often cited Wasm compilers), and the results seem too good to be true!

Virgil outperforms both Rust and TinyGo by orders of magnitude in terms of both compiler speed and executable file sizes. Yes, the 0.00s compile time is correct — time(1) reports to the nearest 1/100s (when I compiled my first Virgil program it was so fast I thought it hadn't run).
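Because time(1) rounds to 1/100s here, a compile that reports 0.00s only tells us it took under about 5 ms. A minimal sketch of a finer-grained measurement follows, assuming Python 3 is available; the helper name `time_command` and the placeholder command are illustrative and not part of the attached benchmark scripts.

```python
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run argv `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()  # high-resolution monotonic clock
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Placeholder command: pass the real compiler invocation as arguments.
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    samples = time_command(cmd)
    print(f"min {min(samples) * 1000:.2f} ms over {len(samples)} runs")
```

Reporting the minimum over several runs reduces noise from process start-up and disk caching, which matters when the thing being timed finishes in a few milliseconds.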

The Numbers

wasm

| Language | Compile time (secs) | Executable size (B) | Execution time (secs) |
|---|---|---|---|
| go | 4.13s | 428,547 | 0.62s |
| rust | 0.33s | 2,054,632 | 1.80s |
| virgil | 0.00s | 8,802 | 0.96s |

wasm-optimised

| Language | Compile time (secs) | Executable size (B) | Execution time (secs) |
|---|---|---|---|
| go | 3.88s | 191,265 | 0.62s |
| rust | 0.80s | 301,363 | 0.66s |
| virgil | 0.01s | 7,891 | 1.07s |

x86-64-linux

| Language | Compile time (secs) | Executable size (B) | Execution time (secs) |
|---|---|---|---|
| go | 1.94s | 503,640 | 0.39s |
| rust | 0.30s | 3,853,504 | 1.53s |
| virgil | 0.01s | 20,552 | 0.63s |

x86-64-linux-optimised

| Language | Compile time (secs) | Executable size (B) | Execution time (secs) |
|---|---|---|---|
| go | 2.04s | 140,056 | 0.38s |
| rust | 2.73s | 1,653,736 | 0.32s |
| virgil | 0.01s | 19,552 | 0.64s |

[Chart: WebAssembly Performance]

[Chart: x86-64 Performance]

Notes

  1. Virgil Wasm code generated with the compiler's -opt=all option ran slower than code generated without it, but the executable was ~10% smaller, so currently there is not much to be gained from using -opt=all.

  2. Importing the fmt package increased the size of the TinyGo Wasm executable from 8KB to 191KB (an increase of 183KB), whereas importing the Virgil Strings component increased the size of the Virgil Wasm executable from 3.6KB to 7.9KB (an increase of only 4.3KB).

  3. The compiled Wasm files were executed with wasmtime-cli 0.39.1
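The size figures in these notes can be cross-checked against the wasm tables. The sketch below is only that arithmetic, using byte counts copied from the tables; the helper names are illustrative.

```python
def percent_smaller(before, after):
    """Relative size reduction of `after` versus `before`, in percent."""
    return 100.0 * (before - after) / before


def size_ratio(other, virgil):
    """How many times larger `other` is than `virgil`."""
    return other / virgil


# Executable sizes in bytes, taken from the wasm and wasm-optimised tables.
VIRGIL_WASM, VIRGIL_WASM_OPT = 8_802, 7_891
GO_WASM, RUST_WASM = 428_547, 2_054_632

# Note 1: effect of -opt=all on the Virgil Wasm binary.
print(f"-opt=all saves {percent_smaller(VIRGIL_WASM, VIRGIL_WASM_OPT):.1f}%")

# Note 2: cost of pulling in a formatting library, in KB as quoted.
print("TinyGo fmt import:", 191 - 8, "KB")
print("Virgil Strings import:", round(7.9 - 3.6, 1), "KB")

# Unoptimised Wasm size relative to Virgil.
print(f"go is {size_ratio(GO_WASM, VIRGIL_WASM):.0f}x larger")
print(f"rust is {size_ratio(RUST_WASM, VIRGIL_WASM):.0f}x larger")
```

On these numbers the unoptimised Wasm size gap is between one and two orders of magnitude for Go and a little over two for Rust.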

Details

The raw data along with source code and platform information is attached. go-results.txt rust-results.txt virgil-results.txt

titzer commented 2 years ago

Nice!

> Virgil outperforms both Rust and TinyGo by orders of magnitude in terms of both compiler speed and executable file sizes. Yes, the 0.00s compile time is correct: time(1) reports to the nearest 1/100s (when I compiled my first Virgil program it was so fast I thought it hadn't run).

Indeed. This goes right along with what we were discussing in #79: v3c (Aeneas) compiles only the source code handed to it; there aren't mountains of source code for it to hunt through or enormous runtime binaries that need to be linked in. In fact, the runtime is part of the command-line invocation of v3c, so the entire compilation step from source to binary is contained in one invocation of the compiler.

Also, the Virgil compiler parses, typechecks, and runs initializers for all code, but it only compiles code reachable from main(). It doesn't go past ASTs for anything that is not reachable from main or needed to run initializers. The reachability phase feeds into polymorphic specialization, so generic code is not specialized for types it is never used with. Compilation is generally fast because most optimizations don't even iterate on the SSA; for now they are fairly lightweight local optimizations.
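The reachability idea can be illustrated with a generic worklist traversal over a call graph. This is a minimal sketch and not Aeneas's actual implementation: the call graph, the function names in it, and `reachable_from` are all hypothetical.

```python
from collections import deque


def reachable_from(call_graph, roots):
    """Return every function reachable from `roots` by following call edges."""
    seen = set(roots)
    worklist = deque(roots)
    while worklist:
        fn = worklist.popleft()
        for callee in call_graph.get(fn, ()):
            if callee not in seen:  # visit each function once
                seen.add(callee)
                worklist.append(callee)
    return seen


# Hypothetical program: every function is parsed and typechecked,
# but only the reachable ones would be handed to the back end.
call_graph = {
    "main": ["fib", "print_int"],
    "fib": ["fib"],
    "print_int": ["put_char"],
    "put_char": [],
    "unused_helper": ["put_char"],
    "format_float": [],
}

to_compile = reachable_from(call_graph, ["main"])
print(sorted(to_compile))  # ['fib', 'main', 'print_int', 'put_char']
```

Here `unused_helper` and `format_float` never reach code generation, which is one reason a small program can produce a small binary even when it imports a larger library.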

I'd be interested to see what results you get for x86-linux. Despite that backend being a bit older, it has a register allocator that does better in most situations (but much, much worse in others), so I'd expect the 32-bit code in this example to run even faster.

srackham commented 2 years ago

> it only compiles reachable code from main(). It doesn't go past ASTs for anything not reachable from main or needed to run initializers.

Very clever.

> I'd be interested to see what results you get for x86-linux.

Here you go (I've also included the Virgil JVM results):

The Hello World + Fibonacci app doesn't really exercise the language and is probably not representative of "real world" code; if you have a better benchmark, I could run it.

wasm

| Language | Compile time (secs) | Executable size (B) | Execution time (secs) |
|---|---|---|---|
| go | 4.16s | 428,547 | 0.61s |
| rust | 0.33s | 2,054,632 | 1.91s |
| virgil | 0.01s | 8,802 | 0.98s |

wasm-optimised

| Language | Compile time (secs) | Executable size (B) | Execution time (secs) |
|---|---|---|---|
| go | 3.80s | 191,265 | 0.62s |
| rust | 0.77s | 301,363 | 0.67s |
| virgil | 0.01s | 7,891 | 1.14s |

x86-64-linux

| Language | Compile time (secs) | Executable size (B) | Execution time (secs) |
|---|---|---|---|
| go | 1.91s | 503,640 | 0.39s |
| rust | 0.31s | 3,853,504 | 1.55s |
| virgil | 0.01s | 20,552 | 0.63s |

x86-64-linux-optimised

| Language | Compile time (secs) | Executable size (B) | Execution time (secs) |
|---|---|---|---|
| go | 1.94s | 140,056 | 0.41s |
| rust | 2.73s | 1,653,736 | 0.33s |
| virgil | 0.01s | 19,552 | 0.64s |

x86-linux

| Language | Compile time (secs) | Executable size (B) | Execution time (secs) |
|---|---|---|---|
| go | | | |
| rust | | | |
| virgil | 0.01s | 18,568 | 0.58s |

x86-linux-optimised

| Language | Compile time (secs) | Executable size (B) | Execution time (secs) |
|---|---|---|---|
| go | | | |
| rust | | | |
| virgil | 0.01s | 17,884 | 0.62s |

jvm

| Language | Compile time (secs) | Executable size (B) | Execution time (secs) |
|---|---|---|---|
| go | | | |
| rust | | | |
| virgil | 0.00s | 17,715 | 0.75s |

jvm-optimised

| Language | Compile time (secs) | Executable size (B) | Execution time (secs) |
|---|---|---|---|
| go | | | |
| rust | | | |
| virgil | 0.00s | 15,543 | 0.60s |

The raw data along with source code, compiler commands and platform information is attached:

virgil-results.txt rust-results.txt go-results.txt

diakopter commented 2 years ago

(I'm curious to see the maximum memory allocated during the executions too (even for the compilations))

On Tue, Aug 23, 2022 at 11:39 PM Stuart Rackham wrote:

Here you go (I've also included the Virgil JVM results):

  • x86 and x86-64 executables have roughly the same execution times but the x86 executable is ~10% smaller.
  • JVM execution times are also roughly the same as the x86 and x86-64 and, in terms of size, about ~10% smaller than x86 executables.

The raw data along with source code, compiler commands and platform information is attached:

virgil-results.txt https://github.com/titzer/virgil/files/9412833/virgil-results.txt rust-results.txt https://github.com/titzer/virgil/files/9412834/rust-results.txt go-results.txt https://github.com/titzer/virgil/files/9412835/go-results.txt


titzer commented 2 years ago

That reminds me, I've been meaning to make memory profiling work with the native GC but haven't gotten around to it yet.

I know that a typical bootstrap of Aeneas does not cause a single GC. As Aeneas has a 512MB heap running on x86-linux, that means it allocates less than 256MB of memory total for a self-compile.
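The step from a 512MB heap to a 256MB bound is not spelled out above. It is consistent with a semispace copying collector, where only half of the heap is available for allocation between collections. That collector design is an assumption here, not something stated in this thread, and the sketch below is just the arithmetic under that assumption.

```python
def max_alloc_without_gc_mb(heap_mb, spaces=2):
    """Upper bound on allocation before the first collection.

    Assumes the heap is split into `spaces` equal regions and that
    allocation proceeds in one region until it fills (assumption).
    """
    return heap_mb // spaces


HEAP_MB = 512  # heap size quoted for Aeneas on x86-linux
print(max_alloc_without_gc_mb(HEAP_MB), "MB")  # 256 MB
```

If no collection is triggered during a self-compile, total allocation must have stayed under this bound.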

srackham commented 2 years ago

@diakopter I added compilation and execution memory consumption columns to the results:

For native compilation targets Virgil wins in terms of minimising hardware requirements (executable memory and storage).

The bash commands that generated the results have been added to the attached raw results files.
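The exact commands are in the attached files and are not reproduced in this thread. As a rough illustration of how peak memory of a compile or run can be captured, here is a minimal sketch using Python's `resource` module on a Unix system; the helper name and the placeholder command are illustrative, and this is not necessarily how the numbers below were produced.

```python
import resource
import subprocess
import sys


def peak_child_rss(argv):
    """Run argv to completion and return the peak resident set size
    recorded for waited-for child processes.

    On Linux ru_maxrss is in kilobytes (macOS reports bytes). The value
    is a high-water mark across all children of this process so far, so
    use a fresh interpreter per measurement for clean numbers.
    """
    subprocess.run(argv, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


if __name__ == "__main__":
    # Placeholder command: pass the real compiler or program invocation.
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    print(f"peak RSS: {peak_child_rss(cmd)} (KB on Linux)")
```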

wasm

| Language | Compile time (secs) | Executable size (B) | Execution time (secs) | Compilation memory | Execution memory |
|---|---|---|---|---|---|
| go | 3.74s | 428,547 | 0.62s | 173740 KB | 11888 KB |
| rust | 0.33s | 2,054,632 | 1.86s | 117964 KB | 11540 KB |
| virgil | 0.01s | 8,802 | 1.00s | 5776 KB | 11060 KB |

wasm-optimised

| Language | Compile time (secs) | Executable size (B) | Execution time (secs) | Compilation memory | Execution memory |
|---|---|---|---|---|---|
| go | 3.91s | 191,265 | 0.62s | 181480 KB | 11596 KB |
| rust | 0.72s | 301,363 | 0.66s | 154196 KB | 10740 KB |
| virgil | 0.01s | 7,891 | 1.10s | 5744 KB | 11144 KB |

x86-64-linux

| Language | Compile time (secs) | Executable size (B) | Execution time (secs) | Compilation memory | Execution memory |
|---|---|---|---|---|---|
| go | 1.95s | 503,640 | 0.38s | 164452 KB | 296 KB |
| rust | 0.32s | 3,853,504 | 1.55s | 126424 KB | 2028 KB |
| virgil | 0.01s | 20,552 | 0.62s | 6456 KB | 292 KB |

x86-64-linux-optimised

| Language | Compile time (secs) | Executable size (B) | Execution time (secs) | Compilation memory | Execution memory |
|---|---|---|---|---|---|
| go | 1.89s | 140,056 | 0.38s | 177796 KB | 292 KB |
| rust | 2.74s | 1,653,736 | 0.32s | 222596 KB | 1844 KB |
| virgil | 0.01s | 19,552 | 0.67s | 6436 KB | 292 KB |

x86-linux

| Language | Compile time (secs) | Executable size (B) | Execution time (secs) | Compilation memory | Execution memory |
|---|---|---|---|---|---|
| go | | | | | |
| rust | | | | | |
| virgil | 0.01s | 18,568 | 0.63s | 6412 KB | 292 KB |

x86-linux-optimised

| Language | Compile time (secs) | Executable size (B) | Execution time (secs) | Compilation memory | Execution memory |
|---|---|---|---|---|---|
| go | | | | | |
| rust | | | | | |
| virgil | 0.01s | 17,884 | 0.64s | 6428 KB | 292 KB |

jvm

| Language | Compile time (secs) | Executable size (B) | Execution time (secs) | Compilation memory | Execution memory |
|---|---|---|---|---|---|
| go | | | | | |
| rust | | | | | |
| virgil | 0.00s | 17,715 | 0.68s | 3960 KB | 37516 KB |

jvm-optimised

| Language | Compile time (secs) | Executable size (B) | Execution time (secs) | Compilation memory | Execution memory |
|---|---|---|---|---|---|
| go | | | | | |
| rust | | | | | |
| virgil | 0.00s | 15,543 | 0.60s | 3940 KB | 37540 KB |

virgil-results.txt rust-results.txt go-results.txt

srackham commented 2 years ago

> I know that a typical bootstrap of Aeneas does not cause a single GC. As Aeneas has a 512MB heap running on x86-linux, that means it allocates less than 256MB of memory total for a self-compile.

Virgil sure is parsimonious.

titzer commented 2 years ago

> Virgil sure is parsimonious.

I might like garbage collection but I don't like garbage :-)

diakopter commented 2 years ago

(I've been intending to contribute a more sophisticated dual-nursery generational allocator/collector for collection-heavy programs such as the ones I use... now that it seems it can support a perfectly precise collector (registers included), maybe now's the right time..)
