jank-lang / jank

The native Clojure dialect hosted on LLVM
https://jank-lang.org
Mozilla Public License 2.0
1.69k stars 50 forks

Generate LLVM IR instead of C++ #100

Open jeaye opened 1 month ago

jeaye commented 1 month ago

jank currently generates C++, which it then gives to clang to JIT compile. There are some serious benefits to this:

  1. Super easy codegen
  2. Full access to jank's runtime API, including templates, overloading, etc
  3. The generated C++ also works well for AOT compilation
  4. The generated C++ is actually very readable

However, the drawbacks have been a big pain so far. Notably:

  1. Compiling C++ is hella slow
  2. Yup

So, clojure.core in jank is around 50% implemented. It's about 4k lines of jank and it compiles to about 80k lines of C++. This takes 12 seconds to JIT compile, which means 12 seconds absolutely minimum just to get any jank program running, assuming it has no code beyond clojure.core. So, not scalable.

I've looked into AOT compiling C++ modules and loading those up. The same 80k lines of C++ take 2 minutes to compile as a C++20 pre-compiled module. Once compiled, it loads in 0.3 seconds, which isn't bad. But even if clojure.core is AOT compiled, when you start a REPL for your program, chances are you're going to then JIT compile all of your own sources. Then we're back to waiting minutes for everything to start.

The slowdown here is simply C++ compilation, as we can see with 12 seconds going down to 0.3 seconds. So the solution is to stop generating C++ and generate LLVM IR instead. But that comes with its own costs.
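To make those figures easier to compare, here is a rough back-of-the-envelope calculation using only the numbers quoted above. The `estimated_jit_seconds` helper is hypothetical and assumes compile time scales linearly with line count, which is an assumption for illustration and not something measured here.

```python
# Figures quoted in the comment above (all approximate).
JANK_LINES = 4_000      # clojure.core source, in jank
CPP_LINES = 80_000      # generated C++
JIT_SECONDS = 12.0      # JIT compiling the generated C++
AOT_SECONDS = 120.0     # compiling it as a C++20 pre-compiled module
LOAD_SECONDS = 0.3      # loading the pre-compiled module

expansion = CPP_LINES / JANK_LINES         # C++ lines emitted per jank line
jit_rate = CPP_LINES / JIT_SECONDS         # C++ lines JIT compiled per second
load_speedup = JIT_SECONDS / LOAD_SECONDS  # loading vs. JIT compiling
aot_penalty = AOT_SECONDS / JIT_SECONDS    # AOT compiling vs. JIT compiling


def estimated_jit_seconds(jank_lines: int) -> float:
    """Extrapolate JIT time for a project of the given size.

    ASSUMPTION: time scales linearly with line count (illustrative only).
    """
    return jank_lines * expansion / jit_rate


print(f"expansion factor:   {expansion:.0f}x")
print(f"JIT throughput:     {jit_rate:.0f} C++ lines/s")
print(f"load vs. JIT:       {load_speedup:.0f}x faster")
print(f"AOT vs. JIT:        {aot_penalty:.0f}x slower")
print(f"20k lines of jank:  ~{estimated_jit_seconds(20_000):.0f} s to JIT")
```

Under that linear assumption, a project five times the size of the current clojure.core would already take about a minute to JIT compile on every start.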

emidln commented 1 month ago

New to jank, but optimizing big C++ builds was my day job for a long time. If jank generates deterministic C++ files, a DAG-based build system à la Bazel (there are many others) that checksums the input files to detect changes can drastically speed up incremental recompilation in a safe manner.
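A minimal sketch of the checksum idea, for illustration only. This is not how Bazel or jank works internally: the function names and the JSON manifest are hypothetical, and a real DAG-based build system would also track dependencies between files, compiler flags, and toolchain versions.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stale_sources(sources: list[Path], manifest_path: Path) -> list[Path]:
    """Return the sources whose contents differ from the recorded digests.

    A source with no recorded digest (e.g. on a fresh clone) counts as stale.
    """
    recorded = {}
    if manifest_path.exists():
        recorded = json.loads(manifest_path.read_text())
    return [src for src in sources if recorded.get(str(src)) != file_digest(src)]


def record_digests(sources: list[Path], manifest_path: Path) -> None:
    """Store the current digest of every source after a successful build."""
    digests = {str(src): file_digest(src) for src in sources}
    manifest_path.write_text(json.dumps(digests, indent=2))
```

Only the files returned by `stale_sources` would need recompiling. Because the check depends on the generated C++ being deterministic, identical jank input must always produce byte-identical output for the cache to hit.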

jeaye commented 1 month ago

> New to jank, but optimizing big C++ builds was my day job for a long time. If jank generates deterministic C++ files, a DAG-based build system à la Bazel (there are many others) that checksums the input files to detect changes can drastically speed up incremental recompilation in a safe manner.

Hey! You're right that a module doesn't need to be compiled again if it hasn't changed. Either timestamp checking or checksum checking can help there. However, we still run into the same problem. When you freshly clone someone's jank project from GitHub and do a lein jank run, if jank needs to pre-compile all of those modules, you'll be stuck waiting maybe 10 minutes before you can get into it.

As you and I know, in the C++ world, a 10 minute build is not unusual. But jank isn't just C++, it's also Clojure. A 10 minute build in Clojure is unheard of. Furthermore, does jank really need to take that long? If we generate LLVM IR, this should take a fraction of the time to compile and it'll perform the same way (with some caveats).

I'm still working on the LLVM IR codegen, but we'll see what the numbers look like when I'm done. My expectations are that they won't be close at all.