tshort / StaticCompiler.jl

Compiles Julia code to a standalone library (experimental)

Getting the GC to work #81

Open gbaraldi opened 2 years ago

gbaraldi commented 2 years ago

Currently we can't do GC allocations outside of compile, which requires a running Julia session. Ideally we should be able to link a binary against libjulia, or some version of it, instead of bundling the whole runtime as PackageCompiler does. As a hack, @brenhinkeller and I managed to link against it and call jl_init. Calling some functions from the runtime then fails with stack traces deep inside libjulia, which is promising.

Together with https://github.com/JuliaGPU/GPUCompiler.jl/pull/348, the code actually gets into the GC functions and finally segfaults somewhere inside them.

Some of these segfaults happen because some global variables, such as types, are initialized as null, and their actual values are only substituted in when the sysimg is processed. In GPUCompiler that substitution doesn't happen, so we get IR with a null where there shouldn't be one.

  %2 = load atomic {}* ({}*, i64)*, {}* ({}*, i64)** bitcast (void ()** @jlplt_ijl_alloc_array_1d_11819_got to {}* ({}*, i64)**) unordered, align 8
  %3 = call nonnull {}* %2({}* null, i64 5)

That leads to a segfault. We need to find a way to turn that null into a pointer to the correct jl_value_t, which should be Array{Int64,1}.

The IR before removing the globals looks like:

@"+Core.Array5" = internal global {}* null, !julia.constgv !0

...

  %2 = load {}*, {}** @"+Core.Array5", align 8, !dbg !52, !tbaa !41, !nonnull !0, !dereferenceable !45, !align !47
  %3 = load atomic {}* ({}*, i64)*, {}* ({}*, i64)** bitcast (void ()** @jlplt_ijl_alloc_array_1d_1764_got to {}* ({}*, i64)**) unordered, align 8, !dbg !52
  %4 = call nonnull {}* %3({}* %2, i64 5), !dbg !52
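
One way to picture the missing substitution step is the self-contained C sketch below. It is only an analogy under stated assumptions, not what Julia's sysimg loader or GPUCompiler actually does: fake_type_t, core_array_slot, relocs, lookup_type, patch_slots and fake_alloc_array_1d are all hypothetical names made up for illustration. A slot starts out null, like @"+Core.Array5", and a load-time pass writes the real pointer into it before any allocation call reads it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a runtime type object such as Array{Int64,1}. */
typedef struct { const char *name; } fake_type_t;

static fake_type_t array_int64_1d = { "Array{Int64,1}" };

/* Slot emitted by the compiler: starts as null, like @"+Core.Array5". */
static fake_type_t *core_array_slot = NULL;

/* One relocation entry: which named object belongs in which slot. */
typedef struct { const char *symbol; fake_type_t **slot; } reloc_t;

static reloc_t relocs[] = { { "Array{Int64,1}", &core_array_slot } };

/* Hypothetical lookup of a type object by name; NULL if unknown. */
static fake_type_t *lookup_type(const char *symbol) {
    if (strcmp(symbol, array_int64_1d.name) == 0) return &array_int64_1d;
    return NULL;
}

/* Load-time pass: write the resolved pointer into every slot.
   Returns the number of slots that could not be resolved. */
static int patch_slots(void) {
    int unresolved = 0;
    for (size_t i = 0; i < sizeof relocs / sizeof relocs[0]; i++) {
        assert(relocs[i].slot != NULL); /* every table entry must name a slot */
        *relocs[i].slot = lookup_type(relocs[i].symbol);
        if (*relocs[i].slot == NULL) unresolved++;
    }
    return unresolved;
}

/* Hypothetical allocation stub: reports -1 for a null type instead of
   crashing, otherwise pretends to succeed and returns the element count. */
static long fake_alloc_array_1d(const fake_type_t *type, long n) {
    if (type == NULL) return -1;
    return n;
}
```

Calling fake_alloc_array_1d(core_array_slot, 5) before patch_slots() hits the null case, which corresponds to the call with a null first argument in the IR above. After patch_slots() the same call sees the Array{Int64,1} stand-in.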

brenhinkeller commented 2 years ago

Not sure how to do this, but seems like progress!

snisher commented 1 year ago

Any progress here? Compiling an executable from arbitrary Julia code that uses libjulia / gc would be a game changer for portability. I love the idea of using Julia for compiled use cases; I think it opens so many doors for the language.

PallHaraldsson commented 9 months ago

This issue is about getting "the GC" to work, but I realized there's also LLVM GC. There's also the standalone MMTk project, and there is work to get it into Julia as an alternative.

https://www.llvm.org/docs/GarbageCollection.html

To quote @jpsamaroo

Julia has a JIT, multiple/dynamic dispatch, non-moving GC

I think he's right: we have, and need, only a non-moving GC (because of ccall), but it doesn't need to be Julia's implementation. I think it might be easy, even trivial(?), to use another one, e.g. from LLVM. I'm not pushing for it, and I understand if you do not want a third option. I'm just thinking that while the Julia implementation is divorceable from the runtime, it may not be easily so. And Julia doesn't simply use Libc.malloc, i.e. malloc from libc; rather, all (regular) allocations now go through Julia.

That's something I would want to change too, so that some other better allocators can be used as drop-in replacements. I think that might be already possible with your project.

Since GC (potentially) runs when an allocation is made (and never otherwise, except in specialized real-time implementations; threading may also be an issue, but you don't support it yet anyway), I think it's trivial to add GC: call something that runs a collection if needed and then calls libc. Ideally it would just call malloc in libc (dynamically), i.e. not try to implement its own pool; then an alternative malloc would, I believe, still work with the GC.
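
As a minimal sketch of that idea in C (all names here are hypothetical, and collect_garbage is a placeholder that only resets a counter rather than marking or sweeping anything): the allocation entry point checks a byte budget, runs a collection if the budget would be exceeded, and otherwise just defers to libc malloc, so a drop-in malloc replacement would still be picked up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping: bytes handed out since the last collection,
   and the budget that triggers the next one. */
static size_t bytes_since_gc = 0;
static size_t gc_threshold = 1024;
static unsigned gc_runs = 0;

/* Placeholder collector: a real one would mark and sweep here. */
static void collect_garbage(void) {
    gc_runs++;
    bytes_since_gc = 0;
}

/* Allocation entry point: collect first if this request would exceed the
   budget, then defer to libc malloc (no private pool). */
static void *gc_alloc(size_t size) {
    /* Written this way to avoid overflow in bytes_since_gc + size. */
    if (bytes_since_gc >= gc_threshold ||
        size > gc_threshold - bytes_since_gc) {
        collect_garbage();
    }
    void *p = malloc(size);
    if (p != NULL) bytes_since_gc += size;
    return p;
}
```

With the 1024-byte budget assumed above, two 512-byte requests fit without a collection, and the next request of any size triggers one. A collector used this way still has to find live objects somehow, which is the part this sketch leaves out.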

PallHaraldsson commented 9 months ago

GC might work already for your project without any changes to it:

Conservative garbage collection often does not require any special support from either the language or the compiler: it can handle non-type-safe programming languages (such as C/C++) and does not require any special information from the compiler. The Boehm collector is an example of a state-of-the-art conservative collector.

I.e. it just needs to be used, and as I explained, I think it just needs to, and can, take over malloc (and make free a no-op): https://hboehm.info/gc/

Otherwise from the page:

LLVM’s intermediate representation provides garbage collection intrinsics that offer support for a broad class of collector models. For instance, the intrinsics permit:

  • semi-space collectors
  • mark-sweep collectors
  • generational collectors
  • incremental collectors
  • concurrent collectors
  • cooperative collectors
  • reference counting

We hope that the support built into the LLVM IR is sufficient to support a broad class of garbage collected languages including Scheme, ML, Java, C#, Perl, Python, Lua, Ruby, other scripting languages, and more.

Note that LLVM does not itself provide a garbage collector — this should be part of your language’s runtime library

I got too excited and carried away thinking LLVM provides an implementation. What they have seems, or seemed, redundant, and I think it is if you do not have threads and only need Boehm. In some other situations you probably need what it provides. E.g. reference counting, which we wouldn't want to use, can't be transparent I think, nor is it always faster. We would want "generational" and/or incremental, possibly concurrent (later?), and I'm not sure what they have in mind with "cooperative collectors". There's also MemBalance, quite exciting, from a year ago: already implemented in Julia, but with a bug, so it will be reverted for 1.10. It's unclear whether the bug will be fixed in time. The woman behind it offered to help.

vchuravy commented 9 months ago

Julia's codegen is tightly coupled to Julia's precise GC implementation. I see no reason why you wouldn't just use the Julia GC...

You could implement your own final-lower pass, but YMMV