GraalVM Native Image support

remexre commented 3 years ago

Looks like as of GraalVM 21.0.0, native-image can build Silver and Silver programs with JVM fallback; I don't think this is actually useful yet, but it's progress so I figure I'll open a tracking issue. (Note that this issue is wholly unrelated to SilvIR-on-Truffle or anything, and is just "can we do initialization and JIT warmup at build time.")

native-image'd Silver (with JVM fallback) currently isn't usable; it exits with 0 when it should call out to Ant.

On a small program that does not use autocopy attributes (after patching silver:langutil:pp to remove its use of them), native-image with fallback cuts about 7% of execution time off (NB: didn't do proper stats, just eyeballed 5 runs after a 5-run warmup of each).

On the same program, --no-fallback fails due to Copper's use of java.io.ObjectInputStream; it appears a config file is needed to statically list the classes this will be used with. Providing https://p.remexre.xyz/Xj6WvLaiMAA= (as generated by the GraalVM native-image agent) via a bit of zipfile surgery, I get an amazing speedup from ~1.56sec to ~0.02sec!

Unfortunately, said config files don't work for Silver, which still dies early on when doing stuff with autocopy:

Exception in thread "main" common.exceptions.SilverInternalError: Error while applying autocopy decorators.
        at common.Decorator.decorateAutoCopy(Decorator.java:41)
        at silver.rewrite.DgivenStrategy.decorate(DgivenStrategy.java:10)
        at common.Decorator.applyDecorators(Decorator.java:22)
        at silver.core.Init.postInit(Init.java:63)
        at silver.langutil.reflect.Init.postInit(Init.java:36)
        at silver.compiler.modification.let_fix.java.Init.postInit(Init.java:54)
        at silver.compiler.composed.Default.Init.postInit(Init.java:158)
        at silver.compiler.composed.Default.Main.main(Main.java:11)
Caused by: java.lang.NoSuchFieldException: occurs_inh
        at java.lang.Class.getField(DynamicHub.java:1078)
        at common.Decorator.decorateAutoCopy(Decorator.java:36)
        ... 7 more

I'll try replacing every autocopy attribute in Silver with inherited+propagate after today's meeting, I guess?
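
For context on the NoSuchFieldException above, here is a minimal Java sketch of the kind of by-name field lookup that Class.getField performs. Only the field name occurs_inh comes from the trace; the class NExample and the helper lookupOccursInh are hypothetical, and this is not the actual common.Decorator code. On a regular JVM the lookup succeeds. In a native image built with --no-fallback, a field that was not registered for reflection is presumably reported as missing, which would match the trace.

```java
import java.lang.reflect.Field;

final class AutocopyLookupSketch {
    // Hypothetical stand-in for a generated nonterminal class; only the
    // field name occurs_inh is taken from the stack trace.
    public static final class NExample {
        public static final boolean[] occurs_inh = new boolean[4];
    }

    // Looks up the public static field occurs_inh by name. Reflection
    // failures are wrapped in an unchecked exception to keep callers simple.
    static boolean[] lookupOccursInh(Class<?> nonterminal) {
        try {
            Field f = nonterminal.getField("occurs_inh");
            return (boolean[]) f.get(null); // static field, so no receiver object
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective lookup failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lookupOccursInh(NExample.class).length);
    }
}
```

Because the field name is only a string at the call site, a closed-world static analysis cannot in general tell which fields must be kept, which is why the agent-generated config (or a generated one) matters here.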

krame505 commented 3 years ago

Cool!

Would finishing the Copper API work help at all with the ant and copper issues?

Not sure that I understand the issue with autocopy, though. Is this a problem with using reflection in general? If so then the use of Silver's reflection library for interface file handling would also pose a problem. I'm not sure that we want to jump straight to deprecating autocopy everywhere just yet.

remexre commented 3 years ago

Oh, wow; just tried this on ableC; (after using the native-image agent) it "just works" with no code changes! (this is an unextended compiler)

$ time java -Xss6M -jar ableC.jar testing/tests/melt/positive/1.c
java -Xss6M -jar ableC.jar testing/tests/melt/positive/1.c  6.59s user 0.50s system 187% cpu 3.770 total

$ time ./ableC testing/tests/melt/positive/1.c
./ableC testing/tests/melt/positive/1.c  0.17s user 0.10s system 99% cpu 0.271 total

ericvanwyk commented 3 years ago

Wow is right... This is very impressive.


krame505 commented 3 years ago

And this is without removing the autocopy attributes in ableC?

I would be interested to see the performance numbers for an extension that uses object-language syntax (requiring reflection) - maybe ableC-rewriting or ableC-prolog. How hard is it to try this out right now?

remexre commented 3 years ago

> And this is without removing the autocopy attributes in ableC?

Yep; looks like the agent is good enough to resolve all that reflection.

> I would be interested to see the performance numbers for an extension that uses object-language syntax (requiring reflection) - maybe ableC-rewriting or ableC-prolog. How hard is it to try this out right now?

Right now I have this all strung together with my nascent + not-yet-separated-from-my-projects Nix setup for Silver, but if you don't have Nix, the steps are roughly:

1. install GraalVM CE; I would do it from the website if your system package manager has an old version, progress on this has improved a lot since 20.2.0
2. build ableC.jar as normal (even DLing CI-built jars should work, Graal adds no magic to the javac part, afaik)
3. do these things: https://git.sr.ht/~remexre/whgvah/tree/f22e158d46e1fad9e18dd78ff8a34b98b8d8ee18/item/default.nix#L27
   * I suspect the input test file needs to use every concrete type that ends up being present at runtime (no clue how it'd work otherwise); if you wanna run the whole test suite, change config-output-dir to config-merge-dir
   * you need to use zip, not jar, because jar will blow away other stuff in META-INF
   * -H:Name specifies the output filename
   * expect native-image to use gigs of memory and take minutes; 8G and 5min for the unextended ableC

If this sounds like a pain, I can do it; lmk what JAR I should use

Because of the last bullet, I think the perf gains here probably change the goal of silvir, not invalidate it -- it's still worth doing, because we should be able to get comparable perf without 8G+5min of figuring out info the compiler had anyway.

I think it does mean that someone should light a fire under my chair wrt the de-antifying (#400); that's the only blocker I know of to doing this to Silver, which I expect would improve the developing-things-other-than-Silver-in-Silver experience a lot.

remexre commented 3 years ago

ableC-prolog didn't work with a few test cases; a proper solution there might be a late-in-compile phase to create a reflect-config.json file (example). I think this would have to be a whole-program analysis, but at least would be a cheap one, and would only be done for native-image/SubstrateVM builds anyway.
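
A rough sketch of what such a generation step could emit, assuming the set of reflectively accessed classes is already known from the analysis. The class name ReflectConfigSketch, the helper names, and the example class name in main are hypothetical; "name" and "allPublicFields" are keys of native-image's reflect-config.json format. Whether exposing all public fields is the right granularity for Silver's generated code is an open assumption.

```java
import java.util.List;
import java.util.stream.Collectors;

final class ReflectConfigSketch {
    // Renders one reflect-config.json entry exposing a class's public fields.
    // Assumes the class name needs no JSON escaping (no quotes or backslashes).
    static String entry(String className) {
        return "  { \"name\": \"" + className + "\", \"allPublicFields\": true }";
    }

    // Renders the whole file as a JSON array, one entry per class.
    static String render(List<String> classNames) {
        return classNames.stream()
                .map(ReflectConfigSketch::entry)
                .collect(Collectors.joining(",\n", "[\n", "\n]\n"));
    }

    public static void main(String[] args) {
        // Hypothetical class name, for illustration only.
        System.out.print(render(List.of("example.NExpr")));
    }
}
```

The resulting text would be written to reflect-config.json and packaged under META-INF in the same way as the agent-generated config described earlier in the thread.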

remexre commented 3 years ago

Okay, looks like the problem last time I tried this on Silver was that I screwed up the scripts; I've fixed them now. The results below are from silo, which has approximately the same hardware as foundry.

Building the native binary takes ~7m50s.

The instrumented Silver build takes ~3m03s. This step could be removed once Silver generates the native-image configuration itself (or a script running on the Java output generates it).

A native self-compile takes ~3m38s.

Timestamped build log

A JVM self-compile takes ~2m26s.

Timestamped build log

I need to go digging for why the performance is unexpectedly worse... perf didn't work on the binary, even with the -H:+PreserveFramePointer -H:-DeleteLocalSymbols options.

If someone else wants to try too, the patch I'm using is here.

remexre commented 3 years ago

The plot thickens; I rebuilt it with --initialize-at-build-time and got

00:00:00 Congrats, you're using Silver Native!
00:00:00 Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 34 out of bounds for length 34
00:00:00    at silver.compiler.analysis.warnings.Init.setupInheritedAttributes(Init.java:47)
00:00:00    at silver.compiler.analysis.warnings.Init.init(Init.java:23)
00:00:00    at silver.compiler.analysis.warnings.flow.Init.init(Init.java:57)
00:00:00    at silver.compiler.definition.flow.syntax.Init.init(Init.java:39)
00:00:00    at silver.compiler.definition.type.syntax.Init.init(Init.java:34)
00:00:00    at silver.compiler.modification.autocopyattr.Init.init(Init.java:34)
00:00:00    at silver.compiler.definition.flow.driver.Init.init(Init.java:45)
00:00:00    at silver.compiler.driver.util.Init.init(Init.java:41)
00:00:00    at silver.compiler.definition.env.Init.init(Init.java:31)
00:00:00    at silver.compiler.analysis.typechecking.core.Init.init(Init.java:33)
00:00:00    at silver.compiler.definition.flow.env.Init.init(Init.java:38)
00:00:00    at silver.compiler.modification.let_fix.java.Init.init(Init.java:34)
00:00:00    at silver.compiler.composed.Default.Init.init(Init.java:85)
00:00:00    at silver.compiler.composed.Default.Main.main(Main.java:10)

So maybe we're triggering some bugs somewhere...

remexre commented 3 years ago

Increased thickening continues; Chris Seaton pointed out on the GraalVM Slack that SubstrateVM has a worse GC than the standard JVM. Passing -H:InitialCollectionPolicy='com.oracle.svm.core.genscavenge.CollectionPolicy$NeverCollect' to disable garbage collection sped up the native build to be on par with the standard JVM (timestamped build log).

remexre commented 3 years ago

Oh, wow, it's definitely the reflection; term_rewriting.jar from https://github.com/melt-umn/lambda-calculus takes ~8s to run on e4.lambda (in the same repo) on the JVM; it takes ~20s to run after native-image compilation. Prodletons, here we come?

remexre commented 3 years ago

Okay, well, maybe something spookier is going on; the above 20s figure was with garbage collection off, since SubstrateVM's GC is supposed to be slower than the standard JVM's; using the default GC settings lowered runtime to ~12s; still slower, but not so absurdly so.

remexre commented 3 years ago

Spooks confirmed... I again see much worse performance with GC off; weirdly, time is spent in sys? From an strace of each, the mmaps are just more expensive? Need more analysis. gtg rn, results and scripts here

remexre commented 3 years ago

Latest batch of logs, from same scripts on top of #539. Highlights:

peak memory use with:

jvm-gc-noop: 16 GiB
jvm-gc-clean: 16 GiB
native-gc-noop: 860 MiB
native-gc-clean: 7.6 GiB
native-epsilon-noop: 9.6 GiB
native-epsilon-clean: 51 GiB

total time:

jvm-gc-noop: 21s
jvm-gc-clean: 4m10s
native-gc-noop: 38s
native-gc-clean: 4m31s
native-epsilon-noop: 24s
native-epsilon-clean: 2m34s

Will try to investigate some more tomorrow, but I suspect this means we're GC-limited? These results are "less clean" than the previous ones; they're on foundry while Jenkins was running, so they probably got some interference; would've tested on silo, but it's got unrelated Silver changes on its local checkout... will try reproing them tomorrow, since e.g. jvm-gc-clean took like 15% longer than it did last time.
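
One way to sanity-check the "GC-limited" suspicion on the JVM side is to ask the runtime how much time its collectors report. The sketch below uses the standard java.lang.management beans; the class GcShareSketch and its helpers are hypothetical, it assumes HotSpot, and what a native image reports through these beans has not been checked here. For concurrent collectors the reported time is not the same as pause time, so treat the share as a rough indicator only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcShareSketch {
    // Sums the cumulative collection time (ms) reported by all collectors.
    // getCollectionTime() returns -1 when a collector does not report it.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0L, gc.getCollectionTime());
        }
        return total;
    }

    // Fraction of the given uptime spent in reported collections.
    static double gcShare(long gcMillis, long uptimeMillis) {
        return uptimeMillis <= 0 ? 0.0 : (double) gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long gc = totalGcMillis();
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.println("GC ms: " + gc + ", uptime ms: " + uptime
                + ", share: " + gcShare(gc, uptime));
    }
}
```

To be useful, the measurement would have to run at the end of the compiler's own main (or in a shutdown hook) rather than in a separate process.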

remexre commented 3 years ago

Oh, just realized the gains are being "muffled" by the javac run time; the times up until printing Buildfile: /home/nathan/melt/silver/build.xml are:

jvm-gc-noop: 12s
jvm-gc-clean: 2m14s
native-gc-noop: 29s
native-gc-clean: 2m57s
native-epsilon-noop: 18s
native-epsilon-clean: 1m27s

ugh, if the epsilon memory use weren't so terrible... maybe if silvir does its own native codegen, we could try out something clever with reference counting on the page level, and having a second thread concurrently collect dead pages; that ought to provide epsilon-like performance while lowering peak memory use dramatically? dunno, would need experiments.

At this point, next task is probably running some profiler that supports both the Hotspot VM and SubstrateVM, and comparing the traces to see what's more expensive on native-epsilon-noop vs jvm-gc-noop; most Silver builds are closer to noop (i.e. all svi files) than clean, so a regression here is pretty tragic...
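
To put the pre-javac timings above in relative terms, here is a small worked calculation. The numbers are the ones listed in this comment; the class and method names are hypothetical. A ratio above 1 means the native configuration was slower than the JVM baseline for the same kind of build.

```java
import java.util.Locale;

final class TimingRatioSketch {
    // Ratio > 1: native configuration slower than the JVM baseline.
    static double ratio(int nativeSeconds, int jvmSeconds) {
        return (double) nativeSeconds / jvmSeconds;
    }

    public static void main(String[] args) {
        int jvmNoop = 12, jvmClean = 2 * 60 + 14;               // 12s, 2m14s
        int nativeGcNoop = 29, nativeGcClean = 2 * 60 + 57;     // 29s, 2m57s
        int epsilonNoop = 18, epsilonClean = 60 + 27;           // 18s, 1m27s

        System.out.printf(Locale.ROOT, "native-gc-noop / jvm-gc-noop:        %.2f%n",
                ratio(nativeGcNoop, jvmNoop));
        System.out.printf(Locale.ROOT, "native-epsilon-noop / jvm-gc-noop:   %.2f%n",
                ratio(epsilonNoop, jvmNoop));
        System.out.printf(Locale.ROOT, "native-gc-clean / jvm-gc-clean:      %.2f%n",
                ratio(nativeGcClean, jvmClean));
        System.out.printf(Locale.ROOT, "native-epsilon-clean / jvm-gc-clean: %.2f%n",
                ratio(epsilonClean, jvmClean));
    }
}
```

On these numbers the noop builds are roughly 2.4x (default GC) and 1.5x (epsilon) slower than the JVM, while the clean build is roughly 1.3x slower with the default GC and takes about 0.65x the JVM time with epsilon.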

melt-umn / silver

GraalVM Native Image support #512