oracle / graal

GraalVM compiles Java applications into native executables that start instantly, scale fast, and use fewer compute resources 🚀
https://www.graalvm.org

IGV Fails Loading Graphs/Corrupted Graph Data #4013

Open smarr opened 2 years ago

smarr commented 2 years ago

IGV fails loading graphs generated by Graal, again.

Loading fails with the following error (two examples):

Error loading null / TruffleIR::Benchmark>>#innerBenchmarkLoop:() / Node 4,909 / Property nodeSourcePosition:
    Corrupted graph data: null, loading terminated. Previous object: org.graalvm.visualizer.data.serialization.BinaryReader$$Lambda$409/0x0000000801505c88@2a0f5e44

Error loading null / TruffleIR::Benchmark>>#innerBenchmarkLoop:() / Node 7,985 / Property nodeSourcePosition:
    Corrupted graph data: null, loading terminated. Previous object: org.graalvm.visualizer.data.serialization.BinaryReader$$Lambda$409/0x0000000801505c88@58d66fa7

This is the same symptom as back in #3385. /cc @MartinBalin @Ondrej-Douda @tkrodriguez I haven't investigated further yet, though a broken graph can be found here: https://stefan-marr.de/downloads/truffle/Queens-inner.bgv.bz2

smarr commented 2 years ago

Hmm, so, this seems to only be a problem when using libgraal. The graph is fine when not using libgraal.

tkrodriguez commented 2 years ago

Do you have a command line for reproducing this problem? There's some extra sanity checking to detect the problem in #3385. The check is at https://github.com/oracle/graal/blob/master/compiler/src/org.graalvm.graphio/src/org/graalvm/graphio/GraphProtocol.java#L941, though it's assert-only. You might enable it unconditionally and rebuild libgraal. I've felt a bit like there's a lurking IGV libgraal dumping problem, but I haven't been able to reproduce anything I could investigate.
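
For readers unfamiliar with why an assert-only check can go unnoticed, here is a minimal sketch of the difference between an assert and an always-on check. It is not the GraphProtocol code: the class AlwaysOnCheck and the methods consistent, assertOnly, and alwaysOn are hypothetical names chosen for illustration, and consistent is only a stand-in for a check like checkToString.

```java
// Hypothetical sketch; none of these names come from the Graal sources.
final class AlwaysOnCheck {
    /** Stand-in for a consistency check: does the cached text match toString()? */
    static boolean consistent(Object key, String cached) {
        return key.toString().equals(cached);
    }

    /** Assert-only variant: skipped entirely unless assertions are enabled. */
    static void assertOnly(Object key, String cached) {
        assert consistent(key, cached) : "toString mismatch for " + key;
    }

    /** Always-on variant: throws no matter how the program was started. */
    static void alwaysOn(Object key, String cached) {
        if (!consistent(key, cached)) {
            throw new IllegalStateException("toString mismatch: " + key + " != " + cached);
        }
    }

    public static void main(String[] args) {
        Object key = Integer.valueOf(42);

        // Whether this fires depends on whether assertions are enabled.
        boolean assertFired = false;
        try {
            assertOnly(key, "43");
        } catch (AssertionError e) {
            assertFired = true;
        }
        System.out.println("assert fired: " + assertFired);

        // This fires in every configuration.
        try {
            alwaysOn(key, "43");
        } catch (IllegalStateException e) {
            System.out.println("always-on check fired: " + e.getMessage());
        }
    }
}
```

The always-on form trades a small cost on every call for a failure that is visible in any build, which is useful while hunting a bug that only shows up in one configuration.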

smarr commented 2 years ago

This is on my TruffleSOM.

Something like the following should be able to reproduce:

git clone https://github.com/SOM-st/TruffleSOM.git
cd TruffleSOM
ant compile
./som -vv -i -cp Smalltalk:Examples/Benchmarks/LanguageFeatures:Examples/Benchmarks/TestSuite Examples/Benchmarks/BenchmarkHarness.som Queens 55 0 1000

-vv prints out the plain command line. Adding -LG would disable the use of libgraal.

With the following patch, I get the message below. So, indeed, the assertion is failing.

diff --git a/compiler/src/org.graalvm.graphio/src/org/graalvm/graphio/GraphProtocol.java b/compiler/src/org.graalvm.graphio/src/org/graalvm/graphio/GraphProtocol.java
index c77ff3a64e0..b311b2f2470 100644
--- a/compiler/src/org.graalvm.graphio/src/org/graalvm/graphio/GraphProtocol.java
+++ b/compiler/src/org.graalvm.graphio/src/org/graalvm/graphio/GraphProtocol.java
@@ -938,7 +938,7 @@ abstract class GraphProtocol<Graph, Node, NodeClass, Edges, Block, ResolvedJavaM
             if (value instanceof String) {
                 Character id = (Character) map.get(value);
                 if (id != null && keys[id].equals(value)) {
-                    assert checkToString(key, value);
+                    if (!checkToString(key, value)) { throw new RuntimeException("Failed checkToString(key, value) for " + key.toString() + ":" + value.toString()); }
                     return id;
                 }
                 value = null;

Message:

GraphProtocol: toString mismatch for class org.graalvm.compiler.truffle.compiler.hotspot.libgraal.HSTruffleCallNode: HSTruffleCallNode[0x7f6bfc000d18] != HSTruffleCallNode[0x7f6bfc02d738]

I'd hope that the old examples would reproduce it too, though I didn't have a chance to try.

tkrodriguez commented 2 years ago

Ok, it appears this code just has to be robust in the face of inconsistent equality and toString implementations. In this case, HSTruffleNode.equals checks that two objects point at the same Java object through a native handle, but toString prints out the address of the handle, which won't be the same for two different handles. I think there might be multiple problems. Anyway, I'll create an issue.
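
The mismatch described above can be reproduced in isolation. The following is a minimal sketch, not the libgraal or GraphProtocol code: Handle is a hypothetical wrapper whose equals compares the referenced object while toString prints the handle address, and HandlePoolSketch is a hypothetical pool that reuses one id per equal key. The two addresses are taken from the error message earlier in the thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names and structure are assumptions for illustration.
final class HandlePoolSketch {
    /** Equality follows the target object; toString follows the handle address. */
    static final class Handle {
        final long address;  // differs for every handle
        final Object target; // the object the handle points at

        Handle(long address, Object target) {
            this.address = address;
            this.target = target;
        }

        @Override public boolean equals(Object o) {
            return o instanceof Handle && ((Handle) o).target == target;
        }

        @Override public int hashCode() {
            return System.identityHashCode(target);
        }

        @Override public String toString() {
            return "Handle[0x" + Long.toHexString(address) + "]";
        }
    }

    private final Map<Object, Integer> ids = new HashMap<>();
    private final Map<Integer, String> written = new HashMap<>();

    /**
     * Interns the key and returns true if the text already stored under its id
     * matches key.toString(); false means a reader would see stale text.
     */
    boolean intern(Object key) {
        Integer id = ids.get(key);
        if (id == null) {
            id = ids.size();
            ids.put(key, id);
            written.put(id, key.toString());
            return true;
        }
        return written.get(id).equals(key.toString());
    }

    public static void main(String[] args) {
        Object target = new Object();
        HandlePoolSketch pool = new HandlePoolSketch();
        System.out.println(pool.intern(new Handle(0x7f6bfc000d18L, target)));
        // Equal to the first handle, but prints a different address.
        System.out.println(pool.intern(new Handle(0x7f6bfc02d738L, target)));
    }
}
```

A pool that wants to be robust here could key on the string it actually writes rather than on the object, so that equal keys with different text get separate entries. That is one possible approach, not necessarily the fix adopted in Graal.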

tkrodriguez commented 2 years ago

The graph you attached to the report is failing because the file itself appears to be truncated. This is most likely caused by the JVM shutting down in the middle of dumping. I don't think there's really anything we can do about that. The compiler threads are daemon threads, so they can't stop the JVM from shutting down when they are in the middle of dumping. However, several of the other graphs produced from your command line definitely had the bad toString problem, and I have some local changes that fix this problem.

smarr commented 2 years ago

Oh, I see. Hm, well, perhaps it was user error all along :flushed:

tkrodriguez commented 2 years ago

Well, it's not user error since there's nothing the user can do about it. We should do a better job of reporting a truncated file vs a corrupted one. Several of the graphs produced from running queens are in fact corrupted, so there's definitely a real bug here. For future reference, https://github.com/Shopify/seafoam can be a useful way of seeing what's wrong with a BGV file using the debug command line option. If the parsing fails, you get a Ruby backtrace showing where it went wrong. The truncated file problem always shows up as undefined method 'unpack1' for nil:NilClass before byte 0 because it attempts to read past the end of the file and gets back nil.
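
To illustrate the distinction between a truncated and a corrupted file, here is a minimal sketch of a reader that reports the two cases separately. It does not parse BGV: the record layout (a one-byte tag, a four-byte length, then a payload, with tag 0 marking the end) is a made-up format, and TruncatedVsCorrupted, Result, and check are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch over a made-up record format, not the BGV format.
final class TruncatedVsCorrupted {
    enum Result { OK, TRUNCATED, CORRUPTED }

    static final int TAG_END = 0x00;    // assumed end-of-stream marker
    static final int TAG_RECORD = 0x01; // assumed record marker

    static Result check(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (true) {
                int tag = in.readUnsignedByte(); // EOFException if the data just stops
                if (tag == TAG_END) {
                    return Result.OK;
                }
                if (tag != TAG_RECORD) {
                    return Result.CORRUPTED; // bytes are present but make no sense
                }
                int length = in.readInt();
                if (length < 0) {
                    return Result.CORRUPTED;
                }
                if (in.skipBytes(length) < length) {
                    return Result.TRUNCATED; // payload ends before its declared length
                }
            }
        } catch (EOFException e) {
            return Result.TRUNCATED; // ran out of bytes in the middle of a field
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] complete = {0x01, 0, 0, 0, 2, 'a', 'b', 0x00};
        byte[] cutShort = {0x01, 0, 0, 0, 2, 'a'};
        byte[] garbage = {0x7f};
        System.out.println(check(complete));
        System.out.println(check(cutShort));
        System.out.println(check(garbage));
    }
}
```

The point of the separation is that running out of bytes and reading nonsensical bytes are different failures, so a loader can tell the user that the dump was cut short rather than reporting generic corruption.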