oracle / graal


[Truffle] Source / SourceSection based on existing external AST #4360

Open enikao opened 2 years ago

enikao commented 2 years ago

Currently, we can create a Source based on characters or bytes. In my use case, I already have the AST data structure and want to pass it to my interpreter. (My use case is interpreters inside JetBrains MPS.)

My current workaround passes a string-encoded id of my external AST root node to Source.newBuilder(). Inside TruffleLanguage.parse(), I ask some global static object to get the external AST root node by the passed id. Then I can trivially traverse the external AST and convert it to Truffle nodes. This saves all the effort of serializing + parsing the external AST.
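
To illustrate the workaround described above, here is a minimal plain-Java sketch of such a global lookup. The class name ExternalAstRegistry and its methods are hypothetical and not part of Truffle or MPS; the sketch assumes the id is what gets passed as the Source text and that each id is resolved once.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical process-wide registry; not part of Truffle or MPS. */
final class ExternalAstRegistry {
    private static final Map<String, Object> ROOTS = new ConcurrentHashMap<>();

    private ExternalAstRegistry() {
    }

    /** Stores the external AST root and returns the id to use as Source text. */
    static String register(Object astRoot) {
        String id = UUID.randomUUID().toString();
        ROOTS.put(id, astRoot);
        return id;
    }

    /** Looks up and removes the root for the given id (surrounding whitespace is ignored). */
    static Object resolve(String id) {
        Object root = ROOTS.remove(id.trim());
        if (root == null) {
            throw new IllegalArgumentException("Unknown external AST id: " + id);
        }
        return root;
    }
}
```

The host side would call register(...) before building the Source, and the parse step would call resolve(...) with the Source characters. Removing the entry on lookup keeps the registry from holding on to old AST roots.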

As the external AST is created in a projectional editor, there is never any plain text involved. Thus, I cannot create a SourceSection for any of the external nodes based on line and column.

I can imagine some implementations:

  1. Truffle provides a Source and SourceSection based on Object. I don't see a good way to support more specific input types. In my use case, I cannot add an interface to the existing AST. If we required an external interface IExternalNode, I could translate the external AST to IExternalNode, pass it to Truffle, and convert it to Truffle nodes. Still better than serializing + parsing, but not optimal.

  2. Provide Source based on externally created Truffle Nodes. To my understanding, I have to create the Truffle Nodes inside the TruffleLanguage.parse() method, as I have to access language-internal information (e.g. passing the language instance to RootNode). If we could remove this dependency, we could pass the readily-created Truffle Nodes to Source. SourceSection would need at least a String-based variant to store the serialized external source node id.

  3. Enable interpreter developers to subclass Source and SourceSection. Change visibility of Source and SourceSection constructors to protected, so interpreter developers can implement them on their own.
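
For option 1, the translation to an IExternalNode could be a thin adapter rather than a copy of the tree. The following is only a sketch: the shape of IExternalNode (original() and children()) and the ExternalNodeAdapter class are assumptions made for illustration, since no such interface exists in Truffle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical interface from option 1; Truffle does not define it. */
interface IExternalNode {
    /** The wrapped node of the existing AST. */
    Object original();

    List<IExternalNode> children();
}

/** Wraps an existing AST node without requiring it to implement anything. */
final class ExternalNodeAdapter<T> implements IExternalNode {
    private final T node;
    private final Function<T, List<T>> childrenOf;

    ExternalNodeAdapter(T node, Function<T, List<T>> childrenOf) {
        this.node = node;
        this.childrenOf = childrenOf;
    }

    @Override
    public Object original() {
        return node;
    }

    @Override
    public List<IExternalNode> children() {
        // Children are wrapped lazily, on each call.
        List<IExternalNode> result = new ArrayList<>();
        for (T child : childrenOf.apply(node)) {
            result.add(new ExternalNodeAdapter<T>(child, childrenOf));
        }
        return result;
    }
}
```

The adapter only needs a function that returns the children of a node, so the existing AST classes stay untouched.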

Additional context:

I blogged about the general idea before. Steps 0, 2, 3, and 4 are working.

If this idea is accepted, I'd consider contributing to the implementation.

rodrigar-mx commented 2 years ago

Hi @enikao. Thanks for contributing to GraalVM. We will take a look at your request and let you know if we have any questions or comments.

chumer commented 2 years ago

(I had a longer answer as a draft, but just lost it, so you will get a shorter answer instead, sorry)

The short answer to your question is that we cannot break encapsulation between the language/guest world and the host world. If you do that, you are on your own; we can't support such an approach. I am surprised your current solution works at all. Do you disable the GraalVM locator? How can you access classes of your guest language implementation in the host?

Whenever you find yourself using a Truffle API and an org.graalvm.polyglot API in the same class you are doing something the system was not designed for. Truffle APIs are an implementation detail encapsulated behind the polyglot API.

None of your proposed solutions work, as they all introduce some form of encapsulation breakage.

The simplest solution to your problem is to reparse inside your language and then build up the Truffle ASTs there.

For a version that avoids reparsing, I have a sketch that translates the AST and keeps all the Truffle-related code inside the language. This is just a sketch; a real implementation would need to translate different AST node types. But I think you should get the idea:

On the host side you do this:

    import java.util.List;

    import org.graalvm.polyglot.Context;
    import org.graalvm.polyglot.HostAccess;
    import org.graalvm.polyglot.Value;

    public static class AST {
        public List<AST> getChildren() {
            // imagine impl
            return null;
        }
    }

    public static void main(String[] args) {
        AST externalAST = /* ... */ null;

        // HostAccess.ALL lets the guest call getChildren() on the host object
        Context context = Context.newBuilder().allowHostAccess(HostAccess.ALL).build();

        // look up a specialEval function exposed by your language in the
        // TruffleLanguage#getScope
        Value evalAST = context.getBindings("myLanguage").getMember("specialEval");

        evalAST.execute(externalAST);
    }

On the guest side, in your Truffle implementation, evalAST could look like this (I did not actually run this code):

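    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import com.oracle.truffle.api.CallTarget;
    import com.oracle.truffle.api.CompilerDirectives.TruffleBoundary;
    import com.oracle.truffle.api.dsl.Cached;
    import com.oracle.truffle.api.frame.VirtualFrame;
    import com.oracle.truffle.api.interop.ArityException;
    import com.oracle.truffle.api.interop.InteropException;
    import com.oracle.truffle.api.interop.InteropLibrary;
    import com.oracle.truffle.api.interop.TruffleObject;
    import com.oracle.truffle.api.library.ExportLibrary;
    import com.oracle.truffle.api.library.ExportMessage;
    import com.oracle.truffle.api.nodes.ExplodeLoop;
    import com.oracle.truffle.api.nodes.IndirectCallNode;
    import com.oracle.truffle.api.nodes.Node;
    import com.oracle.truffle.api.nodes.RootNode;

    @ExportLibrary(InteropLibrary.class)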
    static class EvalASTFunction implements TruffleObject {

        private final Map<Object, CallTarget> cache = new HashMap<>();

        @ExportMessage
        boolean isExecutable() {
            return true;
        }

        @ExportMessage
        Object execute(Object[] args, @Cached IndirectCallNode callNode) throws ArityException {
            if (args.length != 1) {
                // exactly one argument (the host AST) is expected
                throw ArityException.create(1, 1, args.length);
            }
            Object ast = args[0];
            CallTarget target = lookup(ast);
            return callNode.call(target);
        }

        @TruffleBoundary
        CallTarget lookup(Object hostAST) {
            // you might want to use a better key here for the AST
            CallTarget target = cache.get(hostAST);
            if (target == null) {
                target = new ExternalRootNode(convertAST(hostAST)).getCallTarget();
                cache.put(hostAST, target);
            }
            return target;
        }

        @TruffleBoundary
        TruffleAST convertAST(Object ast) {
            InteropLibrary interop = InteropLibrary.getUncached();
            try {
                List<TruffleAST> truffleChildren = new ArrayList<>();
                Object children = interop.invokeMember(ast, "getChildren");
                long size = interop.getArraySize(children);
                for (long i = 0; i < size; i++) {
                    // read from the children array, not from the parent node
                    Object child = interop.readArrayElement(children, i);
                    truffleChildren.add(convertAST(child));
                }
                return new TruffleAST(truffleChildren.toArray(new TruffleAST[truffleChildren.size()]));
            } catch (InteropException e) {
                throw new AssertionError(e);
            }

        }
    }

    static class TruffleAST extends Node {

        // final so that the loop in execute() can be unrolled by @ExplodeLoop
        @Children
        final TruffleAST[] children;

        TruffleAST(TruffleAST[] children) {
            this.children = children;
        }

        @ExplodeLoop
        Object execute(VirtualFrame frame) {
            for (TruffleAST child : children) {
                child.execute(frame);
            }

            // actual PE semantics
            return "";
        }

    }

    static class ExternalRootNode extends RootNode {

        @Child
        TruffleAST child;

        ExternalRootNode(TruffleAST child) {
            super(null);
            this.child = child;
        }

        @Override
        public Object execute(VirtualFrame frame) {
            return child.execute(frame);
        }

    }

Hope this helps. Sorry I cannot give you the answer you probably hoped for.

enikao commented 2 years ago

Thanks for your considerations. When I continued and explored the debugging API, I got the feeling that this is about encapsulation. It also turns out that all debugging-related APIs are text- and line-oriented, so we would need even more API changes there. (The debugging APIs on the MPS side are line-oriented as well, so I won't get around some "line simulation" either way.)
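
One possible shape for such a "line simulation", sketched in plain Java: number the external nodes in pre-order and emit one placeholder line per node, so that every node owns a synthetic line. The class LineSimulation, the one-line-per-node layout, and the childrenOf function are assumptions for illustration only, not what MPS or Truffle prescribe.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper that assigns each AST node a synthetic 1-based line. */
final class LineSimulation<T> {
    // Identity-based, because structurally equal nodes are still distinct nodes.
    private final Map<T, Integer> lines = new IdentityHashMap<>();
    private final Function<T, List<T>> childrenOf;

    LineSimulation(T root, Function<T, List<T>> childrenOf) {
        this.childrenOf = childrenOf;
        number(root);
    }

    /** Pre-order traversal: a parent gets a smaller line number than its children. */
    private void number(T node) {
        lines.put(node, lines.size() + 1);
        for (T child : childrenOf.apply(node)) {
            number(child);
        }
    }

    int lineOf(T node) {
        Integer line = lines.get(node);
        if (line == null) {
            throw new IllegalArgumentException("Node is not part of the numbered tree");
        }
        return line;
    }

    /** Placeholder text with exactly one line per numbered node. */
    String placeholderText() {
        return " \n".repeat(lines.size());
    }
}
```

The placeholder text could serve as the character content of the Source, and lineOf(node) as the line used when creating a section for that node.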

To answer your question, I didn't do anything about Host access. Maybe it helps that both the language and the interpretation are initialized from the same Classloader?

I use a ThreadLocal to give access to MPS nodes from Truffle. (I learned from project loom that ThreadLocal is a way around protections that's hard to control.)

TruffleForwarder just sets up a globally accessible ThreadLocal:

public class TruffleForwarder extends ATruffleInvoker { 
  public TruffleForwarder(string languageId, ClassLoader classLoader) { 
    super(languageId, classLoader); 
  } 

  public static final ThreadLocal<SRepository> repository = new ThreadLocal<SRepository>(); 

  public string eval(node<> node, SRepository repo) { 
    node-ptr<> pointer = node.pointer; 
    string ptr = pointer/.toString(); 
    try { 
      TruffleForwarder.repository.set(repo); 
      int size; 
      read action with repo { 
        size = node.descendants.size; 
      } 
      return evalInternal(ptr, size, NodeUriHelper.getUri(node)); 
    } finally { 
      repository.set(null); 
    } 
  } 
}
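
The set-then-clear pattern in eval() can also be written in plain Java. The sketch below is a hypothetical variant (ScopedRepository is not an MPS or Truffle class) that restores the previous value instead of always setting null, so nested calls on the same thread keep working. Like any ThreadLocal, it only helps if the lookup happens on the thread that set the value.

```java
import java.util.function.Supplier;

/** Hypothetical holder that exposes a repository to code running on the same thread. */
final class ScopedRepository {
    private static final ThreadLocal<Object> CURRENT = new ThreadLocal<>();

    private ScopedRepository() {
    }

    /** Runs the action with the repository visible, then restores the previous state. */
    static <R> R with(Object repository, Supplier<R> action) {
        Object previous = CURRENT.get();
        CURRENT.set(repository);
        try {
            return action.get();
        } finally {
            if (previous == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(previous);
            }
        }
    }

    /** Returns the repository set by the enclosing with(...) call on this thread. */
    static Object current() {
        Object repository = CURRENT.get();
        if (repository == null) {
            throw new IllegalStateException("No repository set on this thread");
        }
        return repository;
    }
}
```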

Calling Truffle (str first contains the serialized node pointer and then simulated lines to satisfy some debugging code of Truffle):

protected string evalInternal(final string content, int nodeCount, final URI uri) { 
  ClassLoader current = Thread.currentThread().getContextClassLoader(); 
  try { 
    Thread.currentThread().setContextClassLoader(classLoader); 
    StringBuilder str = new StringBuilder(content); 
    for (int i = 0; i < nodeCount; i++) { 
      str.append("  \n   \n"); 
    } 
    Source source = Source.newBuilder(languageId, str, "<input>").uri(uri).build(); 
    ByteArrayOutputStream out = new ByteArrayOutputStream(); 
    PrintStream outPrint = new PrintStream(out); 
    context = Context.newBuilder(languageId).in(InputStream.nullInputStream()).out(outPrint).build(); 
    Value result = context.eval(source); 
    outPrint.println(result.toString()); 
    return out.toString(Charset.defaultCharset()); 
  } catch (IOException | RuntimeException e) { 
    message error e.getMessage(), <no project>, e; 
    return e.getMessage(); 
  } finally { 
    Thread.currentThread().setContextClassLoader(current); 
  } 
}

Inside my language:

protected CallTarget parse(TruffleLanguage.ParsingRequest request) throws Exception { 
  Source source = request.getSource(); 
  string sourcePtr = source.getCharacters().toString(); 
  SNodeReference ptr = SNodePointer.deserialize(sourcePtr.trim(both)); 
  SRepository repo = TruffleForwarder.repository.get(); 
  map<string, RootCallTarget> functions = new linked_hashmap<string, RootCallTarget>; 
  FrameDescriptor descriptor = new FrameDescriptor(); 
  string functionName; 
  CompletableFuture<Node> bodyNodeFuture = new CompletableFuture<Node>(); 
  // this is just a fancy way to say "run in UI thread"
  execute in EDT with repo { 
    node<MainFunction> mpsMain = ptr.resolve(repo) as MainFunction; 
    message debug "mpsMain: " + mpsMain, <no project>, <no throwable>; 
    functionName = mpsMain.concept.conceptAlias; 
    node<Block> mpsBody = mpsMain.body; 
    bodyNodeFuture.complete(new TruffleConverter(mpsMain, descriptor).convert(mpsBody)); 
  } 
  Node bodyNode = bodyNodeFuture.get(); 
  SLStatementNode bodyStatement = bodyNode as SLStatementNode; 
  RootCallTarget main = Truffle.getRuntime().createCallTarget(new SLRootNode(this, descriptor, new SLFunctionBodyNode(bodyStatement), source.createSection(1), functionName)); 
  ...
}

I was rather surprised how well the Truffle runtime handles class reloading. MPS reloads classes a lot, and Truffle mostly coped with it without a hiccup. I can even change TruffleNode implementation classes and they just work after hot-reload.