syntax-tree / unist

Universal Syntax Tree used by @unifiedjs
https://unifiedjs.com

Would unist make a good programming language AST format? #10

Closed anko closed 7 years ago

anko commented 7 years ago

Hi again @wooorm (and other AST enthusiasts),

I develop an experimental programming language called eslisp, which is basically a JavaScript syntax optimised for code-modifying macros that let users add language features. It might be helpful to think of it as a programming language processing tool.

Eslisp's current AST representation contains exactly the same information as Unist, right down to location data, but currently organised differently. I was writing my own tools for reading and modifying it, then realised I'm basically duplicating Unist utilities.

I thought I'd open a dialogue before I start "hammering on screws" and making it a dependency. Have you considered programming languages as a Unist use-case? Is this a sane thing to be doing, long-term?

wooorm commented 7 years ago

Hi again! 👋

Whether Unist works for programming languages? I think so, yes.

I haven’t really investigated that, but I know @JDvorak is working on recode, which is remark/retext/rehype for JavaScript. I think that’s very early stages though?

I thought I'd open a dialogue

Yes!

before I start "hammering on screws"

Like what? What are the things you’re considering? What downsides do you see? What alternatives are there?

JDvorak commented 7 years ago

Hell yeah!

Unist's basic structure is totally applicable to programming languages. It is just a super verbose data structure for relating nodes and their children explicitly and consistently (I'm looking at you, ESTree). Because of its readability and programmatic properties, I try to make my user-facing trees Unist-shaped where possible, so that I can rely on the existing tree toolchain and build upon it. Internally, though, I might build temporary tree structures that have certain performance characteristics. For instance, my fledgling quasi tree-adjoining grammar parser is currently written in an alien tree-tongue, but in subsequent revisions it will use a Unist format to organize dependencies among transforms and let userland make better use of the tool.

Unfortunately, I am not a great open source contributor: I cannot set aside time to return to working on it until next month, and even then only on weekends. ;-; But this is all off topic. @anko, the big takeaway is that yes, it is super cool for anything tree-shaped. It is sane.
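To make the "nodes and their children, explicitly and consistently" point concrete, here is a minimal sketch of the shape being described. The interfaces are simplified local copies of what the unist spec defines (`type`, optional `position`, `children` on parents, `value` on literals). The `visit` helper and the `binaryExpression` / `number` / `operator` node types are hypothetical, made up for illustration, and are not taken from recode or any existing utility.

```typescript
// Simplified local copies of the unist interfaces (not imported from a package).
interface Point { line: number; column: number; offset?: number }
interface Position { start: Point; end: Point }
interface UnistNode { type: string; position?: Position }
interface UnistParent extends UnistNode { children: UnistNode[] }
interface UnistLiteral extends UnistNode { value: unknown }

// A node is a parent if it carries a `children` array.
function isParent(node: UnistNode): node is UnistParent {
  return Array.isArray((node as UnistParent).children);
}

// Hypothetical helper: depth-first, pre-order walk over any unist-shaped tree.
function visit(
  node: UnistNode,
  visitor: (node: UnistNode, depth: number) => void,
  depth = 0
): void {
  visitor(node, depth);
  if (isParent(node)) {
    for (const child of node.children) {
      visit(child, visitor, depth + 1);
    }
  }
}

// Hypothetical tree for the expression `1 + 2`; the node types are invented here.
const one: UnistLiteral = { type: "number", value: 1 };
const plus: UnistLiteral = { type: "operator", value: "+" };
const two: UnistLiteral = { type: "number", value: 2 };
const tree: UnistParent = { type: "binaryExpression", children: [one, plus, two] };

// Collect node types in visiting order.
const seen: string[] = [];
visit(tree, (node) => {
  seen.push(node.type);
});
```

Because every parent exposes its children the same way, the walker needs no per-node-type knowledge, which is what lets generic tree utilities be reused across formats.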

anko commented 7 years ago

Whether Unist works for programming languages? I think so, yes.

:+1::sparkles:

What are the things you’re considering? What downsides do you see?

Everything around Unist is currently natural-language-related, so I thought I'd probe for unspoken assumptions just in case. I've read around and see no technical problems.

What alternatives are there?

None I know, except my ad-hoc format.

recode

Cool! I'll be following.

Quick comparison regarding our uses of Unist: Recode invents a new Unist-compatible AST format for JavaScript, on the same level as ESTree (which I currently use for JS ASTs, because the current JavaScript-modifying tools ecosystem targets it). I'm considering Unist for an S-expression representation, for processing before it's translated to JavaScript.

If it works for structures as complex as JS, it'll definitely work for S-expressions.
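As a rough illustration of what an S-expression representation in Unist could look like, the sketch below reads a string into a tree of `list` and `atom` nodes, each carrying a unist-style `position`. Everything here is an assumption for illustration: the node type names, the `parseSexpr` function, and the bare-bones grammar (no strings, comments, or quoting) are hypothetical and are not eslisp's actual format or parser.

```typescript
// Unist-style points: 1-indexed line and column, 0-indexed offset.
interface Point { line: number; column: number; offset: number }
interface Position { start: Point; end: Point }
// Hypothetical node types for this sketch.
interface Atom { type: "atom"; value: string; position: Position }
interface List { type: "list"; children: SexprNode[]; position: Position }
type SexprNode = Atom | List;

function parseSexpr(source: string): SexprNode {
  let offset = 0;
  let line = 1;
  let column = 1;

  const point = (): Point => ({ line, column, offset });

  // Consume one character, keeping line and column in sync.
  const advance = (): void => {
    if (source.charAt(offset) === "\n") {
      line += 1;
      column = 1;
    } else {
      column += 1;
    }
    offset += 1;
  };

  const skipWhitespace = (): void => {
    while (offset < source.length && /\s/.test(source.charAt(offset))) advance();
  };

  const parseNode = (): SexprNode => {
    skipWhitespace();
    if (offset >= source.length) throw new Error("Unexpected end of input");
    const start = point();

    if (source.charAt(offset) === "(") {
      advance(); // consume "("
      const children: SexprNode[] = [];
      skipWhitespace();
      while (offset < source.length && source.charAt(offset) !== ")") {
        children.push(parseNode());
        skipWhitespace();
      }
      if (offset >= source.length) throw new Error("Unclosed list");
      advance(); // consume ")"
      // `end` is the point just after the node, as in unist.
      return { type: "list", children, position: { start, end: point() } };
    }

    if (source.charAt(offset) === ")") throw new Error("Unexpected ')'");

    // Anything up to whitespace or a parenthesis is one atom.
    let value = "";
    while (offset < source.length && !/[\s()]/.test(source.charAt(offset))) {
      value += source.charAt(offset);
      advance();
    }
    return { type: "atom", value, position: { start, end: point() } };
  };

  return parseNode();
}

const sexprTree = parseSexpr("(+ 1\n   (* 2 3))");
```

Since lists expose `children` and every node has a `type`, a tree like `sexprTree` should be walkable by generic unist-style utilities without S-expression-specific code.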


I'll try it and report back.

wooorm commented 2 years ago

Years later, btw, an estree-like AST (esast) now exists: https://github.com/syntax-tree/esast. It drops the “children” thing that unist has for markup languages, which is a bummer, but on the other side, it does add unist positions and a couple of other things, which is useful!
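To show what dropping `children` means in practice, here is a small sketch of an ESTree-style node that keeps its sub-nodes in named fields (`left`, `right`) while carrying a unist-style `position`. The `EsLikeNode` interface and the `childNodes` helper are hypothetical simplifications for illustration, not esast's actual type definitions or utilities.

```typescript
interface Point { line: number; column: number }
interface Position { start: Point; end: Point }
// Hypothetical, loosely typed stand-in for an ESTree-style node with a position.
interface EsLikeNode {
  type: string;
  position?: Position;
  [field: string]: unknown;
}

// Anything object-shaped with a string `type` is treated as a node.
function isNode(value: unknown): value is EsLikeNode {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { type?: unknown }).type === "string"
  );
}

// Without a uniform `children` array, a generic walker has to inspect every field.
function childNodes(node: EsLikeNode): EsLikeNode[] {
  const found: EsLikeNode[] = [];
  for (const key of Object.keys(node)) {
    if (key === "position") continue; // positional info, not a sub-node
    const value = node[key];
    if (Array.isArray(value)) {
      for (const item of value) {
        if (isNode(item)) found.push(item);
      }
    } else if (isNode(value)) {
      found.push(value);
    }
  }
  return found;
}

// `1 + 2` in an ESTree-like shape: operands live in `left` and `right`.
const expr: EsLikeNode = {
  type: "BinaryExpression",
  operator: "+",
  left: { type: "Literal", value: 1 },
  right: { type: "Literal", value: 2 },
  position: { start: { line: 1, column: 1 }, end: { line: 1, column: 6 } },
};

const kinds = childNodes(expr).map((n) => n.type);
```

The trade-off is visible in `childNodes`: named fields say what role each operand plays, but generic traversal has to discover sub-nodes by scanning fields instead of reading one `children` list.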