unifiedjs / unified

☔️ interface for parsing, inspecting, transforming, and serializing content through syntax trees
https://unifiedjs.com
MIT License

unify AST types #206

Closed milahu closed 2 years ago

milahu commented 2 years ago

Initial checklist

Problem

xast-parse produces a different AST than hast-parse, so I cannot use hast-util-select on xast trees

part of https://github.com/rehypejs/rehype/pull/112 (trying to use hast tools for xml)

Solution

unify ASTs across all parsers in the unifiedjs ecosystem so that tools like hast-util-select "just work" on all ASTs

interface Node {
  name: string;
  type: string;
  props: Record<string, any> | null;
  children: Node[] | null; // non-null when the node is a branch node
  value?: string; // set when the node is a leaf node
  location: { start: number; end: number };
  internal: Record<string, any> | null; // like props, but for unifiedjs internal data
  // example: path to the source file of a root node
  isRoot: boolean; // shortcut to detect a "context switch" between different ASTs
}

this would also make it possible to "plug in" tools like GraphQL, or to interface with graph databases ...

benefit: ASTs become composable. for example, I can embed a Python AST in an HTML code block. or I can parse <img src="image.svg">, parse the SVG file referenced by src, transform the SVG, and inline the result in the HTML tree
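
a minimal sketch of that composition, assuming the node shape proposed above. ProposedNode, makeNode, inlineImage and countRoots are hypothetical names used only for illustration; none of them exist in unified, and the trees are written by hand rather than produced by a real parser:

```typescript
// The node shape proposed in this issue (hypothetical, not an existing unified type).
interface ProposedNode {
  name: string;
  type: string;
  props: Record<string, unknown> | null;
  children: ProposedNode[] | null;
  value?: string;
  location: { start: number; end: number };
  internal: Record<string, unknown> | null;
  isRoot: boolean;
}

// Hypothetical helper: build a node, filling unspecified fields with defaults.
function makeNode(
  name: string,
  type: string,
  extra: {
    props?: Record<string, unknown>;
    children?: ProposedNode[];
    internal?: Record<string, unknown>;
    isRoot?: boolean;
  } = {}
): ProposedNode {
  return {
    name,
    type,
    props: extra.props ?? null,
    children: extra.children ?? null,
    location: { start: 0, end: 0 }, // positions are omitted in this sketch
    internal: extra.internal ?? null,
    isRoot: extra.isRoot ?? false,
  };
}

// Hand-written stand-in for the tree a parser might produce for image.svg.
const svgRoot = makeNode("svg", "element", {
  isRoot: true,
  internal: { sourcePath: "image.svg" },
  children: [makeNode("circle", "element", { props: { r: "4" } })],
});

// Hand-written stand-in for an HTML tree that references the SVG file.
const htmlRoot = makeNode("html", "element", {
  isRoot: true,
  children: [makeNode("img", "element", { props: { src: "image.svg" } })],
});

// Return a copy of the tree in which every <img> with the given src
// is replaced by another tree. The input tree is not mutated.
function inlineImage(node: ProposedNode, src: string, replacement: ProposedNode): ProposedNode {
  if (node.name === "img" && node.props?.src === src) return replacement;
  if (node.children === null) return node;
  return { ...node, children: node.children.map((c) => inlineImage(c, src, replacement)) };
}

// Count the "context switches" (isRoot nodes) in a combined tree.
function countRoots(node: ProposedNode): number {
  const own = node.isRoot ? 1 : 0;
  return (node.children ?? []).reduce((n, c) => n + countRoots(c), own);
}

const combined = inlineImage(htmlRoot, "image.svg", svgRoot);
```

here isRoot is what lets a tool notice that the inlined subtree came from a different source, and internal.sourcePath records which file it came from.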

maybe extend the Node type of a parser-generator like tree-sitter or lezer-parser

Alternatives

write N tools for M ASTs ...

which is the opposite of "unified"

"what would pandoc do?"

wooorm commented 2 years ago

Please respect our time by filling out the template, following the support guidelines, and taking some time to think through your questions. I will block you otherwise.

You can use unist-util-select on any unist-based syntax tree. unified is specifically not Pandoc: it deliberately has different ASTs, and being like Pandoc is not the goal. Pandoc is already a very good version of Pandoc.
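
To illustrate why a generic utility can work across different ASTs, here is a minimal sketch that relies only on the fields unist nodes share: a `type` string and an optional `children` array. It does not call unist-util-select; `UnistNode` and `selectAllByType` are hypothetical names for this example, and the two trees are hand-written approximations of hast and xast output.

```typescript
// Minimal unist-like node: only `type` and optional `children` are assumed.
// Any other field (tagName, name, attributes, value, ...) is format-specific.
interface UnistNode {
  type: string;
  children?: UnistNode[];
  [field: string]: unknown;
}

// Hypothetical helper: collect every node whose `type` matches, depth first.
function selectAllByType(tree: UnistNode, type: string): UnistNode[] {
  const found: UnistNode[] = [];
  const visit = (node: UnistNode): void => {
    if (node.type === type) found.push(node);
    (node.children ?? []).forEach(visit);
  };
  visit(tree);
  return found;
}

// hast-like tree: elements carry `tagName` and `properties`.
const hastTree: UnistNode = {
  type: "root",
  children: [
    {
      type: "element",
      tagName: "p",
      properties: {},
      children: [{ type: "text", value: "hi" }],
    },
  ],
};

// xast-like tree: elements carry `name` and `attributes` instead.
const xastTree: UnistNode = {
  type: "root",
  children: [
    {
      type: "element",
      name: "svg",
      attributes: {},
      children: [{ type: "element", name: "circle", attributes: { r: "4" }, children: [] }],
    },
  ],
};

const hastElements = selectAllByType(hastTree, "element");
const xastElements = selectAllByType(xastTree, "element");
```

The same walker handles both trees because it never reads `tagName` or `name`; only tools that depend on those format-specific fields need to be written per AST.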