High-level API to embed parglare into other tools

Description

I have created a tool called UniGrammar that transpiles grammars written in its own DSL, based on JSON objects, into grammars in other DSLs, compiles them into directly usable artifacts where that is needed, and generates wrappers so that the parsed trees can be used uniformly. It unifies other aspects too: storage of and access to compiled grammars, testing, and visualization (though visualization is backend-specific). For storage there is a gen-bundle command that compiles a grammar with all the tools and stores the artifacts. The bundle can then be used without much attention to the tools: UniGrammarRuntime itself detects the fastest backend available on the user's system (benchmark results are also stored within the bundle).

parglare has 2 formats: the source one and a precompiled table, which is represented as Python lists and dicts and serialized into JSON. But the precompiled table is generated with a CLI app, not via an API, and is fetched automatically based on the path of the source file. The architecture of my runtime is such that everything is always loaded from memory: when testing I prefer not to create unneeded files, since some users have SSDs, and floating-gate transistors withstand only a limited number of erases before they degrade to the point of being useless. Also, JSON may not be the best format to store the table in, since it has the overhead that text-based formats have. I may want to replace it with, for example, CBOR.

So I wonder if it makes sense to:

1. parse grammars not only from files or strings, but also from the data structures that are usually serialized into JSON, so that I can deserialize them myself;
2. compile them not to files but to those data structures, via an API, so that I can serialize them myself and, again, without side effects, since other tools will run in the same process afterwards;
3. provide a convenient interface to trace and visualize them. I have not yet decided how I am actually going to visualize them. Currently I rely on the compilers' own functionality, but since most of them have none, I feel I will have to accept dot source and wrap xdot (or maybe just use networkx) to show it in those cases. So by an API for visualization I most likely mean an API that outputs dot source without any side effects.
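
To make the first two points concrete, here is a minimal sketch of what I mean by serializing the table myself. Everything in it is hypothetical: `Codec`, `table_to_bytes`, `table_from_bytes` and the placeholder table are names and data made up for illustration and are not part of parglare. It only assumes that the compiled table is available in memory as plain Python lists and dicts.

```python
import json
import typing

# A compiled table is assumed to be plain Python dicts and lists.
Table = typing.Dict[str, typing.Any]


class Codec(typing.NamedTuple):
    """Hypothetical pair of functions converting a table to bytes and back."""
    dumps: typing.Callable[[Table], bytes]
    loads: typing.Callable[[bytes], Table]


# JSON codec built on the standard library; a CBOR codec with the same
# shape could be swapped in without touching the rest of the code.
JSON_CODEC = Codec(
    dumps=lambda table: json.dumps(table, separators=(",", ":")).encode("utf-8"),
    loads=lambda blob: json.loads(blob.decode("utf-8")),
)


def table_to_bytes(table: Table, codec: Codec = JSON_CODEC) -> bytes:
    """Serialize an in-memory table without touching the file system."""
    return codec.dumps(table)


def table_from_bytes(blob: bytes, codec: Codec = JSON_CODEC) -> Table:
    """Restore a table from bytes held in memory."""
    return codec.loads(blob)


# Placeholder data standing in for a precompiled table; the real layout
# used by parglare is not reproduced here.
table = {"states": [{"id": 0, "actions": {"NUM": ["shift", 1]}}]}
blob = table_to_bytes(table)
restored = table_from_bytes(blob)
```

With an API like this on the parglare side, the bundle code would decide where the bytes go and which format they are in, and nothing would be written to disk as a side effect.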

An API for tracing is a tricky one. Most of the tools have different kinds of tracing, and none of them visualizes the trace automatically. For example, ANTLR prints tokens, actions and errors to stdout. I guess for tracing we can use the following very generic interface: just a collection of in-memory buffers, each of which has some metadata describing its purpose and format. I.e. just an object with the fields tokens: typing.Optional[str] for tokens, whose purpose is to be printed to stdout; actions: typing.Optional[str], a text representation of actions; and actions_graph: typing.Optional[str], a GraphViz graph source, whose purpose is to be rendered on screen or into a file. Or we may want to get the actions and tokens in an object-oriented format. I have not yet decided. Anyway, there shouldn't be any side effects, such as direct output to stdout or closing the app.
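
A rough sketch of such an object follows. The class name `TraceResult`, the helper `capture_stdout` and the `noisy_backend` function are made up for illustration; only the three field names come from the description above. The helper shows one way to turn a tool that prints to stdout into one that returns a buffer.

```python
import contextlib
import io
import typing
from dataclasses import dataclass


@dataclass
class TraceResult:
    """Hypothetical container for trace output; not part of parglare."""
    tokens: typing.Optional[str] = None         # text meant to be printed
    actions: typing.Optional[str] = None        # text representation of actions
    actions_graph: typing.Optional[str] = None  # GraphViz (dot) source

    def available(self) -> typing.List[str]:
        """Names of the buffers that a backend actually filled in."""
        return [name for name, value in vars(self).items() if value is not None]


def capture_stdout(func: typing.Callable[[], None]) -> str:
    """Run func and return what it printed instead of letting it reach stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func()
    return buffer.getvalue()


def noisy_backend() -> None:
    """Stand-in for a tool that prints its token trace directly."""
    print("ID 'x'")
    print("NUM '1'")


trace = TraceResult(tokens=capture_stdout(noisy_backend))
```

A caller can then check `trace.available()` and decide what to print or render, so the decision about output stays with the embedding tool.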

Do you consider refactoring parglare this way acceptable?

Also it would be nice to have the reciprocal mapping, I mean transpilation from parglare grammars into UniGrammar ones. Is it better to have it within parglare or within UniGrammar?
