grain-lang / grain

The Grain compiler toolchain and CLI. Home of the modern web staple. 🌾
https://grain-lang.org/
GNU Lesser General Public License v3.0
3.26k stars 113 forks source link

stdlib/runtime: Implement a basic parsing library #2100

Open spotandjake opened 5 months ago

spotandjake commented 5 months ago

As mentioned in #2099, in the stdlib there are multiple cases where we have needed to and will need to implement parsers such as currently for url decoding, filepaths and Json, and in the future we might want to support things like TOML, Yaml and possibly other things. It would be useful if we standardized the way we do parsing in the stdlib to make maintaining these libraries easier, this would also help maintain best practices. As we probably don't want this library to be accessible from the stdlib it would probably make sense to put this in the runtime. Alternatively maybe we want to consider adding an stdlib/helpers path for things like this. My thoughts for this library are a very low level parsing framework similar to whats in the Json library, something like a full parser generator seems overkill.

saolof commented 3 months ago

What is the class of languages that the parsing framework needs to be able to parse?

If the scope is to have an extensible system whose main purpose is to parse configuration languages, restricting the scope to well-matched visibly pushdown grammars (with minor conveniences to translate leading whitespace to pushdown/pop) should make it possible to parse all languages mentioned here.

It would offer non-backtracking parsing without requiring complex grammar analysis like LR or LL parsers, or requiring backtracking like PEGs, by just restricting the scope of what the module is supposed to do. The fundamental idea is to use stack of finite state automata similar to what you would use for parsing a regex, and the main question is whether you want to support nondeterministic grammars (stack of thompson NFA to handle choice with lookahead, produce output on stack pop) or whether you take the predictive/LL1 ones (stack of DFAs, produce output as you parse, bracket delimited recursion makes it trivial to write a deterministic choice operator that just uses the leading token of each rule). The latter may still be enough to parse the mentioned languages while being straightforward to write a combinator package for.