nixpulvis opened 6 years ago
I've started work on the LALRPOP implementation. Even if we decide on conch for the POSIX grammar, I believe LALRPOP will be a good choice for the modern grammar.
This should be mostly finalized once I figure out the custom lexer for LALRPOP.
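For reference, LALRPOP's documentation describes an external (custom) lexer as an iterator yielding `Result<(Loc, Tok, Loc), Error>` triples, where the two `Loc`s are the token's start and end offsets. Below is a rough sketch of that shape, assuming a toy token set: `Tok`, `LexError`, and the word-character rules are hypothetical and not the actual lexer. The grammar file would still need an `extern` block mapping these tokens, which isn't shown.

```rust
/// Hypothetical token set for a toy shell grammar.
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Word(String),
    Pipe,
    Semi,
}

#[derive(Debug, Clone, PartialEq)]
pub enum LexError {
    /// Byte offset and the character that could not be lexed.
    UnexpectedChar(usize, char),
}

/// The (start, token, end) triple shape LALRPOP expects from an external lexer.
pub type Spanned<T, L, E> = Result<(L, T, L), E>;

pub struct Lexer<'input> {
    chars: std::iter::Peekable<std::str::CharIndices<'input>>,
    input: &'input str,
}

impl<'input> Lexer<'input> {
    pub fn new(input: &'input str) -> Self {
        Lexer { chars: input.char_indices().peekable(), input }
    }
}

/// Characters allowed in a bare word; operators are excluded (toy rule).
fn is_word_char(c: char) -> bool {
    !c.is_whitespace() && !matches!(c, '|' | ';' | '&' | '<' | '>' | '(' | ')')
}

impl<'input> Iterator for Lexer<'input> {
    type Item = Spanned<Tok, usize, LexError>;

    fn next(&mut self) -> Option<Self::Item> {
        // Skip whitespace between tokens.
        while let Some(&(_, c)) = self.chars.peek() {
            if c.is_whitespace() {
                self.chars.next();
            } else {
                break;
            }
        }
        let (start, c) = self.chars.next()?;
        match c {
            '|' => Some(Ok((start, Tok::Pipe, start + 1))),
            ';' => Some(Ok((start, Tok::Semi, start + 1))),
            c if is_word_char(c) => {
                // Extend the word; offsets are in bytes, so use len_utf8.
                let mut end = start + c.len_utf8();
                while let Some(&(i, c)) = self.chars.peek() {
                    if is_word_char(c) {
                        end = i + c.len_utf8();
                        self.chars.next();
                    } else {
                        break;
                    }
                }
                let word = self.input[start..end].to_string();
                Some(Ok((start, Tok::Word(word), end)))
            }
            // Operators like `&` are not handled in this sketch.
            c => Some(Err(LexError::UnexpectedChar(start, c))),
        }
    }
}
```

Since the lexer borrows the input, words could be `&'input str` slices instead of `String`s; owned strings just keep the sketch simple.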
Currently the structure returned by a parse can be any kind of `Program`, but the `Program` trait doesn't expose the complete AST. We need some form of AST to assist with syntax highlighting. Either we allow specialized highlighters to view the internal AST, or we standardize features (e.g. commands, arguments, strings) to expose in the program interface under a highlighting feature.
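To make the second option more concrete, here's a sketch of what a standardized interface might look like. `Highlight`, `Kind`, `SimpleCommand`, and `render` are all hypothetical names, not part of the existing `Program` trait. The idea is that a program reports byte ranges of its source tagged with a syntactic kind, so a highlighter never needs to see the internal AST.

```rust
use std::ops::Range;

/// Syntactic categories a highlighter might color (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    Command,
    Argument,
    String,
}

/// Hypothetical highlighting feature: sorted, non-overlapping byte ranges
/// of the source, each tagged with a `Kind`.
pub trait Highlight {
    fn highlights(&self) -> Vec<(Range<usize>, Kind)>;
}

/// Toy program: one simple command split on ASCII spaces.
/// It does not handle quoted strings that contain spaces.
pub struct SimpleCommand<'a> {
    source: &'a str,
}

impl<'a> SimpleCommand<'a> {
    pub fn new(source: &'a str) -> Self {
        SimpleCommand { source }
    }
}

impl<'a> Highlight for SimpleCommand<'a> {
    fn highlights(&self) -> Vec<(Range<usize>, Kind)> {
        let mut spans = Vec::new();
        let mut offset = 0;
        let mut seen_command = false;
        for word in self.source.split(' ') {
            let range = offset..offset + word.len();
            offset += word.len() + 1; // +1 for the separating space
            if word.is_empty() {
                continue; // consecutive spaces
            }
            let kind = if !seen_command {
                seen_command = true;
                Kind::Command
            } else if word.starts_with('"') {
                Kind::String
            } else {
                Kind::Argument
            };
            spans.push((range, kind));
        }
        spans
    }
}

/// Wraps each span in ANSI color codes; the colors are arbitrary choices.
pub fn render(source: &str, program: &impl Highlight) -> String {
    let mut out = String::new();
    let mut last = 0;
    for (range, kind) in program.highlights() {
        out.push_str(&source[last..range.start]);
        let color = match kind {
            Kind::Command => "32",
            Kind::Argument => "36",
            Kind::String => "33",
        };
        out.push_str(&format!("\x1b[{}m{}\x1b[0m", color, &source[range.start..range.end]));
        last = range.end;
    }
    out.push_str(&source[last..]);
    out
}
```

The highlighter only depends on `Highlight`, so the POSIX and modern grammars could each implement it however their ASTs look.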
6ded9a55dc61504802f119bb0e24fe5755af922f makes some progress on this by adding Unicode support to the lexer.
Must read: Unicode Security Mechanisms (Identifier Characters), along with a Rust crate implementation at https://github.com/unicode-rs/unicode-xid.
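For identifiers, unicode-xid provides a `UnicodeXID` trait on `char` with `is_xid_start` and `is_xid_continue`. To keep this sketch dependency-free, it substitutes std's `char::is_alphabetic` / `is_alphanumeric` (plus `_`). These only approximate XID_Start / XID_Continue (for example, combining marks are XID_Continue but not alphanumeric), so the real lexer should use the crate. `lex_identifier` is a hypothetical helper.

```rust
/// Approximates XID_Start with std only; with unicode-xid this would be
/// `UnicodeXID::is_xid_start(c)`. `_` is allowed as a leading character.
fn is_ident_start(c: char) -> bool {
    c == '_' || c.is_alphabetic()
}

/// Approximates XID_Continue; with unicode-xid, `UnicodeXID::is_xid_continue(c)`.
fn is_ident_continue(c: char) -> bool {
    c == '_' || c.is_alphanumeric()
}

/// Returns the identifier at the start of `input`, if there is one.
pub fn lex_identifier(input: &str) -> Option<&str> {
    let mut chars = input.char_indices();
    let (_, first) = chars.next()?;
    if !is_ident_start(first) {
        return None;
    }
    // The identifier ends at the first non-continue char, or at end of input.
    let end = chars
        .find(|&(_, c)| !is_ident_continue(c))
        .map(|(i, _)| i)
        .unwrap_or(input.len());
    Some(&input[..end])
}
```

Slicing by the byte offset from `char_indices` keeps the result valid UTF-8 even for multi-byte identifiers like `héllo`.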
The majority of input to the shell is text to be interpreted as a shell program. To support both interactive syntax highlighting and, more importantly, the semantics of the shell (both interactive and when reading a file), the input should be tokenized and parsed into some form of AST.
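As a minimal illustration of "tokenize, then parse into an AST", here is a deliberately tiny hand-written version that only knows simple commands and `|` pipelines. The `Ast` shape, `tokenize`, and `parse` are hypothetical; there's no quoting, redirection, or error recovery, which is exactly what a real grammar (LALRPOP, pest, or conch-parser) would add.

```rust
/// Hypothetical, minimal AST: a simple command or a pipeline of them.
#[derive(Debug, PartialEq)]
pub enum Ast {
    Simple(Vec<String>),
    Pipeline(Vec<Ast>),
}

/// Tokenize on whitespace, making `|` its own token (no quoting support).
fn tokenize(input: &str) -> Vec<String> {
    input
        .replace('|', " | ")
        .split_whitespace()
        .map(String::from)
        .collect()
}

/// Parse tokens into simple commands separated by `|`.
pub fn parse(input: &str) -> Result<Ast, String> {
    let mut commands = Vec::new();
    let mut words = Vec::new();
    for tok in tokenize(input) {
        if tok == "|" {
            if words.is_empty() {
                return Err("empty command before `|`".into());
            }
            commands.push(Ast::Simple(std::mem::take(&mut words)));
        } else {
            words.push(tok);
        }
    }
    if words.is_empty() {
        return Err("expected a command".into());
    }
    commands.push(Ast::Simple(words));
    // A single command is not wrapped in a one-element pipeline.
    Ok(if commands.len() == 1 {
        commands.pop().unwrap()
    } else {
        Ast::Pipeline(commands)
    })
}
```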
There are a few existing parsers out there, mainly LALRPOP, pest, and the shell-specific conch-parser. Before we decide how to parse, we need to decide what we're parsing. For example, many interfaces demand a `&str`, while others use various kinds of iterators for streaming support. At the very least, the AST in conch-parser is worth skimming.
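To illustrate the `&str` vs. iterator trade-off, here's a sketch of a word splitter that is generic over any `Iterator<Item = char>`. It accepts `"...".chars()` as easily as input that arrives in chunks whose boundaries don't line up with token boundaries. `Words` is a hypothetical name. The catch is that a streaming lexer has to own its token text (`String`), whereas a `&str`-based lexer can hand out zero-copy slices and byte spans.

```rust
use std::iter::Peekable;

/// Lazily splits a stream of chars into whitespace-separated words,
/// so the input never has to be collected into one `&str` first.
pub struct Words<I: Iterator<Item = char>> {
    chars: Peekable<I>,
}

impl<I: Iterator<Item = char>> Words<I> {
    pub fn new(input: I) -> Self {
        Words { chars: input.peekable() }
    }
}

impl<I: Iterator<Item = char>> Iterator for Words<I> {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        // Skip leading whitespace.
        while self.chars.next_if(|c| c.is_whitespace()).is_some() {}
        // Accumulate characters until the next whitespace or end of input.
        let mut word = String::new();
        while let Some(c) = self.chars.next_if(|c| !c.is_whitespace()) {
            word.push(c);
        }
        if word.is_empty() { None } else { Some(word) }
    }
}
```

For example, `Words::new(vec!["ec", "ho h", "i"].into_iter().flat_map(|s| s.chars()))` yields `"echo"` and `"hi"`, even though `echo` is split across two chunks.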