nixpulvis / oursh

Your comrade through the perilous world of UNIX.
http://nixpulvis.com/oursh/oursh
MIT License

Program Parsing Interface #8

Open nixpulvis opened 6 years ago

nixpulvis commented 6 years ago

The majority of input to the shell is given as text to be interpreted as a shell program. To support both interactive syntax highlighting and, more importantly, the semantics of the shell (whether interactive or reading from a file), the input should be tokenized and parsed into some form of AST.
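To make the two stages concrete, here is a minimal sketch of "tokenize, then parse into an AST" for a toy subset of shell syntax: whitespace-separated words and `;` separators, with no quoting or expansion. The `Token`, `Command` and `Program` types and the `tokenize`/`parse` functions are hypothetical illustrations, not oursh's actual types.

```rust
/// Hypothetical token type for a toy subset of shell syntax:
/// whitespace-separated words and `;` separators (no quoting or expansion).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Word(String),
    Semi,
}

/// Hypothetical AST: a program is a sequence of simple commands.
#[derive(Debug, PartialEq)]
struct Command {
    name: String,
    args: Vec<String>,
}

#[derive(Debug, PartialEq)]
struct Program {
    commands: Vec<Command>,
}

/// Stage 1: turn raw text into tokens.
fn tokenize(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut word = String::new();
    for c in input.chars() {
        if c == ';' || c.is_whitespace() {
            // Flush the word in progress, if any.
            if !word.is_empty() {
                tokens.push(Token::Word(std::mem::take(&mut word)));
            }
            if c == ';' {
                tokens.push(Token::Semi);
            }
        } else {
            word.push(c);
        }
    }
    if !word.is_empty() {
        tokens.push(Token::Word(word));
    }
    tokens
}

/// Stage 2: group tokens into commands; empty commands (e.g. `;;`) are skipped.
fn parse(tokens: &[Token]) -> Program {
    let mut commands = Vec::new();
    for group in tokens.split(|t| *t == Token::Semi) {
        // Groups contain only words, since we split on `Semi`.
        let mut words = group.iter().filter_map(|t| match t {
            Token::Word(w) => Some(w.clone()),
            Token::Semi => None,
        });
        if let Some(name) = words.next() {
            commands.push(Command { name, args: words.collect() });
        }
    }
    Program { commands }
}
```

Keeping the lexer separate from the parser is what lets a highlighter reuse the token stream even when the input does not (yet) parse.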

There are a few existing parsing tools out there, mainly LALRPOP, pest, and the shell-specific conch-parser. Before we decide how to parse, we need to decide what we're parsing. For example, many interfaces demand a &str, while others accept various kinds of iterators to support streaming input.
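The trade-off between the two input styles can be sketched as follows. `WordLexer` only needs an iterator of `char`s, so it can consume input that arrives in chunks, while `words_borrowed` needs the whole `&str` up front but can return borrowed slices without copying. Both are hypothetical helpers for illustration, not APIs of LALRPOP, pest, or conch-parser.

```rust
use std::iter::Peekable;

/// Hypothetical streaming word lexer: it only needs an iterator of `char`s,
/// so it can run over a `&str` or over input that arrives in pieces.
struct WordLexer<I: Iterator<Item = char>> {
    chars: Peekable<I>,
}

impl<I: Iterator<Item = char>> WordLexer<I> {
    fn new(chars: I) -> Self {
        WordLexer { chars: chars.peekable() }
    }
}

impl<I: Iterator<Item = char>> Iterator for WordLexer<I> {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        // Skip whitespace between words.
        while self.chars.peek().map_or(false, |c| c.is_whitespace()) {
            self.chars.next();
        }
        // Collect characters until the next whitespace or end of input.
        let mut word = String::new();
        while let Some(&c) = self.chars.peek() {
            if c.is_whitespace() {
                break;
            }
            word.push(c);
            self.chars.next();
        }
        if word.is_empty() { None } else { Some(word) }
    }
}

/// The `&str` alternative: borrowed slices and no copying,
/// but the whole input must be available up front.
fn words_borrowed(input: &str) -> Vec<&str> {
    input.split_whitespace().collect()
}
```

The streaming version has to own (copy) each word because the underlying chunks may be gone by the time the token is used; that copying is the price of not buffering the whole program.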

At the very least, the AST in conch-parser is worth skimming.

nixpulvis commented 6 years ago

I've started work on the LALRPOP implementation. Even if we decide on conch for the POSIX grammar, I believe this will be a good choice for the modern grammar.

nixpulvis commented 6 years ago

This interface should be mostly finalized once I figure out the custom lexer for LALRPOP.
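For reference, LALRPOP's documentation on external lexers describes the lexer as an iterator whose items are `Result<(Loc, Tok, Loc), Error>`, i.e. a token with its start and end locations. Below is a minimal sketch of a lexer producing that shape, with a locally defined `Spanned` alias so it compiles without the lalrpop crate. The `Tok` and `LexError` types, and the choice to reject `&`, `<` and `>`, are hypothetical; this is not oursh's actual lexer.

```rust
use std::iter::Peekable;
use std::str::CharIndices;

/// The item shape LALRPOP expects from an external lexer:
/// `Ok((start, token, end))` or `Err(error)`. Defined locally here.
type Spanned<Tok, Loc, Error> = Result<(Loc, Tok, Loc), Error>;

/// Hypothetical token set for a tiny subset of shell syntax.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Word(String),
    Pipe,
    Semi,
}

#[derive(Debug, PartialEq)]
enum LexError {
    /// Byte offset and character this sketch does not handle yet.
    Unsupported(usize, char),
}

struct Lexer<'input> {
    input: &'input str,
    chars: Peekable<CharIndices<'input>>,
}

impl<'input> Lexer<'input> {
    fn new(input: &'input str) -> Self {
        Lexer { input, chars: input.char_indices().peekable() }
    }
}

/// Characters that end a word.
fn is_delimiter(c: char) -> bool {
    c.is_whitespace() || matches!(c, '|' | ';' | '&' | '<' | '>')
}

impl<'input> Iterator for Lexer<'input> {
    type Item = Spanned<Tok, usize, LexError>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            // `?` ends the token stream at end of input.
            let (start, c) = self.chars.next()?;
            let token = match c {
                c if c.is_whitespace() => continue,
                '|' => Ok((start, Tok::Pipe, start + 1)),
                ';' => Ok((start, Tok::Semi, start + 1)),
                '&' | '<' | '>' => Err(LexError::Unsupported(start, c)),
                _ => {
                    // Offsets are byte positions, so multi-byte UTF-8
                    // characters advance `end` by more than one.
                    let mut end = start + c.len_utf8();
                    while let Some(&(i, next)) = self.chars.peek() {
                        if is_delimiter(next) {
                            break;
                        }
                        end = i + next.len_utf8();
                        self.chars.next();
                    }
                    Ok((start, Tok::Word(self.input[start..end].to_string()), end))
                }
            };
            return Some(token);
        }
    }
}
```

Because every token carries byte offsets into the original `&str`, the same stream can drive both the parser and a highlighter that needs to know where each token sits in the line.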

nixpulvis commented 6 years ago

#22 will inform most of the open questions here.

nixpulvis commented 5 years ago

Currently, the structure returned by a parse can be any kind of Program, but the Program trait doesn't expose the complete AST. We need some form of AST access to assist with syntax highlighting. Either we allow specialized highlighters to view the internal AST, or we standardize a set of features (e.g. commands, arguments, strings) to expose through the Program interface under a highlighting feature.
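The second option (standardized features behind an interface) might look roughly like this. The `Highlight` trait, the `Feature` categories and `SimpleProgram` are all hypothetical names chosen for illustration; the point is that `render` only depends on the trait, so any program type that reports feature spans can be highlighted without exposing its internal AST.

```rust
use std::ops::Range;

/// Hypothetical set of standardized features a highlighter could color.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Feature {
    Command,
    Argument,
    StringLiteral,
}

/// Hypothetical opt-in interface: a parsed program reports the byte spans of
/// the features it recognizes, without exposing its internal AST.
trait Highlight {
    fn features(&self) -> Vec<(Range<usize>, Feature)>;
}

/// Toy stand-in for a real parsed program; it classifies whitespace-separated
/// words and does not handle quotes containing spaces.
struct SimpleProgram {
    source: String,
}

impl Highlight for SimpleProgram {
    fn features(&self) -> Vec<(Range<usize>, Feature)> {
        let mut out: Vec<(Range<usize>, Feature)> = Vec::new();
        let mut start: Option<usize> = None;
        // A trailing sentinel space flushes the final word.
        let sentinel = std::iter::once((self.source.len(), ' '));
        for (i, c) in self.source.char_indices().chain(sentinel) {
            match (start, c.is_whitespace()) {
                (None, false) => start = Some(i),
                (Some(s), true) => {
                    let word = &self.source[s..i];
                    // First word is the command; quoted words are strings.
                    let kind = if out.is_empty() {
                        Feature::Command
                    } else if word.starts_with('"') || word.starts_with('\'') {
                        Feature::StringLiteral
                    } else {
                        Feature::Argument
                    };
                    out.push((s..i, kind));
                    start = None;
                }
                _ => {}
            }
        }
        out
    }
}

/// A highlighter written only against the trait works with any program type.
fn render<P: Highlight>(program: &P, source: &str) -> String {
    program
        .features()
        .into_iter()
        .map(|(span, kind)| format!("{:?}({})", kind, &source[span]))
        .collect::<Vec<_>>()
        .join(" ")
}
```

The cost of this approach is that the feature list becomes part of the public interface, so every grammar (POSIX or modern) has to map its AST onto the same small vocabulary.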

nixpulvis commented 5 years ago

https://github.com/zsh-users/zsh-syntax-highlighting/tree/master/highlighters

nixpulvis commented 4 years ago

6ded9a55dc61504802f119bb0e24fe5755af922f makes some initial progress on this by adding Unicode support to the lexer.

nixpulvis commented 4 years ago

Must-read: Unicode Security Mechanisms (Identifier Characters), and a Rust crate implementation at https://github.com/unicode-rs/unicode-xid.
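To illustrate the difference Unicode makes for identifiers, here is a sketch contrasting the ASCII-only names POSIX shells use for variables with a Unicode-aware check. The unicode-xid crate linked above provides the real XID_Start/XID_Continue properties through its `UnicodeXID` trait; to keep this example dependency-free, `is_unicode_name` approximates them with the standard library's `char::is_alphabetic` and `char::is_alphanumeric`, which are close to, but not identical to, the XID properties. Both function names are hypothetical.

```rust
/// POSIX-style name: an ASCII letter or `_`, followed by ASCII letters,
/// digits, or `_`.
fn is_posix_name(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Rough, std-only stand-in for XID_Start / XID_Continue.
/// ASSUMPTION: `is_alphabetic`/`is_alphanumeric` approximate the XID
/// properties; they differ in places (e.g. some combining marks that
/// XID_Continue accepts are rejected here). Use unicode-xid for the real thing.
fn is_unicode_name(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_alphabetic() || c == '_' => {}
        _ => return false,
    }
    chars.all(|c| c.is_alphanumeric() || c == '_')
}
```

Note that neither check addresses the confusable and mixed-script concerns the Unicode security document covers; those need separate handling on top of identifier classification.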