mesalock-linux / mesabox

A collection of core system utilities written in Rust for Unix-like systems (and now Windows)
BSD 3-Clause "New" or "Revised" License
137 stars 19 forks source link

sh: switch to LALRPOP? #34

Open Arcterus opened 6 years ago

Arcterus commented 6 years ago

I would vastly prefer to use something like LALRPOP or pest instead of nom, but right now neither support arbitrary bytes as input. There is this issue for LALRPOP and this one for pest that deal with that problem, but I'm not sure if it's fine to just say that invalid UTF-8 paths can't be given directly as arguments for now (they'd still work if a glob was given where the expanded part was invalid UTF-8). Switching to either LALRPOP or pest should help improve the error messages and make it simpler to fix some of the current parsing issues.

As a side note, this is actually the same issue that prevented me from just using pest in the first place.

mssun commented 6 years ago

I think it's OK to say that "invalid UTF-8 paths can't be given directly" for now and use LALRPOP or pest (pest looks good since it uses PGE for parsing). Then, let's explain our usages to the pest/LALRPOP authors and push the implementation of the "matching bytes" feature.

Arcterus commented 6 years ago

So, I ended up rewriting the parser without a library using iterators (retrieved from OsStr) directly. Currently we use EncodeWide on Windows and std::slice::Iter<u8> on Unix. While I haven't tested on Windows yet, I believe this solution should let the parser function on Windows. We still need to figure out how to deal with forking on Windows (e.g. for subshells), but it should be possible to at least enable some sort of shell script checker on all platforms now.

I imagine the performance can be vastly improved as I just did a simple port from nom. If this proves too difficult, we can switch to LALRPOP/Pest at a later date (once they at least support parsing u8 streams).

The rewritten parser is in the sh-parser branch. The REPL doesn't support multi-line inputs right now, but for the most part things seem to function better than before. I am at this point considering just integrating the changes into master as (IMO) a good REPL is less important than good scripting support.

Arcterus commented 6 years ago

BTW the code from sh-parser has now been added to master.

mssun commented 6 years ago

OK, good. Let's use the handwritten parser first and consider LALRPOP/Pest later (if we cannot handle it properly).