jeff-hykin / better-shell-syntax

💾 📦 ♻️ An improvement to the shell syntax for VS Code
MIT License

Nested zsh style Parameter Expansion flags throws parser #88

Open wdeshazer opened 1 month ago

wdeshazer commented 1 month ago

The code with a problem is:

deserialize_pathOpts_from_file() {
    typeset -grA _pathOpts
    _pathOpts=( "${(Q@)${(z@)"$(<pathOpts_path)"}" )
}

serialize_pathOpts_to_file() {
    "${(j: :)${(qkv@)_pathOpts}}"
} 

With all extensions disabled, the resulting code looks like:

image

To get the parser to highlight correctly I have to append the following:

deserialize_pathOpts_from_file() {
    typeset -grA _pathOpts
    _pathOpts=( "${(Q@)${(z@)"$(<pathOpts_path)"}" )  #""
}

serialize_pathOpts_to_file() {
    "${(j: :)${(qkv@)_pathOpts}}"  #"
} 

image

Interestingly, if I put a single quote on the deserialize _pathOpts assignment, I get a syntax error indication, which is not correct unless there is something about shell grammar I don't know. Either this is not an officially recognized pattern, or I should be doing something different somewhere.

image

jeff-hykin commented 1 month ago

Do you know what this syntax, (Q@), is called? Like, which zsh feature it's related to?

jeff-hykin commented 1 month ago

(note: this comment is unrelated to why the highlighting is failing) This part, in particular the <, is surprising to me that it parses at all "$(<pathOpts_path)" I'll have to check the spec on that

wdeshazer commented 1 month ago

Yes, this is called a parameter flag or parameter subscript. It is available in both bash and zsh; I'm not sure about sh. See 15.2.3 Subscript Flags. The particular flag you are asking about, Q, decreases the quote level by one; q increases it. It is possible to increase it up to three times (qqq). printf has a similar capability but a different syntax: printf "%q".

I have been looking at it this morning, and I see that the variable regexes don't provision for them. I think there is overlap between this and issue #74. I don't have a background in Textmate, but I have extensive experience with Perl regex. I will look at the Textmate side and offer a proposal.

wdeshazer commented 1 month ago

> This part, in particular the <, is surprising to me that it parses at all "$(<pathOpts_path)" I'll have to check the spec on that

This is a downstream consequence of missing the variable identification. The quote flag is still open. I have encountered a variety of confusing behavior that I fiddled with iteratively to see if I could reverse engineer the cause. In this case, I think the other is a symptom of the variable identification.

jeff-hykin commented 1 month ago

In terms of fixing the problem: there's a pattern, related to variable assignment, for detecting an array assignment. I think the array pattern handles named and not-named arrays. For the named arrays, the string pretty much has to be matched with a one-line pattern (not a pattern range) due to parser limitations. One line patterns can't handle nested stuff, like nested strings inside of string interpolation. Instead a pattern range has to be used with a start quote and end quote.

Slight hiccup tho: Textmate prioritizes long matches. So matching a one-line string (one big chunk), compared to matching just the starting quote, will cause the whole chunk to "win".

That "win" scenario is just a warning; idk if that's even happening here. It might simply be that the pattern-range version of the string pattern isn't even included in the array-literal range.

wdeshazer commented 1 month ago

Maybe you already realize this, but the best way to read this is as nested parameters. First, you have "$(<pathOpts_path)", which returns a string that will become an associative array. I will call it pathOpts_str, since I know what I call it.

It then expands to become ${(z)pathOpts_str}, where z is a flag that splits the contents into words, consistent with the parsing algorithm of the zsh command line. Thus it becomes "${pathopts_quoted[@]}", which then gets passed through Q@. The @, I think, is just syntactically preferred by the community, but not always necessary. I haven't figured out under what conditions it is. Either way, each element is dequoted in turn.
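The two-step reading above (split into shell words, then strip one level of quoting from each word) can be tried outside zsh. The sketch below is only a rough Python analogy: `shlex` follows POSIX-like rules rather than zsh's, and the `path_opts` data is made up for illustration.

```python
import shlex

# Hypothetical data standing in for the _pathOpts associative array.
path_opts = {"docs": "/home/me/My Docs", "bin": "/usr/local/bin"}

# Roughly "${(j: :)${(qkv@)_pathOpts}}": quote each key and value, join with spaces.
serialized = " ".join(
    shlex.quote(item) for pair in path_opts.items() for item in pair
)

# Roughly ${(z)...}: split into shell words, leaving the quotes in place.
words = shlex.split(serialized, posix=False)

# Roughly ${(Q)...}: remove one level of quoting from each word.
dequoted = [shlex.split(word)[0] for word in words]

# Pair the flat key/value list back up into a mapping.
restored = dict(zip(dequoted[0::2], dequoted[1::2]))

print(serialized)
print(words)
print(restored == path_opts)
```

The point of the analogy is the ordering: quoting is what lets a value containing a space survive the word split, and dequoting has to happen per word, after the split.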

Two good references for this are: [Zsh Native Scripting Handbook](https://wiki.zshell.dev/community/zsh_handbook) and Zsh Cheat Sheet

wdeshazer commented 1 month ago

"One line patterns can't handle nested stuff, like nested strings inside of string interpolation. Instead a pattern range has to be used with a start quote and end quote."

I understand. Greedy vs non-greedy matching. This can actually be overcome by clever application of look-ahead/look-behind assertions. I have a lot of experience in this and have been itching to build a tokenizer, so I am looking forward to helping you out with this one. Let's see if we can crack it. I'm almost sure that regex can hack it, although I know (from some people who know, lol) for a fact that regexes are not Turing complete, meaning they can't reconstruct all logically valid syntaxes. I don't know when we would hit that limitation, but I believe these constructs are well within its wheelhouse.

By the way, what limits the parser to Textmate? Is that a VSCode thing?

jeff-hykin commented 1 month ago

> This part, in particular the <, is surprising to me that it parses at all "$(<pathOpts_path)" I'll have to check the spec on that

> This is a downstream consequence of missing the variable identification.

I think there's a bit of miscommunication, which I'll try to clear up. I don't think the parameter expansion, the parameter flag, or the (< is related to #88. I was just curious about the parameter flag/subscript (thanks for the info on that!)

For the "$(<pathOpts_path)" I'm assuming that this would be valid on its own, echo "$(<pathOpts_path)".

wdeshazer commented 1 month ago

image

jeff-hykin commented 1 month ago

> Greedy vs non-greedy matching. This can actually be overcome by clever application of look-ahead/look-behind assertions

Sort of. Yes, textmate is greedy, but that's separate from regex greediness. Like, (a|ab) in regex would match just a first and be happy AFAIK (it wouldn't try "ab" unless something else in the pattern failed). But in textmate it's more like [ Pattern(/a/), Pattern(/ab/) ] would match "ab" despite the fact that "a" matched first.

I say "sort of" because yeah, we still use a lot of lookarounds to solve it on the textmate side.
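The regex half of that comparison is easy to check. The sketch below uses Python's `re`, which is assumed here to behave like other backtracking engines for ordered alternation; it does not model how textmate chooses between separate patterns, only what a single regex does and how a lookahead changes it.

```python
import re

text = "ab"

# Ordered alternation: the first alternative that lets the match succeed wins.
first = re.match(r"a|ab", text).group()

# Swapping the order changes the result, because "ab" is now tried first.
swapped = re.match(r"ab|a", text).group()

# A negative lookahead keeps the original order but stops "a" from
# matching when a "b" follows, so the second alternative gets its turn.
guarded = re.match(r"a(?!b)|ab", text).group()

print(first, swapped, guarded)
```

The third pattern is the kind of lookaround guard mentioned above: it makes the short alternative step aside without reordering anything.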

Also, seeing as you're a regex expert, that will help a lot. In terms of regex hacking, there is one warning. In theory, with enough recursive regex, matching a nested string with a one-line textmate pattern is possible. Sadly, I spent a lot of effort on that once for a different language, only to realize that textmate will only tag the last (innermost) part of a recursive regex pattern. So even if the pattern is matched correctly, it's not tagged correctly. I've got an issue on VS Code textmate about it, but I think you, me, and @redcmd are the only people in the world who might care about the issue. That said, even with broken scopes/tags, sometimes the fact that the pattern matches correctly is enough. It would be enough in this case, but the massive amount of effort it would take to make a recursive nested string pattern (a pattern that needs to contain the entire grammar, thanks to subshell interpolation) would be insane for just fixing this one bug and like 2 other non-cascading bugs.

> what limits the parser to Textmate? Is that a VSCode thing?

Yep. Other editors are limited to Textmate too.

The alternative is the awesome Tree Sitter parser, which would never even run into this problem in the first place. Atom used it, NeoVim uses it. I use it for parsing tasks.

Fun fact though. Bash is one of the few languages (I think Perl is another) that is impossible to statically parse perfectly. There can be runtime changes to the syntax thanks to, at minimum, aliases. So even the tree sitter can't always parse bash. Gotta run it to parse it (sometimes)

jeff-hykin commented 1 month ago

image

Yep, cool. Wouldn't be surprised if that becomes #90 on this grammar

wdeshazer commented 1 month ago

It took me forever to find this reference. I expected it to be in the redirection section of the manual, but wouldn't you know it, it was in the Bash documentation on Command Substitution. What's funny about that is that I searched the document for "(<", if that didn't give it away. I will start with what I know, which is the regex side, and then maybe we can work together to map it to Textmate, if that is possible. I gotta run right now, but I'll be back.

wdeshazer commented 1 month ago

Actually, before I run: what is the order of operations when it comes to pattern recognition? Does the engine do sweeps based on the nodes within the repository (I am referring to the JSON in autogenerated/shell.tmLanguage.json), does it use some compound search pattern, or some other algorithm altogether? I want to make sure that we have regex patterns that don't conflict. Now I will talk to you later.

wdeshazer commented 1 month ago

Ok. I have some really good solutions that need to be rigorously tested. We also should discuss, what is and is not achievable with Textmate. I did find this good Textmate reference, which had a link to this one which suggests to me the Textmate grammar should be as rich as Perl's, but who knows. We should come up with some more rigorous patterns but they worked with all of my scripts, which are pretty aggressive:

\b(\w+)(?:=)  # Variables anything with assignments

(?:")([^"/]+)(?:") # Any quoted non-path

(?:\$\{)(\w+)(?:\}) # Any simply $ marked variable

(?:\$\{)((?:#)|\w)+(?:\}) # Any simply marked variable including with counting

(?:\$\{)([^ ]+)(?:\})  # Full Variable pattern Excluded everything but the space

(?:\$\{)(?:\()(@|\w+)(?:\))(\w+)(?:\}) # Flagged Variable

# Stack overflow to the Rescue [Regular Expressions to Match Balanced Parenthesis](https://stackoverflow.com/posts/35271017/revisions)
\((?:[^)(]|\((?:[^)(]|\((?:[^)(]|\([^)(]*\))*\))*\))*\) # Nested Parenthesis *Wow!!!!
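For anyone testing the nested-parenthesis pattern: it is the same wrapper repeated a fixed number of times, so it only balances up to a fixed depth. The sketch below (Python's `re`, with a hypothetical helper named `nested_parens`) rebuilds the pattern level by level and probes where it stops matching.

```python
import re


def nested_parens(depth: int) -> str:
    """Build a pattern for balanced parentheses nested at most `depth` levels."""
    pattern = r"\([^)(]*\)"  # depth 1: no inner parentheses allowed
    for _ in range(depth - 1):
        # Each pass wraps the previous pattern in one more level of parentheses.
        pattern = r"\((?:[^)(]|" + pattern + r")*\)"
    return pattern


# The pattern quoted in the thread, copied verbatim.
THREAD_PATTERN = r"\((?:[^)(]|\((?:[^)(]|\((?:[^)(]|\([^)(]*\))*\))*\))*\)"

four_deep = "(a(b(c(d))))"
five_deep = "(((((x)))))"

same_pattern = nested_parens(4) == THREAD_PATTERN
matches_four = re.fullmatch(THREAD_PATTERN, four_deep) is not None
matches_five = re.fullmatch(THREAD_PATTERN, five_deep) is not None
partial = re.search(THREAD_PATTERN, five_deep).group()

print(same_pattern, matches_four, matches_five, partial)
```

At five levels the pattern no longer matches the whole string; an unanchored search instead settles on the inner four-level group, which is the sort of partial match that could mis-scope deeply nested input.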

Shell_variable_identification_regression.zsh

Shell Variable Identification Regression (Note: this code isn't fully working. I was developing it when I got sidetracked.)

```zsh
#!/bin/zsh

# _pathOpts startup is in $BIN for me $SOME_ROOT/bin
pathOpts_path() {
    local fname="$BIN/._pathOpts.sh"
    [[ -e $fname ]] && echo ${fname}
}
< $(pathOpts_path) &2> /dev/null
}

# Manage _pathOpts based on flags
edit_PathOpts() {
    local flag key path_value
    flag="$1"; key="$2"; path_value="$3";
    while [[ $# -gt 0 ]]; do
        case $flag in
            "--add")
                _pathOpts[$key]=$path_value; shift 3
                echo "Added path option: $key -> $path_value";;
            "--remove")
                unset _pathOpts[$key]; shift 2
                echo "Removed path option: $key";;
            "--reset")
                mv "$_pathOptsPath" "$_pathOptsPath_$(gdate +%Y%m%d_%H%M%S)"
                unset _pathOpts
                initialize_PathOpts; shift 1
                echo "Reset _pathOpts to default";;
            *)
                echo "Unknown flag: $flag"
                return 1 ;;
        esac
    done
    save
}

# Main function
main() {
    initialize_PathOpts
    local debug=false
    [[ $1 == "--debug" ]] && shift && debug=true
    [[ $1 == "--_pathOpts" ]] && shift && edit_PathOpts "${(P)1[@]}" && return
    [[ $debug == true ]] && set -o xtrace
    find_folder "$@"
    [[ $debug == true ]] && set +o xtrace
}

main "$@"
```

Assignment Identification:

image

Quote Identification non-Path

image

$-wrapped Variable identification

image

Full Variable pattern - Excludes only spaces

image

Flagged Variable

image

image

image
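A quick way to see what the Flagged Variable pattern covers is to run it against a few flag shapes from this thread. The sketch below uses Python's `re` as a stand-in for the grammar's regex engine, which is an assumption; the variable name `words` in the last sample is made up.

```python
import re

# The "Flagged Variable" pattern from the thread, copied verbatim.
FLAGGED = re.compile(r"(?:\$\{)(?:\()(@|\w+)(?:\))(\w+)(?:\})")

samples = [
    "${(z)pathOpts_str}",   # single-letter flag
    "${(@)pathOpts}",       # the @ flag alone
    "${(qkv@)_pathOpts}",   # letters combined with @
    "${(j: :)words}",       # flag with a delimited argument
]

results = {}
for sample in samples:
    match = FLAGGED.fullmatch(sample)
    results[sample] = match.groups() if match else None
    print(sample, "->", results[sample])
```

The last two samples come back unmatched, because the flag group accepts either @ or a run of word characters but not a mix, and not delimited arguments such as j: :. That suggests the flag group may need widening before it can cover the expressions from the original report.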

Mother of all regex's - Nested Parenthesis - Thank you stack overflow

image