Allow reuse of the `main` highlighter's semantic parser

Thesola10 commented 2 years ago

I think it would make it easier to write custom highlighters into third-party plugins if the main highlighter's state machine was exposed as a "framework" to implement additional highlight rules into semantic blocks already recognized by this plugin.

An example of this would be thesola10/zsh-comma-assistant#1, where the highlighter for arg0 would be overriden with additional logic to highlight certain commands normally written off as "not found".

danielshahaf commented 2 years ago

What's the difference to #327?

What kind of API do you envision?

WDYT of for k in ${(k)ZSH_HIGHLIGHT_STYLES}; do ZSH_HIGHLIGHT_STYLES[$k]=$k; done; _zsh_highlight and then inspecting $region_highlight?

Would it be better to just integrate with command_not_found_handler natively? Something like this (caveats: no docs, no tests, doesn't check if the handler exists): —

diff --git a/highlighters/main/main-highlighter.zsh b/highlighters/main/main-highlighter.zsh
index 8881dfa..59f760e 100644
--- a/highlighters/main/main-highlighter.zsh
+++ b/highlighters/main/main-highlighter.zsh
@@ -115,6 +115,7 @@ _zsh_highlight_main_calculate_fallback() {
       precommand arg0
       hashed-command arg0
       autodirectory arg0
+      bikeshed arg0
       arg0_\* arg0

       # TODO: Maybe these? —
@@ -1090,6 +1091,9 @@ _zsh_highlight_main_highlighter_highlight_list()
                             return 0
                           fi
                           _zsh_highlight_main__stack_pop 'R' reserved-word
+                        # use a subshell because it's a documented precondition of the handler (see zsh manual)
+                        elif (command_not_found_handler "$arg"); then
+                          style=bikeshed
                         else
                           if _zsh_highlight_main_highlighter_check_path $arg 1; then
                             style=$REPLY

Thesola10 commented 2 years ago

Would it be better to just integrate with command_not_found_handler natively? Something like this (caveats: no docs, no tests, doesn't check if the handler exists):

Well, my command_not_found_handler actually has to run the entire command in comma, so I really don't think that's a good idea, especially in a highlighter.

That would also cause issues with the slower stock handlers, such as the ones used by pacman or nix-index.

Thesola10 commented 2 years ago

I have looked into how the main highlighter handles highlighting, and it seems like it uses a preliminary parser. An idea for an API would be to run this parser outside of any highlighter, then provide an array of states and real offsets matching the (z)BUFFER array.

If, for instance, I wanted to write a highlighter that handled commands, I'd look in the state array for :command:, then for each index, perform some check against the corresponding index in (z)BUFFER, and use the same index on the offsets array to apply the highlight.

Mockup:

_zsh_highlight_highlighter_comma_paint() {
  args=(${(z)BUFFER})
  for (( i = 1; i <= $#WORDS; i++ )); do
    if [[ "${WORDS[i]}" == ":command:" ]] && grep -Fx "${args[i]}" "~/.cache/nix-index/cmds"
    then 
      _zsh_highlight_add_highlight ${OFFSETS[i]} ((OFFSETS[i] + ${#args[i]})) comma:cmd
    else 
      _zsh_highlight_add_highlight ${OFFSETS[i]} ((OFFSETS[i] + ${#args[i]})) unknown-token
    fi
}

Basically, for BUFFER=" hello; world", WORDS=(":command:" ":reserved:" ":command:") and OFFSETS=(1 6 8)

danielshahaf commented 2 years ago

Understood.

I think you can basically get that already, by doing what the test suite does (see a830613467af83971270299186a866ef6fa9f58f and #287) to validate test expectations, which are basically exactly what you give in your example:

https://github.com/zsh-users/zsh-syntax-highlighting/blob/56b44334617aa719e8e01bb5e00d3e176114a036/highlighters/main/test-data/bang-pipeline.zsh#L34-L38

That is, I think you can simply run the main highlighter in a particular way in order to make it behave as you proposed an independent parser would behave. Is this a solution or merely a workaround? And why?

As to my previous suggestion —

WDYT of for k in ${(k)ZSH_HIGHLIGHT_STYLES}; do ZSH_HIGHLIGHT_STYLES[$k]=$k; done; _zsh_highlight and then inspecting $region_highlight?

— I thought it would cover this, and for BUFFER=" hello; world" would give you typeset -a region_highlight=( '1 6 unknown-token memo=zsh-syntax-highlighting' '6 7 commandseparator memo=zsh-syntax-highlighting' '8 13 unknown-token memo=zsh-syntax-highlighting' ), but I haven't been able to get this to work in a toy example. Not sure why.

Thesola10 commented 2 years ago

That is, I think you can simply run the main highlighter in a particular way in order to make it behave as you proposed an independent parser would behave. Is this a solution or merely a workaround? And why?

I would like a solution that works for independent highlighters, since I don't think it's sustainable to just accept PRs to upstream just to distribute a particular highlighter that's only relevant within a specific context, if that's what you mean. Also, what way are you thinking of? Maybe I could try to use it in my highlighter.

Also, wouldn't your suggestion be for k in ${(k)ZSH_HIGHLIGHT_STYLES}; do ZSH_HIGHLIGHT_STYLES[$k]=$k; done; _zsh_highlight_highlighter_main_paint? Also that just wipes all colors because it hijacks ZSH_HIGHLIGHT_STYLES, but the data returned does look useful, even though I'm not a fan of running the main highlighter twice.

danielshahaf commented 2 years ago

I would like a solution that works for independent highlighters, since I don't think it's sustainable to just accept PRs to upstream just to distribute a particular highlighter that's only relevant within a specific context, if that's what you mean. Also, what way are you thinking of? Maybe I could try to use it in my highlighter.

The test suite runs the highlighter being tested in a way that causes it to add to $region_highlight not highlight strings (such as fg=red) but z-sy-h style names (such as unknown-token). A third-party highlighter could run the main highlighter in the same way and then examine the resulting region_highlight in order to get the values https://github.com/zsh-users/zsh-syntax-highlighting/issues/859#issuecomment-1039246524 identifies.

There isn't anything in this proposal that requires third party highlighters to send PRs to us; on the contrary, it's a design that could be used even by proprietary, non-open-source highlighters.

Also, wouldn't your suggestion be for k in ${(k)ZSH_HIGHLIGHT_STYLES}; do ZSH_HIGHLIGHT_STYLES[$k]=$k; done; _zsh_highlight_highlighter_main_paint?

Good catch.

Also that just wipes all colors because it hijacks ZSH_HIGHLIGHT_STYLES,

So call it from a function and use local -A ZSH_HIGHLIGHT_STYLES so the changes don't affect the rest of the world.

but the data returned does look useful, even though I'm not a fan of running the main highlighter twice.

Fair point. It would be nice if we can run the parse once and then make the results available both to the main highlighter and to third-parties. Can we assume third parties will run after the main highlighter?

Thesola10 commented 2 years ago

Can we assume third parties will run after the main highlighter?

Would that be necessary? After all, if we do have a unique parser "at the top", the main highlighter just ends up being a giant match machine, and it's all a question of priority.

Of course, you would want to set main first so your other highlighters don't get overwritten, but I don't think that would break anything per se.

There isn't anything in this proposal that requires third party highlighters to send PRs to us

So you were talking about what your suggestion code does. Sounds good, still not a fan of running main several times.

danielshahaf commented 2 years ago

@Thesola10 I'm trying to figure out a workaround that you can use right now. Figuring out a longer-term solution is a fair topic too, of course.

Thesola10 commented 2 years ago

I just noticed, but when a command is not found, it is given the unknown-token highlight, yet that doesn't distinguish it from e.g. a misplaced parenthesis, does it? Our "parser" would have to return the semantic role, not whether it is right or wrong.

(not that there's a significant risk of my script accidentally matching a stray closing paren, but we should avoid ambiguities that could cause issues later on)

danielshahaf commented 2 years ago

As you say. That's a known issue (#695); patches are welcome :-)

revanthkiran96 commented 1 year ago

hello guys, i was new to opensource i can't understand the issues can anybody help me please

Thesola10 commented 1 year ago

hello guys, i was new to opensource i can't understand the issues can anybody help me please

it's a niche issue about allowing other zsh plugins (like the one I'm working on) to modify the syntax highlighter's behavior without having to reimplement its logic entirely. In my particular use case, I want to change unknown commands to blue when they exist in a specific database because my plugin can then automatically install and run them.

Thesola10 commented 10 months ago

I just looked into _zsh_highlight_highlighter_main_paint and it looks as though it uses _zsh_highlight_main_highlighter_highlight_list to parse the command line itself, which is already much better than the ZSH_HIGHLIGHT_STYLES hijack (no offense)

This means it would be much simpler to make the parser global, by running that function once per highlight pass and storing its result in a variable available to all parsers.

Thesola10 commented 8 months ago

This is only tangentially related to this issue, but I'd appreciate some help with https://github.com/Thesola10/zsh-comma-assistant/issues/7

this is very weird and likely an issue with _zsh_highlight_main_highlighter_highlight_list somehow, but I'm still very confused.

zsh-users / zsh-syntax-highlighting

Allow reuse of the `main` highlighter's semantic parser #859