codeblock syntax highlighting delegation

blaenk commented 10 years ago

It'd be very nice if we could rig something up so that codeblocks get highlighted with the appropriate syntax file. vim-markdown has something like this. I think it works similar to how you currently delegate latex highlighting to latex, and yaml to yaml. It'd be nice to have this, at the very least if it were opt-in.

We already have pandocDelimitedCodeBlockLanguage which contains the codeblock metadata. There are mainly two formats:

``` python
~~~ {.python}

I think it should be very easy to get the language out of the first one, since it's just one word. The second case can be handled by parsing the first class listed. In the worst case we can have something like what vim-markdown implemented, which is essentially a list of allowed languages and then we can see if any one of those is present in pandocDelimitedCodeBlockLanguage and then use that.

This would of course be as straightforward as possible; if it can't find the language then it'll continue doing regular general highlighting as is currently done.

fmoralesc commented 10 years ago

When you have embedded codeblocks though, the syntax highlighting is already a giveaway that it's different/separate from the document, add to that the lambda and language above it and the gap at the bottom from end-delimiter-concealment and it stands out well enough without an extra color there.

Yes, but sometimes you can have empty lines within codeblocks , and since we hide the delimiters, we might have issues determining what is text and what isn't (for example, lisp highlighting is very light, most text would display as Normal).

I'm open to adding an option to disable codeblocks fill highlighting (I was actually doing that now) and also for changing the default. I thought Special might be a better option, but the end result, sadly, depends on the colorscheme (in molokai Special looks nice, in solarized not so much). That's why I tried to choose semantically coherent highlighting, it's up to the colorscheme to make visual sense.

blaenk commented 10 years ago

Yeah, I understand your rationale. My preference would be for me to be able to leave the regular codeblocks alone (so that they're all one color and stand out more) and able to disable codeblock embed fill highlighting or at the very least have a different name for it so we can change it ourselves.

Also see my question about the Conceal color. I loved the way it was before, this new way doesn't seem to afford any customizability.

Finally, I think we should get working on the syntax sync issue. It's definitely fixed in my own version (in the codeblock-highlight branch), so maybe we can determine why it works there. I think it involved defining a more stringent region pattern, but I'm not sure.

I'll be away for about 1-2 hours, but can still communicate while I'm out.

fmoralesc commented 10 years ago

Sorry, I reverted that change about Conceal on master.

blaenk commented 10 years ago

Incidentally, I've been thinking that when we add support for disabling certain concealments #4, that we consider treating the codeblock delimiters separately. I can imagine (and even have considered it myself) people wanting to disable concealment of the end delimiter, to make it clearer where the end is, but preserve the nice lambda concealment for the start delimiter.

blaenk commented 10 years ago

I really can't think of why the syntax sync is present in your version. I've tried playing around with it and haven't been able to determine the cause.

I'm also observing a regression with scala. I don't know how or why, but look at the scala document and you'll see that the codeblock on line 49 has gone all weird. As soon as the [ token is reached, it begins the scalaConstructorSpecializer group and ends up highlighting the rest of the code the same way (and grouped the same way).

Also, try going to the middle/bottom of the document, then scroll up using just k, you'll notice that the scrolling is very sluggish and the file will lose highlighting intermittently while scrolling. I don't scroll this way, and I'm sure most vim users don't either, but it's very likely indicative of underlying performance issues with these patterns.

None of these three things happens in my codeblock-highlight branch. I understand that you wanted to keep the code shorter/smaller, but it seems pretty unnecessary to start from your current pattern since I've already painstakingly solved these issues.

What I suggest is that we instead adopt the rules I developed in my branch, and if necessary start trimming them down little by little, so that we may more clearly determine what the breaking point is.

I'm gonna start attempting this I suppose.

blaenk commented 10 years ago

The scala "regression" was due to the mistake that I added 'scala' in pandoc.vim as well as in my vimrc. Even then, I expected pandoc.vim to "merge" the two lists so that if something appeared in both, there wouldn't be duplicates. I imagine there were duplicates, and so this somehow caused a problem with the highlighter. We'll probably want to find a way to handle this scenario.

fmoralesc commented 10 years ago

Fine by me. As I said, this was mostly a proof of concept.

The scala regression was a due to a mistake on my part of taking expand not to add repeated items to lists (when extending dictionaries, my assumption holds by default). We'll have to check for existence in the list with get()and then add.

blaenk commented 10 years ago

Yeah, I checked the docs for some sort of merge function for lists but yeah it only seems to exist for dictionaries.

By the way, I think I can say with confidence that the scrolling speed is dependent on how many languages we define: the more languages, the more it lags. I verified this by leaving it as only 'ruby' and 'scala'.

Ideally we would have no languages on by default and only enable them as they were needed, if we could have a background process that would notice that we entered a ruby codeblock and it'd generate the ruby highlight rules, or something that would also work is if it would observe the header YAML metadata for an option like:

languages: haskell, scala

And then generate the rules accordingly. However, I've heard it's not generally possible to do this kind of thing. I think what would be second-best would be to scan the YAML metadata at 'launch' as soon as the buffer is open and scans the languages option as above, and generate only those languages. I think this would be a decent compromise, so that I could add the languages I want to the metadata, close the buffer and re-open it. But I don't know how or if this kind of thing is possible.

The reason I'm suggesting these features is because having a global languages list is clearly too loose, especially if you work with many different languages. If you have one post about scala, it'd be nice to have it highlighted but in the majority of the other documents you'd be paying the penalty of that extra language in the languages list.

Finally, I think I've managed to track down the breaking point for triggering the syntax syncing. Take a look at my codeblock-embeds branch. I got rid of the isk "fix" by the way.

So I was simplifying the rules and I noticed that pandocDelimitedCodeBlockStart_ and pandocDelimitedCodeBlockEnd_ are both identical to the original pandocDelimitedCodeBlockStart and pandocDelimitedCodeBlockEnd_. So I started by getting rid of the Start_ specific one and made the blocks now refer to the generic Start group. I tested it and everything was fine.

Then I started getting rid of the language specific pandocDelimitedCodeBlockEnd_. This one contained a containedin=pandocDelimitedCodeBlock_ . lang argument, so when I got rid of it I instead added pandocDelimitedCodeBlockEnd to the contains= list of the language-specific CodeBlocks_. I opened up the file and tested it, bam triggered the syntax sync bug consistently.

Hoping you have some insight. Let me know if my explanation of my process was confusing.

blaenk commented 10 years ago

Just pushed my work to my codeblock-embeds branch, pre-breaking point (after removing the start rule) and post-breaking point (after removing the end rule). HEAD is the breaking point, and you can reset --hard d321e03 to go to the pre-breaking point if you wanna see the difference, then just reset --hard b16941997 to go to the breaking point.

blaenk commented 10 years ago

Again, using this test document with zRG

fmoralesc commented 10 years ago

Then I started getting rid of the language specific pandocDelimitedCodeBlockEnd. This one contained a containedin=pandocDelimitedCodeBlock . lang argument, so when I got rid of it I instead added pandocDelimitedCodeBlockEnd to the contains= list of the language-specific CodeBlocks_. I opened up the file and tested it, bam triggered the syntax sync bug consistently.

I think the end=clauses of these regions cause some overlap and trigger the issue. As you can see in my latest commits in the testing branch (codeblock-embeds), I even put an entirely arbitrary clause there. Ideally, the region that contains the language shouldn't contain pandocDelimitedCodeBlockEnd at all. At least, that's what I think could have triggered the issue with my rules, I suppose sth similar happened with yours.

blaenk commented 10 years ago

Actually I was referring to pandocDelimitedCodeBlockEnd, not the end= rules. I figured that would be confusing.

But anyways, I figured it out! Getting rid of contained in the pandocDelimitedCodeBlockEnd solved the issue. Everything works perfectly fine, no sync bug or anything.

It's pretty condensed now. I'm finally going to attempt to get rid of the one last rule so that it's just one rule like you had originally.

fmoralesc commented 10 years ago

Got it ;)

Great then. Anyway, I think we should disable this feature by default for performance reasons, as you noticed. The application of the syntax rules can be made asynchronously, but we should research how to send commands to a specific buffer first.

blaenk commented 10 years ago

Alright, I reduced it to a single pattern now.

for l in s:pandoc_enabled_codelangs
    unlet b:current_syntax
    exe 'syn include @'.toupper(l).' syntax/'.l.'.vim'
    exe "syn region panodcDelimitedCodeBlock_" . l . ' start=/\(\_^\(\s\{4,}\)\=\(`\{3,}`*\|\~\{3,}\~*\).*' . l . '.*\n\)\@<=\_^/' .
          \' end=/\_$\n\(\(`\{3,}`*\|\~\{3,}\~*\)\_$\n\_$\)\@=/ contained containedin=pandocDelimitedCodeBlock' .
          \' contains=@' . toupper(l)
    exe 'hi link pandocDelimitedCodeBlock_'.l.' pandocDelimitedCodeBlock'
endfor

The scala post that consistently triggered the bug is perfectly fine. However, I tested it with this document and zRG is fine, but if you slowly C-U up, after a few screens around line 250 everything goes gray, grouped as cComment.

It also happens in HEAD^ though. I'm gonna keep going back to see if it's a regression or it's something that wasn't covered to begin with.

blaenk commented 10 years ago

It happens consistently as soon as I pass line 241. As soon as I go past it (upwards) everything becomes grouped as cComment. As far as I know, I don't even have c-style comments there.

blaenk commented 10 years ago

Of course, this doesn't happen fi I remove cpp from the list of languages. I'm curious as to why it's evne getting triggered though.

blaenk commented 10 years ago

Actually, I just noticed I have instances of:

**somedirectory/**

Somehow that must be getting treated as an opening C comment. The problem is this shouldn't be happening as far as I know, it should be confined within a cpp codeblock (if there were one, which there isn't).

fmoralesc commented 10 years ago

My guess is it happens because cpp.vim includes c.vim and then resets b:current_syntax, which pollutes the highlighting.

blaenk commented 10 years ago

Good idea, but I tested it in a separate file and straight up lines like:

/* testing something */

didn't have any effect, they weren't highlighted or anything.

Also, when it becomes 'commented', I did :syn sync fromstart and everything highlighted correctly, so I worry it might be that sync bug again. I tried going all the way back to the first commit of this process (of reducing the patterns) and they all exhibit this issue.

fmoralesc commented 10 years ago

It is a sync issue. When it goes back in the buffer, vim loses context and applies any rule that matches. In this case, cComment. It doesn't happen in the other file you are testing because the highlighter state is different.

blaenk commented 10 years ago

Shame, I thought I had fixed the sync issue :( I'm out of ideas for that problem. I guess I'll be fine enough with a :syn sync fromstart bind. Also, this wouldn't be a problem if cpp weren't in the languages list, since I don't even have cpp in this post.

So I recommend we make it an opt-in list like you said (off by default; no default languages), then it shouldn't be too much of a problem.

There's *syn-sync-grouphere* and *syn-sync-groupthere* but I don't know if they would help.

EDIT: I meant *syn-sync-fourth*.

fmoralesc commented 10 years ago

I looked at that, I think that would be the way to go, but I have to understand it better first. I think that falls squarly under optimization, and I would prefer to leave it till last.

blaenk commented 10 years ago

I fixed it!!! Your last comment made me ask myself why c.vim thinks it's free to search up the document in general (outside the scope of the codeblock region, since the codeblock region contains c.vim). I was looking at c.vim and I noticed that it does some sort of syn sync manipulation, like looking in the past 100 lines for the beginning of a comment or something. Then I remembered that I had read in the docs that you could 'clear' all of the syn sync declarations with syn sync clear.

So I went to the bottom of pandoc.vim and changed it to be like this:

syntax sync clear
syntax sync minlines=100

And it works flawlessly!! You can look into it to see if it's good to have here or not, but my rationale for why it's ok to have here is that we don't need syntax sync manipulations from embedded languages since I imagine they're an optimization like you said, usually meant for huge files of just that particular language, that's why that post was being considered a comment because c.vim felt it was free to search upwards for a possible beginning of comment, which I had as I described above: a slash for a directory name followed by strong markup.

So I think it's perfectly fine if we clear all syn sync declarations at the end that any includeed languages might have registered. Like you said, syn sync is more of an optimization and in this case it's hurting vim-pandoc-syntax, which shouldn't be allowed for the sake of optimization. This kind of optimization (in c.vim etc) would probably mainly benefit huge codeblocks, of which I hope/imagine is the minority case, and after all, it's only an optimization like you said, so they can do away without it. Of course if we later want to declare our own syn syncs, they'd go after this line, as the minlines one currently does.

I pushed this fix to my codeblock-embeds branch. It seems like the codeblocks are perfect now!

I need to catch up on some other work I have to do that I've put off these days I've spent working on this, so I probably won't be able to help out much in the coming days, but I'll be around to communicate here for sure.

fmoralesc commented 10 years ago

Ok, then. I think, if your changes are working fine, you should merge them into master (or maybe push them to this repo's codeblock-embeds, I'll delete the old branch so there are no clashes.) I can tune it up if needed (make it configurable, that sort of stuff).

Thanks for all the work you've done in this, it's been great help.

blaenk commented 10 years ago

Alright, I squashed the commits to make it nice and pushed to codeblock-embeds branch.

Please don't forget about this little detail.

Also, like you said I think this should be off by default, which I think would mean just set it to an empty array in the plugin, then in vimrc people can specify whatever languages they want in the array (and in plugin don't extend/concatenate anymore, just do if exists then use the array specific in vimrc, else don't do anything). Then later we can perhaps look into a more automatic async/background task to do this. Or I've been thinking, at the very least, would it be possible to create a command? So something like :PandocHighlight haskell. Then one can turn highlighting on in a per-buffer basis instead of the global, loose long list to encompass every language one writes about.

fmoralesc commented 10 years ago

Fair enough. Taking over, have a nice weekend. ;)

The command idea seems promising, but I'll research the async method too.

blaenk commented 10 years ago

Cool thanks, sounds good.

fmoralesc commented 10 years ago

Ok, I made it configurable and pushed the new code to master. There's a small change though: the enabled languages are configured in a dictionary called g:pandoc_syntax_use_embeds_for_codeblocks_in_langs. The keys in this dict are the language names, as used by pandoc, and the values are the names of the syntax files used by vim. Sometimes they don't match. E.g:

let g:pandoc_use_embeds_in_codeblocks_for_langs = {
    \ "literatehaskell": "lhaskell" }

You can have a value for this and still disable the use of embedded highlighting in codeblocks by setting g:pandoc_syntax_use_embeds_for_codeblocksto 0.

I also created a pair of functions to enable and disable highlighting for specific languages.

blaenk commented 10 years ago

Yeah, they don't always match but I'd say most of the times they do, that's why vim-markdown adopted a list instead of a dictionary to make it more convenient for the user:

let g:langs = ["scala", "ruby", "literatehaskell=lhaskell"]

And they pull out the components this way (which handles whether or not an entry has an alternative name):

let s:lang_name = matchstr(s:lang, "^[^=]*")
let s:lang_syntax = matchstr(s:lang, "[^=]*$")

Just an idea in case you agree that's more useable.

By the way, the highlighting is erroring out on me. I'm using the example dictionary you provided:

let g:pandoc_use_embeds_in_codeblocks_for_langs = {
    \ "literatehaskell": "lhaskell" }

And it's producing this error as soon as I open it:

syntax

Finally, I wanted to clarify that earlier when I mentioned that it'd be nice to have a function to manually enable/disable languages, primarily I meant this for user use, so that we can manually do it as a command, :enablelanguage whatever, but it seems the functions you created aren't publicly accessible?

blaenk commented 10 years ago

Seems like function arguments must be prefixed with a:, so a:langname. Also notice in EnableEmbeds you pass the syntax file name but don't use it and instead use the dictionary directly, so change that occurrence to a:langsyntaxfile.

I pushed this fix to master as f1ec047aa41f0f60ce4daf9d46114ef543426615

Still would like a way to make these public though so users can use them, and also feedback on the list-instead-of-dict suggestion.

blaenk commented 10 years ago

I pushed a293a38ab2dd02bd73d2955fd855b691ab5843c4 to codeblock-embeds in this repo.

It fixes the functions (with appropriate variable prefixes), then I switched to the language list instead of a dictionary (so that you can try it out), and finally added commands :PandocHighlight and :PandocUnhighlight. Because of the change to the language list, :PandocHighlight can take a parameter of the same form if necessary, e.g. :PandocHighlight literatehaskell=lhaskell.

I tested preset list configuration (in vimrc) as well as real-time highlighting and unhighlighting; it's glorious.

It works very beautifully (love it). I hope you agree.

fmoralesc commented 10 years ago

Sorry for my sloppiness... :p

I like your changes (I gave some minor comments). thumbs up.

blaenk commented 10 years ago

I'll incorporate your change, just pushed completion support!! Vim has completion support for syntax names ;)

fmoralesc commented 10 years ago

Great!

blaenk commented 10 years ago

Alright I added -buffer and syntax name completion. All pushed to this repo's codeblock-embeds and ready to merge, unless you have any other ideas/objections!

fmoralesc commented 10 years ago

Alright. All merged in master. Closing.

blaenk commented 10 years ago

:+1:

vim-pandoc / vim-pandoc-syntax

codeblock syntax highlighting delegation #14