vim-pandoc / vim-pandoc-syntax

pandoc markdown syntax, to be installed alongside vim-pandoc
MIT License

Paragraphs in lists are mis-treated as code blocks #43

Closed FloGa closed 10 years ago

FloGa commented 10 years ago

In Markdown, list items can have multiple paragraphs, which have to be indented by four spaces or one tab. Apparently this is not checked at the moment, because such paragraphs are highlighted as code blocks in Vim (in a standard GVim, with a pink font color and without further syntax conversions). Code blocks within a list item must be indented once more, that is, by eight spaces or two tabs.

For example:

- this is a list item

    this is a paragraph within the item.

        echo "Actually these lines are a code block"
        echo "within a paragraph within the list item."

Expected result: Just the last two lines should appear as a code block.

Actual result: All indented lines appear as code blocks.

blaenk commented 10 years ago

Thanks for reporting this. I've observed this as well. It's actually a pretty long-standing issue I think, I just haven't given it much thought as to how it may be possible to disambiguate list paragraphs, which I do use a lot, from regular indented code. Personally I'd remove highlighting from "indented code" blocks, since I don't use them, but perhaps @fmoralesc has an idea on working around this.

FloGa commented 10 years ago

Since I sensed a major point of confusion here, I studied the Pandoc documentation, ran some tests, and came up with something that may or may not help with this problem. I don't have much experience with Vim's syntax commands, so I'm merely sharing my thoughts.

Obviously it is not possible to start an indented code block right after a list, since all indented paragraphs right after a list item are considered part of the item. So I am wondering whether it is possible to set a variable when a list item is encountered, like "list_going_on", and to unset it when a non-indented paragraph is encountered afterwards. The code block syntax could then be altered to start on list_going_on && "8 spaces" OR !list_going_on && "4 spaces".
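The "list_going_on" idea above can be sketched as plain Vim script that walks over lines and classifies them. This is only an illustration of the state-tracking logic, not a syntax rule and not code from vim-pandoc-syntax; the function name `DemoClassifyLines` and the labels it returns are made up for this example, and it only handles a single level of nesting.

```vim
" Hypothetical helper (not part of vim-pandoc-syntax): classify each line
" using a "list_going_on" flag as described above.
function! DemoClassifyLines(lines) abort
  let l:list_going_on = 0
  let l:kinds = []
  for l:line in a:lines
    if l:line =~# '^\s*$'
      " Blank lines neither start nor end a list item.
      call add(l:kinds, 'blank')
    elseif l:line =~# '^[-*+]\s'
      " A top-level bullet switches the flag on.
      let l:list_going_on = 1
      call add(l:kinds, 'list')
    elseif l:line !~# '^\%( \{4}\|\t\)'
      " A line without a full indentation step ends the list.
      let l:list_going_on = 0
      call add(l:kinds, 'text')
    else
      " Indented line: code needs 8 spaces (or 2 tabs) inside a list,
      " and 4 spaces (or 1 tab) outside of one.
      let l:needed = l:list_going_on ? '^\%( \{8}\|\t\t\)' : '^\%( \{4}\|\t\)'
      call add(l:kinds, l:line =~# l:needed ? 'code' : 'paragraph')
    endif
  endfor
  return l:kinds
endfunction

echo DemoClassifyLines(['- item', '', '    paragraph', '', '        code'])
" echoes ['list', 'blank', 'paragraph', 'blank', 'code']
```

Without the leading list item, the same indented lines would all be classified as code, which is the ambiguity described in this issue.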

I'll try to get used to the syntax commands and maybe I can come up with something more usable than my chaotic thoughts.

Personally I totally agree with you, in that such indented code blocks should be removed completely in favor of fenced code blocks, but as long as pandoc supports them, it should be part of the Vim syntax.

blaenk commented 10 years ago

Unfortunately such a thing isn't possible as far as I'm aware. The syntax highlighting rules are just regular expressions, and the highlighting is applied depending on whether or not they match; as far as I know there's no way to specify contextual behavior/state while the match is being performed. An alternative would be to check the highlight group of the preceding lines, as we do for folding in vim-pantondoc, but again that's not possible in syntax highlighting as far as I'm aware.

Normally something like this would be mitigated by modifying the regex to add a negative or positive lookbehind, to ensure that the match succeeds only if it isn't/is directly following another regex.

So in this case for example we would construct the indented codeblock regex so that it only matches if it doesn't immediately follow a list declaration, but then consider this:

* a list declaration

    this will correctly not be highlighted since it immediately follows the list declaration.

    this will be highlighted because it doesn't _directly_ follow the list declaration.

This gets complicated quickly, especially considering that list items can contain any other kinds of block items. Perhaps @fmoralesc or someone else with more vim regex experience has some ideas. Perhaps I'm over-complicating this.

Failing a workaround for this, personally I'm just going to disable highlighting of indented codeblocks since I never use them. The option g:pandoc_syntax_fill_codeblocks exists and would be useful for this particular case, but it hasn't been modified to work for it; so far it only works for delimited code blocks. Check #46 for a proposal to add an option that would allow selectively disabling highlighting of specific codeblock types, such as indented ones.

FloGa commented 10 years ago

Yeah, I'm really sorry. Right after I posted my thoughts I looked into syntax highlighting in Vim and was quite shocked at how "simply" it is done. You are not over-complicating it at all. If anything, you are not complicating it enough.

Imagine a code block inside a list item inside a list item. The paragraphs of this "inner" list item would be indented 8 spaces and the code blocks 12. So we would actually need something like dynamic nested highlighting, smart enough to add up the spaces on each nested level (so "+4 spaces on each level").
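The "+4 spaces on each level" arithmetic can be written down as a tiny helper. This is just a worked example of the numbers above, assuming spaces only; `DemoIndentFor` is a hypothetical name, and depth 0 is taken to mean "not inside any list item".

```vim
" Hypothetical helper: indentation (in spaces) required for paragraphs and
" for indented code blocks inside a list item nested `depth` levels deep.
function! DemoIndentFor(depth) abort
  return {'paragraph': 4 * a:depth, 'code': 4 * (a:depth + 1)}
endfunction

" A list item inside a list item is depth 2:
echo DemoIndentFor(2).paragraph
" echoes 8
echo DemoIndentFor(2).code
" echoes 12
```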

I'm afraid we are reaching the limit of Vim's highlighting methods here ...

blaenk commented 10 years ago

No worries, it helps to express the ideas nonetheless. Indeed, handling this particular context-dependent part of markdown would be pretty difficult in vim. Vim's syntax support is rudimentary in this regard, so we work with what we have.

FloGa commented 10 years ago

I just had an idea I want to share with you. Perhaps I'll get around to working on it later today, but in case any of you is free and willing to try it, feel free to take my idea.

I think of a for loop, let's say from 0 to 5, where 0 is the outermost level and 5 the theoretically innermost level of nesting. In each of these levels we define regions like ordered and unordered lists, maybe definitions, blockquotes, and of course indented code blocks. In every level, the indentation is defined as (4 * i) spaces || i tabs, where i is the index from 0 to 5. We name those regions *_nested_i, where again i is replaced by the actual index.

Every region defined in that loop can contain regions of *_nested_(i+1), so the next inner level regions. The innermost regions should only contain Normal regions. Anyone who needs more than 5 levels of nested lists and the like should maybe think about restructuring their paper anyway. And even then, we can increase the number of levels as needed if the operation is cheap enough. Hell yeah, let's nest 100s of lists and more!

That way we can almost dynamically create nested regions that can contain "more nested" regions. We will have to use execute to wrap the syntax commands in order to use variables within the commands.
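A rough sketch of what such a loop might look like follows. It is an assumption-laden illustration, not the plugin's code: the group names `demoList_nested_N` and `demoCode_nested_N` are hypothetical, only space indentation is handled (tabs are left out to keep the patterns readable), and only list items and indented code are generated. The commands are syntactically valid, but how well the resulting regions behave on real documents has not been tested here.

```vim
" Build the :syntax commands for list levels 0..max_depth as strings.
" A bullet at level i sits at (4 * i) spaces; code inside that item needs
" 4 * (i + 2) spaces, because the item's own paragraphs use 4 * (i + 1).
function! DemoNestedListCommands(max_depth) abort
  let l:cmds = []
  for l:i in range(0, a:max_depth)
    let l:bullet = 4 * l:i
    let l:code = 4 * (l:i + 2)
    " The item ends (zero-width, via \ze) before the next non-blank line
    " that is indented no deeper than its own bullet.
    let l:cmd = 'syntax region demoList_nested_' . l:i
    let l:cmd .= ' start=/^ \{' . l:bullet . '}[-*+]\s/'
    let l:cmd .= ' end=/^\ze \{,' . l:bullet . '}\S/'
    let l:cmd .= ' contains=demoCode_nested_' . l:i
    if l:i < a:max_depth
      " Every level may contain the next inner level.
      let l:cmd .= ',demoList_nested_' . (l:i + 1)
    endif
    if l:i > 0
      " Inner levels only exist inside their parent level.
      let l:cmd .= ' contained'
    endif
    call add(l:cmds, l:cmd)
    call add(l:cmds, 'syntax match demoCode_nested_' . l:i . ' /^ \{' . l:code . ',}\S.*$/ contained')
    call add(l:cmds, 'highlight default link demoCode_nested_' . l:i . ' String')
  endfor
  return l:cmds
endfunction

" Use :execute to run the generated commands, as suggested above.
function! DemoApplyNestedLists(max_depth) abort
  for l:cmd in DemoNestedListCommands(a:max_depth)
    execute l:cmd
  endfor
endfunction
```

With a maximum depth of 5, the commands generated for level 1 are:

```vim
syntax region demoList_nested_1 start=/^ \{4}[-*+]\s/ end=/^\ze \{,4}\S/ contains=demoCode_nested_1,demoList_nested_2 contained
syntax match demoCode_nested_1 /^ \{12,}\S.*$/ contained
```

Each extra level adds another region and another match that Vim has to try, which is the cost to keep in mind when raising the maximum depth.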

What do you guys think? Does this sound manageable?

fmoralesc commented 10 years ago

Hello all!

I had been thinking of something like this, but never got around to implementing it. So here it goes:

We could separate toplevel document regions (paragraphs, lists, embeds, codeblocks), and within those load the rest of syntax rules, like we do with embedded languages. So we would have sth like

syn include @PANDOCBASE syntax/pandoc-base.vim
syn region PandocParagraph start=/.../ end=/.../ contains=@PANDOCBASE

As long as PANDOCBASE also has rules for this sort of region, we can have a certain degree of recursion. The real problem is creating the rules for these toplevel regions, which is, really, the same problem we have now (if we could know an indented paragraph is a list item continuation instead of a codeblock, we would have fixed this already).

FloGa commented 10 years ago

I think it is necessary here to define a region for the whole list, and not just for the bullet. The region starts with the first bullet, as it does now, and ends before the first non-indented line afterwards, skipping over any further bullets. And any way I look at it, I'm afraid we can't avoid defining multiple levels of indentation, like I suggested before. Otherwise we will probably never be able to disambiguate nested items correctly.

blaenk commented 10 years ago

Yeah, like @fmoralesc said, unfortunately it comes down to that single ambiguity, between a codeblock and a list paragraph. I think you're right that that seems like the only approach, that is, manually defining repeated indentation levels. I think this would be required, after all, to be able to change the indentation spaces required at each level, from 4 at the first to 8 at the second, and so on. Even something like what @fmoralesc is thinking about would, I think, require such work, unless I'm mistaken.

Personally I'm of the opinion that this would be messy, hackish, and incomplete. Like you said, we'd define up to a certain number of levels, and I think if we don't cover all of the bases with this then we shouldn't bother, despite the fact that it seems like there's no other recourse. vim-markdown for example doesn't do this, though truth be told their syntax file is much simpler, and doesn't highlight indented codeblocks as far as I can tell, which is why they don't have to worry about this edge case. Personally I would simply do the same and not highlight indented codeblocks either, but I think #46 is a decent compromise if that's too drastic.

fmoralesc commented 10 years ago

This isn't only about codeblocks, it also happens with lists and definitions. This is actually the reason we currently only highlight bullets if it makes sense to do so: trying to match lists as a region proved to be way too complicated, cumbersome and, worst of all, non performant.

blaenk commented 10 years ago

Yeah, that's definitely another thing to consider. Manually generating/defining an arbitrary depth of indentations for rules would definitely have an impact on performance. This is of particular note, I think, considering that we already do some pretty complex things for a highlighting plugin.

So I think #46 is a decent compromise, though personally I'd just remove highlighting of indented non-delimited codeblocks altogether by default, maybe have it opt-in if really necessary.

If it were realistic to have this work perfectly it'd be nice, but we work with what we have. Perhaps in the near future neovim can improve upon this area.

fmoralesc commented 10 years ago

On @blaenk's last point: yes, if we had async and a good event system we could do anything (literally, we could run pandoc to parse what's changed in the buffer and apply syntax highlighting on the fly).

fmoralesc commented 10 years ago

On the complexity of the stuff we already do: I'm kind of proud of our work on this. ;)

As it is: we could remove highlighting of indented non-delimited codeblocks if that's a popular opinion. It shouldn't be too much work at all (it should be enough for now to change the g:pandoc_syntax_fill_codeblocks default to 0).

blaenk commented 10 years ago

Actually that option only applies to pandocDelimitedCodeBlock, and not pandocCodeBlock which is the one in question, though it makes sense that it should also apply to it, which is the primary reason behind #46. Continuing on that note however, I think it would indeed make sense to, by default, not highlight indented code blocks, i.e. set the option from #46 to ['indented'] by default at least.

My reasoning is that this way, users won't be surprised when they witness the behavior stated in this issue, i.e. list paragraphs being highlighted. If indented code blocks are highlighted out of the box, then they'll find themselves in this dilemma of having to choose between highlighted indented codeblocks or correct list paragraphs. If we don't highlight indented codeblocks from the start, then they'll never miss it to begin with.

Personally I feel somewhat strongly that we shouldn't highlight non-delimited indented codeblocks at all, not even making it a configurable option, for this very reason. What we can try, if you're open to this, is doing so --- i.e. removing this, and then if people object after the fact we can add it in as a configurable option. We already do highlight the other kinds of codeblocks anyways, I don't think there would be an expectation for indented non-delimited codeblocks to be highlighted, it helps that they do look visually different from other codeblocks in that they're not delimited; it serves as a visual cue. vim-markdown also doesn't highlight such codeblocks, even though they do also have support for embedded codeblocks (though not as nice as ours ;).

fmoralesc commented 10 years ago

OK, then, let's do that. I would favor having the option to enable it if needed, if only because I believe it is a sure thing someone will ask for it sometime, but other than that, I would not mind at all. I think we highlight codeblock embeds simply because the original version of vim-pandoc's syntax file had it, and it was kept around.

Can you take care of this, then? I'll be having a really weird schedule this month (got computer/internet access just today after a week).

blaenk commented 10 years ago

Yeah, I'll take care of it. I just wanted to make sure we were all on the same page before I started getting to it. It won't be a priority for me but I will be working on it.

fmoralesc commented 10 years ago

Thanks.

blaenk commented 10 years ago

Should be resolved by 97f5e95d6c88971b50f0233b8928761e5d2ccb4e.