marshallward / vim-restructuredtext

Syntax file for reStructuredText on Vim.
26 stars 12 forks source link

code block without language name specified. #40

Closed idnsunset closed 4 years ago

idnsunset commented 5 years ago

code-block directive without language name specified and some inline markups in the directive content, see the picture below:

rst-vim-code-highlight

We usually use only a bare :: to mark up literal block once default highlight language is already set, but sometimes still need to use the directive explicitly to set some options. In the picture above, beginning from :caption: to the end of comment seem to get highlighted as expected but */ to the rest are rendered as emphasized.

Look at the rstCodeBlock group in syntax file:

syn region rstCodeBlock contained matchgroup=rstDirective
      \ start=+\%(sourcecode\|code\%(-block\)\?\)::\s\+.*\_s*\n\ze\z(\s\+\)+
      \ skip=+^$+
      \ end=+^\z1\@!+
      \ contains=@NoSpell
syn cluster rstDirectives add=rstCodeBlock

Here ::\s\+ does not consider the situation mentioned above that there is no character following :: at all, which makes .. code-block:: fail to match against this group.

Look at another syntax group named rstExDirective`:

let s:ReferenceName = '[[:alnum:]]\%([-_.:+]\?[[:alnum:]]\+\)*'

execute 'syn region rstExDirective contained matchgroup=rstDirective' .
      \ ' start=+' . s:ReferenceName . '::\_s+' .
      \ ' skip=+^$+' .
      \ ' end=+^\s\@!+ contains=@rstCruft,rstLiteralBlock'

This one is matched successfully and subsequently rstCruft and rstLiteralBlock are used respectively for further match, the former gets */ in /* comment */ matched (/* comment is not matched due to a space after *) and latter gets :: in .. code-block:: matched.

Suggest changing the value of start in rstCodeBlock to something like:

start=+\%(sourcecode\|code\%(-block\)\?\)::\%(\s\+.*\)*\n\+\ze\z(\s\+\)+
marshallward commented 5 years ago

Sorry for the silence, only now looking at this (wanted to sort out #39 first).

Agreed, I've tried your example and this is wildly broken. This certainly ought to work. I suppose I've never tested this one because a trailing :: is basically equivalent. Will have a look at your fix and try it out.

idnsunset commented 5 years ago

Thank you for your work, hopefully this could be fixed with a better approach. BTW, I've tried the fix about https://github.com/marshallward/vim-restructuredtext/issues/39 and add new comment to explain your question about the trailing long part. It will be appreciated if you have time to review issue #39 as I think a tiny improvement may worth doing, thanks in advance.

marshallward commented 5 years ago

I've started looking at this more, your analysis is correct in that the block is not being read as a rstCodeBlock, due to the lack of a blank line between the directive and the code. If we add any fields to the directive, then we fall back to rstExDirective.

Your suggestion would be an improvement, but I think it's also wrong for rstExDirective to not render this correctly. In fact, I even wonder why we are distinguishing between a raw code block and a generic directive, since the rules ought to just be the same?

I recently fixed a whole range of problems related to literals (::) which I was hoping to extend to the directives, but never got around to it. So this is a good opportunity to get all of these working as expected.

marshallward commented 5 years ago

I guess the main question (again, as you've surmised) is whether rstCruft ad rstLiteralBlock belong in directives. I would have guessed no, but presumably Niko did this for a reason. I'll try to see what the standard says.

In any case, if they ought to be supported, then your suggestion of tightening rstCodeBlock is the way to go here. I think the best thing to do is to just explicitly match to fields (e.g. :class: someClass), which I've gotten working for rstExDirectives and should be easy to move over.

marshallward commented 5 years ago

I've convinced myself that rstExDirective and rstCodeBlock should be under different rules. For one thing, a code block needs a blank line after the directive header, whereas a generic directive can usually omit this blank line. Additionally, generic directives do support things like rstEmphasis, so it's correct to leave in rstCruft for rstExDirective and omit it from rstCodeBlock.

I've pushed a change (e354480) which addresses this by cleaning up rstCodeBlock a bit, mostly fixing a syntax bug and explicitly supporting fields.

There's still problems here, but I think that it does at the least resolve your problem.

madjxatw commented 4 years ago

After reviewing the documentation of RST and sphinx, I think this issue report is a bit misleading. For RST code directive, the language argument can be omitted and no highlighting will be applied (the body text is simply rendered as literal block). However, the language argument for Sphinx code-block(alias sourcecode) is not allowed to be omitted. I'm currently using Sphinx 1.85, at least 1.85 version will indicate an error about omitting the language argument. Bug #45 is exactly caused by the https://github.com/marshallward/vim-restructuredtext/commit/e35448039f333b7b3616f28ac25dbc67c5ff5a73 commit.

If you run rst2html.py over the following RST text, you will find the generated HTMLs are almost identical except for a additional code class attribute value.

.. code::

   print('body')

produces

<pre class="code literal-block">
print('test')
</pre>
</div>
</body>
</html>
::

   print('body')

produces

<pre class="literal-block">
print('test')
</pre>
</div>
</body>
</html>

The default behaviors for code::, sourcecode:: and code-block:: are all correct. The only problem is that the languages supported in this syntax file are limited by g:rst_syntax_code_list. If you specify a language name that is not available in this list, then the subsequent indented text won't regarded as literal block anymore. The language lexers supported by pygments are way more than those listed in g:rst_syntax_code_list, the user is quite likely to use one outside of the list, e.g. using c but not cpp. I think what need to be fixed is: When a language lexer excluded in the g:rst_syntax_code_list, the following indented text still should be rendered as literal block rather than regular text.

marshallward commented 4 years ago

I agree with @madjxatw, we ought to be defaulting back to a literal block if the language does not appear in the render list. The e354480 fix may have been too targeted.

marshallward commented 4 years ago

I can't say how robust this fix is, but if we just permit ~a non-space~ one space-delimited word before the first endline, then things start to look better:

181c176
<       \ start=+\%(sourcecode\|code\%(-block\)\=\)::\s*\S*\n\%(\s*:.*:\s*.*\s*\n\)*\n\ze\z(\s\+\)+
---
>       \ start=+\%(sourcecode\|code\%(-block\)\=\)::\s*\n\%(\s*:.*:\s*.*\s*\n\)*\n\ze\z(\s\+\)+
madjxatw commented 4 years ago

Since code-block/sourcecode:: doesn't allow no-argument form, keeping the body content of code-block in regular text rather than literal block can help alert users to the missing argument.

marshallward commented 4 years ago

@madjxatw You are right, though I also think we are dealing with two problems alongside the original problem.

Although @idnsunset's original example is not necessarily valid Sphinx markup, the problems are present in code::. Hopefully the additional change above (with one slight modification) should fix these two original problems.

The third problem of a missing language in code-block is real, but I'd like to have a separate discussion about how to notify the user about the problem. Can I suggest that we take it as a separate issue?

madjxatw commented 4 years ago

Sure, it makes sense to separate them. I just suggest leaving code-block alone as its real problem is language fallback rather than the argument issue.

marshallward commented 4 years ago

Commit 81f85a406a2b2af9a928435f81a7f25484ec5d81 ought to resolve these issues. If so, then I will close this issue and re-open a new issue with respect to code-block and its requirement of a language.

madjxatw commented 4 years ago

It works, even specifying unsupported languages now falls back to literal block, way better than before, thx!

marshallward commented 4 years ago

I think this issue got resolved, closing it.