marshallward / vim-restructuredtext

Syntax file for reStructuredText on Vim.

Leading non-breaking (A0) spaces for inline emphasis #22

Closed mcepl closed 7 years ago

mcepl commented 7 years ago

Just to make a note here about issue https://github.com/vim/vim/issues/2118

marshallward commented 7 years ago

Renaming this issue to reflect the problem.

marshallward commented 7 years ago

Issue is with the following sample text: 14_godric_hollow.rst.txt.

It appears that the characters v and o (looks like Czech?) can be followed by a non-breaking space (A0) in several cases. This causes the syntax highlighting to miss inline markup that expects a leading ASCII space (20), such as emphasis (*example*).

Docutils appears to handle this case normally, treating A0 and 20 equally, as well as preserving the A0s in HTML output. So the syntax file ought to treat both spaces equally.
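
To illustrate the mismatch described above, here is a minimal Python sketch. The two patterns are hypothetical and heavily simplified; they are not the rules used by this syntax file or by Docutils. They only show how a start-of-emphasis rule that accepts just an ASCII space (20) misses markup preceded by A0, and how allowing both characters fixes it.

```python
import re

NBSP = "\u00a0"  # non-breaking space (A0)

# Hypothetical, simplified emphasis patterns (assumptions for illustration only).
# Emphasis must start at the beginning of the text or after a "space" character.
ASCII_ONLY = re.compile(r"(?:^| )\*(\S[^*]*?)\*")
WITH_NBSP = re.compile("(?:^|[ " + NBSP + r"])\*(\S[^*]*?)\*")


def find_emphasis(pattern, text):
    """Return the emphasized words that the given pattern recognizes."""
    return pattern.findall(text)


# "v" followed by A0, then emphasis; later an emphasis after an ASCII space.
sample = "v" + NBSP + "*lese* a *doma*"

print(find_emphasis(ASCII_ONLY, sample))  # ['doma']
print(find_emphasis(WITH_NBSP, sample))   # ['lese', 'doma']
```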

marshallward commented 7 years ago

I've pushed a change which appears to support non-breaking whitespace. Can you give it a try @mcepl ?

Also, just for my own interest, was this text generated by Vim? I am surprised that it would use this whitespace character. Are there vim-specific settings for Czech which generate these?

mcepl commented 7 years ago

Oh, right, A0 might be a problem. Part of Czech typography is that we really don't like single-letter prepositions to be last on the line. In Vim itself I can use the 1 flag in formatoptions for display (which I guess was sneaked in by some other Czech), but for the real solution I use the program vlna (http://petr.olsak.net/ftp/olsak/vlna) as a filter which replaces the space character after a one-letter preposition with ~ (it was originally made for TeX, where that is a non-breakable space), but I use it with A0. Docutils is perfectly happy with it (XeTeX with the xunicode package, which is the default, understands A0 as a non-breakable space).
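
The kind of filtering described above can be sketched roughly as follows. This is not vlna itself. The function name `tie_one_letter_words` and the set of one-letter words are assumptions made for illustration, and the real program has more options and rules.

```python
import re

NBSP = "\u00a0"  # non-breaking space (A0)

# Assumed set of one-letter Czech prepositions and conjunctions;
# the real vlna program may use a different default set.
ONE_LETTER_WORDS = "ksvzouaiKSVZOUAI"

# A one-letter word that is not part of a longer word, followed by a single
# ASCII space and then the start of the next word.
_PATTERN = re.compile(r"(?<!\w)([" + ONE_LETTER_WORDS + r"]) (?=\w)")


def tie_one_letter_words(text: str) -> str:
    """Replace the space after a one-letter word with a non-breaking space."""
    return _PATTERN.sub(lambda m: m.group(1) + NBSP, text)


print(repr(tie_one_letter_words("Byl v lese a o tom")))
# 'Byl v\xa0lese a\xa0o\xa0tom'
```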

Now, the question I have is whether VimL has (or should have) some more sophisticated function for deciding whether a character is a space or not. I guess you know there is more than one Unicode space, and some other languages (e.g., Python) have a far more sophisticated algorithm behind their str.isspace(); it seems to me there are such functions even in glibc. What is behind \s in regular expressions?
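
For comparison, here is a small Python sketch of the behaviour mentioned above. It uses only `str.isspace()` and the standard `re` module. The helper names are made up for this example. As far as Vim's documentation goes, `\s` in Vim patterns matches only <Space> and <Tab>, which the `re.ASCII` variant below loosely resembles.

```python
import re

SPACE = "\u0020"  # ASCII space (20)
NBSP = "\u00a0"   # non-breaking space (A0)


def is_space_unicode(ch: str) -> bool:
    """Unicode-aware check using str.isspace()."""
    return ch.isspace()


def is_space_regex_unicode(ch: str) -> bool:
    """For str patterns, \\s is Unicode-aware by default in Python 3."""
    return re.fullmatch(r"\s", ch) is not None


def is_space_regex_ascii(ch: str) -> bool:
    """With re.ASCII, \\s only matches ASCII whitespace."""
    return re.fullmatch(r"\s", ch, flags=re.ASCII) is not None


for name, ch in (("20", SPACE), ("A0", NBSP)):
    print(name, is_space_unicode(ch), is_space_regex_unicode(ch), is_space_regex_ascii(ch))
# 20 True True True
# A0 True True False
```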

marshallward commented 7 years ago

Thanks very much for the explanation. I agree with you that some sort of generalised whitespace support would be beneficial, and have even opened an issue with vim (https://github.com/vim/vim/issues/2129) to discuss it. Hopefully something will come of it.

In the meantime, does your issue appear to be solved for now? I expect there may be others, but this feels like a step in the right direction.

mcepl commented 7 years ago

Works perfectly, thank you. (BTW, adding "Fixes #22" to the commit message would close this ticket upon merging to master and make it obvious to anybody who reads the log afterwards what the commit is about.)

marshallward commented 7 years ago

Thanks, closing this.

Generally I don't like to reference the github issue numbers, since there's no linking of github metadata to the repository. I try to stick with long-form commits that explain the issue. But I guess it's just a preference :).