(See https://code.google.com/r/mike-vim-extended-regex/ for the source code of
this feature. Diffs are here:
https://code.google.com/r/mike-vim-extended-regex/source/list )
I've implemented support for extended regular expressions in Vim, somewhat
similar to Perl's extended regex feature, which allows you to make complicated
regexes (especially in Vimscript files) easier to read, by including whitespace
and comments in them. (Vim already allows multiline regexes.) I'm hoping
that, after any changes suggested on this forum, this will be a useful addition
to Vim.
One of the trickiest, and perhaps most contentious, parts is choosing the
syntax to use -- how to turn on extended mode, and what comments should look
like. I am very open to feedback and changes on this. Below, I present the
reasoning behind my initial choices.
This is what I have implemented:
- To turn on extended mode, put \# at the beginning of your regex.
- A comment is enclosed in double-braces, like {{ this }}.
- To match a space rather than having it be ignored, use "\ ".
Here is a simple example. syntax/c.vim includes this, for syntax highlighting
of backslash-escaped sequences inside strings in C:
" String and Character constants
" Highlight special characters (those which have a backslash) differently
syn match cSpecial display contained "\\\(x\x\+\|\o\{1,3}\|.\|$\)"
With extended regular expressions, the above could be written with whitespace
and comments:
" String and Character constants
" Highlight special characters (those which have a backslash) differently
syn match cSpecial display contained
      \ "\#
      \ \\             {{ literal backslash, followed by one of... }}
      \ \(
      \     x \x\+     {{ hex, e.g. '\x2c' }}
      \   \|
      \     \o\{1,3}   {{ octal, e.g. '\755' }}
      \   \|
      \     .          {{ e.g. '\n' or '\t' etc. }}
      \   \|
      \     $          {{ end of line }}
      \ \)"
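To make the three rules concrete, here is a rough Vimscript sketch of how an
extended pattern reduces to an ordinary one. This is only an illustration, not
the code from the patch: StripExtended is a hypothetical helper name, and the
sketch ignores the places where whitespace must be kept as-is (collections
such as [a-z], for instance).

```vim
" Hypothetical helper, not part of the patch: reduce an extended pattern to
" an ordinary one by dropping {{ comments }} and unescaped whitespace.
function! StripExtended(pat) abort
  let pat = a:pat
  " Drop the leading \# that turns extended mode on.
  if strpart(pat, 0, 2) ==# '\#'
    let pat = strpart(pat, 2)
  endif
  let out = ''
  let depth = 0
  let i = 0
  let n = strlen(pat)
  while i < n
    let two = strpart(pat, i, 2)
    if two ==# '{{'
      " Comment opener; a counter is enough to handle nesting.
      let depth += 1
      let i += 2
    elseif two ==# '}}' && depth > 0
      let depth -= 1
      let i += 2
    elseif depth > 0
      " Inside a comment: skip everything.
      let i += 1
    elseif pat[i] ==# '\' && i + 1 < n
      " Keep backslash sequences whole; "\ " becomes a literal space.
      let out .= (pat[i + 1] ==# ' ') ? ' ' : two
      let i += 2
    elseif pat[i] =~# '\s'
      " Unescaped whitespace is ignored.
      let i += 1
    else
      let out .= pat[i]
      let i += 1
    endif
  endwhile
  return out
endfunction

let ext = '\# \\ {{ literal backslash }} \( x \x\+ \| \o\{1,3} \| . \| $ \)'
echo StripExtended(ext)
" Prints: \\\(x\x\+\|\o\{1,3}\|.\|$\)
```

The printed result is the same pattern that syntax/c.vim uses today.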
I have not yet written tests or docs. If you want, I would be happy to do so.
As for the syntax: Obviously it is best not to invent a brand new syntax unless
there is a good reason to do so. I would have preferred to use Perl's syntax,
which is:
- To turn on extended mode, use "x" in the flags area after the regex, e.g.
/foo/x
- A comment begins with (?# and ends with )
Unfortunately, neither one of those worked out especially well in Vim. For
turning on extended mode, Vim makes only very light use of "flags" after
regular expressions. In fact, although it allows a few flags after the :s
(substitute) command, in general it doesn't use flags after regular
expressions. In Vim, usually the same effect is achieved by putting special
codes at the beginning of a regex, such as \c to ignore case.
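For anyone less familiar with these in-pattern codes, here is a small example
using only existing Vim behavior (nothing from the patch). The strings are
arbitrary sample text.

```vim
" =~# always matches case, so this prints 0.
echo 'FOO' =~# 'foo'

" \c inside the pattern overrides that, so this prints 1.
echo 'FOO' =~# '\cfoo'

" The code travels with the pattern wherever it is used; this prints FOO.
echo matchstr('a FOO b', '\cfoo')
```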
And for comments, using (?# ... ) would work, but would be somewhat awkward.
In Perl, both the () operator and the ? operator are "magic" by default (do not
need to be escaped with a backslash to give them special meaning). But in Vim,
the opposite is true: By default, () just matches parentheses, and ? just
matches a question mark. So in a Vim regex, a comment would look like \(\?#
this \), which is just too ugly and too tricky for people to remember.
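A quick way to see the difference in Vim as it exists today (the sample
strings are arbitrary):

```vim
" Default "magic" mode: ( ) and ? are ordinary characters, so this pattern
" matches the literal text (x)? and the expression prints 1.
echo '(x)?' =~# '^(x)?$'

" Grouping and "optional" need backslashes to act as operators; prints 1.
echo 'x' =~# '^\(x\)\?$'

" Only with \v ("very magic") do the bare characters behave as in Perl;
" prints 1.
echo 'x' =~# '\v^(x)?$'
```

Since most existing patterns are written in the default mode, a Perl-style
comment would usually have to be opened with \(\?# there.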
So I played around with a number of alternative syntax options.
-----
1. Syntax for turning on extended mode:
Consistent with other regex syntax in Vim, it seemed to me that the best way to
let the user turn on extended mode would be the presence of some special
sequence at the beginning of the regex, similar to Vim's current use of \c or
\C for case sensitivity, \m \M \v \V to choose a "magic" mode, and so on. Here
is a list of all available one-character backslash sequences:
\! \" \# \$ \' \, \- \: \; \g \j \q \y \^ \`
I would have liked to use \x or \e to indicate extended mode, but both of those
are already used. (\x matches any hex digit; \e matches the <Esc> character.)
Given those choices, my favorite was:
\#
... mainly because "#" is used in many programming languages to begin a comment.
Other possibilities: Vim already uses \% and \z as prefixes for a number of
other pattern items, so two options that seem pretty good to me are:
\%e
or
\zx
I sort of like \%e. It has the advantage of being somewhat mnemonic (e for
extended), and also it avoids using up a punctuation character (#) that might
be better saved for other future enhancements.
-----
2. Syntax for comments:
One issue is: Should turning comments on/off require "magic" characters or not?
At first I thought, of course it would have to include magic characters; but
then it occurred to me that we could just use a character sequence that is
somewhat unlikely to appear in regexes, and that is easy to represent as
regular characters (rather than comment delimiters) in a regex if necessary.
I like {{ double braces }} because:
- They look nice and are easy to type.
- They don't conflict with any other regex syntax patterns. Yes, braces are
used to indicate a count, e.g. x\{1,3} for one to three x's, but that uses
single braces.
- It is easy to represent a match for the actual characters "{{" in an extended
regex: Just put a space between them, "{ {".
Other options:
If we use \# to turn on extended mode, I thought it might be nice to use some
sort of comment delimiter that includes the "#" character, but I couldn't come
up with anything that good. The best I could come up with is ## to begin a
comment and ## again to end a comment, but that could lead to trouble if the
user tries to mark off a comment with "#############". Other possibilities:
#( )
{# #}
We can't use "#" by itself for comments, with end-of-line indicating the end of
the comment, because of the way Vim multiline strings work. In Vim, when you
write
let x = "this is
\ a string"
what you get is "this is a string". There is no embedded newline in the
result.
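Here is what that means for a hypothetical "#"-to-end-of-line comment syntax
(the pattern and comment text are made up for illustration):

```vim
" Two pattern pieces, each followed by a hypothetical # comment.
let pat = "\\d\\+ # digits
      \ \\s\\+ # whitespace"

" The continuation lines are joined before the string is parsed, so this
" prints: \d\+ # digits \s\+ # whitespace
echo pat

" No newline survives to end the first comment; this prints -1.
echo stridx(pat, "\n")
```

The first comment would swallow the rest of the pattern, which is why the
comment syntax needs an explicit closing delimiter.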
I also thought it might be nice to somehow use the " double-quote character to
indicate comments, since that is Vimscript's comment character; but the
double-quote character would be a bad choice because often, in Vimscript, the
regex itself is double-quoted, so you would have to backslash-escape all the
embedded double-quote characters, which would get a bit messy.
-----
A few more details about the syntax:
- Comments support nesting. This is mainly useful while debugging your regex,
to "comment out" part of it.
- Comments and extra whitespace are not allowed in certain places, for example
inside collections such as [a-z], inside repetition counts such as \{1,3}, or
in the middle of special sequences such as "\%$".
Original issue reported on code.google.com by m...@morearty.com on 12 Dec 2012 at 12:55