odeke-em / vim

Automatically exported from code.google.com/p/vim

Regex with a back-reference to a positive look-behind fails to match #334

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
With the text:

    AbbbAc

This regular expression should match "bbbAc":

    /\v(A)@<=b+\1c

But it matches nothing.

Other regular expressions that do work are

    /\v(A)@<=b+\1       " to capture "bbbA"
    /\v(A)@<=b+(\1c)@=  " to capture "bbb"
    /\v(A)b+(\1c)       " to capture "AbbbAc"
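
A minimal sketch for checking the four patterns above from a script rather than an interactive search. It assumes only the built-in `matchstr()`, which returns an empty string when nothing matches; `TryPatterns` is a hypothetical helper name, not part of Vim. What it prints may differ between Vim versions and 'regexpengine' settings.

```vim
" Hypothetical helper: return [pattern, matched text] pairs for a:text.
" An empty matched text means the pattern did not match.
function! TryPatterns(text, patterns) abort
  let results = []
  for pat in a:patterns
    call add(results, [pat, matchstr(a:text, pat)])
  endfor
  return results
endfunction

" Text and patterns copied from this report.
let patterns = ['\v(A)@<=b+\1c', '\v(A)@<=b+\1', '\v(A)@<=b+(\1c)@=', '\v(A)b+(\1c)']
for [pat, hit] in TryPatterns('AbbbAc', patterns)
  echo printf('%-20s -> "%s"', pat, hit)
endfor
```

Save the snippet to a file and run it with `:source` to see each pattern next to the text it matched.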

What version of the product are you using? On what operating system?

Windows 8.1 64-bit
VIM - Vi IMproved 7.4 (2013 Aug 10, compiled Aug 10 2013 14:33:40)
MS-Windows 32-bit console version
See attached for more of the report.

Original issue reported on code.google.com by michae...@google.com on 20 Feb 2015 at 10:39

GoogleCodeExporter commented 9 years ago
This is documented; the second part matches first, so you need to define the 
group there. See :help /\@<=

I'm frankly surprised the three working examples you give actually work.

Original comment by fritzoph...@gmail.com on 23 Feb 2015 at 5:01

GoogleCodeExporter commented 9 years ago
Interesting. The documentation doesn't say for which engine (or whether for 
both) referencing a group from inside the preceding atom shouldn't work. And 
:h \#= makes it sound like the new engine supports only a subset of what the 
old engine supports, so maybe my three working examples illustrate the real bug 
here? At the very least, the discrepancy is confusing to somebody new to Vim 
(i.e., me).

Original comment by michae...@google.com on 23 Feb 2015 at 5:21

GoogleCodeExporter commented 9 years ago
@Ben, you are probably referring to this:

>   In the old regexp engine the part of the pattern after "\@<=" and
>   "\@<!" are checked for a match first, thus things like "\1" don't work
>   to reference \(\) inside the preceding atom.  It does work the other
>   way around:

However, this bug is with the default / new NFA-based regexp engine, which 
doesn't have this odd quirk:

>   However, the new regexp engine works differently [...]

In fact, after swapping the capturing group and the reference and switching to 
the old engine, the pattern matches. That clearly indicates an inconsistency 
and a bug.
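
The engine comparison described above can be sketched without changing 'regexpengine', by forcing each engine with the `\%#=` pattern prefix (see `:help /\%#=`): 0 selects an engine automatically, 1 forces the old backtracking engine, and 2 forces the NFA engine. `MatchWithEngines` is a hypothetical helper name used only for illustration, and no particular output is claimed, since the results are exactly what this report is disputing.

```vim
" Hypothetical helper: match a:pat against a:text once per engine.
" Returns a Dictionary keyed by engine label; '' means no match.
function! MatchWithEngines(text, pat) abort
  let results = {}
  for [name, prefix] in [['auto', '\%#=0'], ['old', '\%#=1'], ['nfa', '\%#=2']]
    " The \%#= item must come first in the pattern, before \v.
    let results[name] = matchstr(a:text, prefix . a:pat)
  endfor
  return results
endfunction

" The failing pattern from the original report.
let by_engine = MatchWithEngines('AbbbAc', '\v(A)@<=b+\1c')
for name in ['auto', 'old', 'nfa']
  echo printf('%-4s -> "%s"', name, by_engine[name])
endfor
```

Running the same helper on a swapped variant of the pattern would show whether the two engines disagree on a given Vim build.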

Original comment by sw...@ingo-karkat.de on 23 Feb 2015 at 7:46

GoogleCodeExporter commented 9 years ago
Oh, I guess my documentation was out of date. Disregard my #2, then.

This issue still repros in 7.4.638 which has the updated documentation.

Original comment by michae...@google.com on 23 Feb 2015 at 8:38

GoogleCodeExporter commented 9 years ago
@ingo, yes, that's the help text I was referring to. I had missed the "however, 
the new regexp engine works differently..." text as you suspected.

However, I *don't* need to switch engines to see the pattern match when I swap 
the capture group and reference. I *can't* get the unswapped pattern to match 
regardless of the regexpengine setting. So does the new engine have this quirk 
after all, in some situations?

Original comment by fritzoph...@gmail.com on 23 Feb 2015 at 2:39

GoogleCodeExporter commented 9 years ago
@Ben, I see that as well (and don't understand why). Maybe the reduced example 
here is just bad; the original problem was more complex:

    <div>Test div</div>More words
         ^^^^^^^^^^^^^^

This works but leaves off the trailing >:

    /\v%(\<(\w+)\>)@<=.*\<\/\1

So I'd expect this to work, but it captures nothing:

    /\v%(\<(\w+)\>)@<=.*\<\/\1\>

Original comment by sw...@ingo-karkat.de on 23 Feb 2015 at 3:31