lamuguo / re2

Automatically exported from code.google.com/p/re2
BSD 3-Clause "New" or "Revised" License

syntax page is misleading about [:digit:] vs [[:digit:]] #116

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago

This simple code (also sent in attachment):

#include <re2/re2.h>
#include <re2/set.h>
#include <stdio.h>
#include <stdlib.h>
#include <string>
#include <vector>

int test (const char *patt, const char *text)
{
        re2::RE2::Options       opts;
        re2::RE2::Set           t (opts, re2::RE2::UNANCHORED);
        std::string             error;
        if (t.Add (patt, &error) < 0)
        {
                printf ("'%s' is a bad pattern\n", patt);
                return 0;
        }
        if (!t.Compile ())
        {
                printf ("can't compile '%s'\n", patt);
                return 0;
        }
        std::vector<int>        dummy;
        bool    r = t.Match (text, &dummy);
        printf ("'%s' %s '%s'\n", text, (r ? "matches" : "doesn't match"), patt);
        return 0;
}

int main ()
{
        test ("[:alpha:]", "a123");
        test ("[:digit:]", "a123");
        test ("[0-9]", "a123");
        return 0;
}

produces:

'a123' matches '[:alpha:]'
'a123' doesn't match '[:digit:]'
'a123' matches '[0-9]'

Oddly enough, 'a123' doesn't match '[:digit:]', while it matches '[0-9]'.

I'm trying to convert Lua patterns to RE2. I can still use [0-9] in many 
cases, but converting Lua's '[_%a%d]' to '[_[:alpha:][:digit:]]' gets 
considerably more complicated...
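A minimal sketch of the conversion described above, assuming only a handful of Lua class escapes need to be handled. `PosixName` and `LuaToRe2Classes` are hypothetical helpers written for illustration, not part of RE2 or Lua. Inside a bracket set the POSIX name is emitted as-is; outside a set it is wrapped in its own brackets. Lua's classes are locale-dependent, so the mapping is approximate.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: map a Lua class letter to a POSIX class name.
// Returns an empty string for letters this sketch does not handle.
static std::string PosixName(char c) {
    switch (c) {
        case 'a': return "[:alpha:]";
        case 'd': return "[:digit:]";
        case 'l': return "[:lower:]";
        case 'u': return "[:upper:]";
        case 's': return "[:space:]";
        case 'w': return "[:alnum:]";
        default:  return "";
    }
}

// Hypothetical helper: rewrite Lua class escapes such as %a and %d.
// Assumption: sets are not nested and ']' is never the first set member.
std::string LuaToRe2Classes(const std::string& lua) {
    std::string out;
    bool in_set = false;
    for (std::size_t i = 0; i < lua.size(); ++i) {
        char c = lua[i];
        if (c == '%' && i + 1 < lua.size()) {
            std::string name = PosixName(lua[i + 1]);
            if (!name.empty()) {
                // Inside [...] the name stands alone; outside it needs
                // its own enclosing brackets, e.g. [[:digit:]].
                out += in_set ? name : "[" + name + "]";
            } else {
                // Unhandled escape: copied through untouched in this sketch.
                out += c;
                out += lua[i + 1];
            }
            ++i;  // skip the class letter
            continue;
        }
        if (c == '[') in_set = true;
        if (c == ']') in_set = false;
        out += c;
    }
    return out;
}
```

With these assumptions, `LuaToRe2Classes("[_%a%d]")` yields `[_[:alpha:][:digit:]]`, and a bare `%d+` becomes `[[:digit:]]+`.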

Am I doing something wrong here?

Cheers,
Ben

Original issue reported on code.google.com by benens...@gmail.com on 27 Jun 2014 at 7:28


GoogleCodeExporter commented 9 years ago
FYI, I've tested with the re2-20140304.tgz, not the latest.

Original comment by benens...@gmail.com on 27 Jun 2014 at 7:33

GoogleCodeExporter commented 9 years ago
Did you try plain RE2::PartialMatch? I don't think the syntax is correct. I 
think it is [[:digit:]], not [:digit:]. The latter is just the character class 
[:digt], i.e. any one of the characters ':', 'd', 'i', 'g', 't'.

Original comment by rsc@golang.org on 27 Jun 2014 at 8:22
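If the reading in the comment above is right, the reported output follows directly: '[:alpha:]' is the set of characters ':', 'a', 'l', 'p', 'h', which contains the 'a' in "a123", while '[:digit:]' is the set ':', 'd', 'i', 'g', 't', none of which occur in "a123". The sketch below models that with plain string operations. It does not call RE2, and `AnyCharIn` and `AnyDigit` are hypothetical names used only for illustration.

```cpp
#include <cctype>
#include <string>

// Illustrative model only; this does not call RE2.
// True if any character of `text` is one of the characters in `members`.
// This mimics an unanchored match of a plain bracket set such as [:digt].
bool AnyCharIn(const std::string& text, const std::string& members) {
    return text.find_first_of(members) != std::string::npos;
}

// True if any character of `text` is an ASCII digit, which is what
// [[:digit:]] or [0-9] asks for.
bool AnyDigit(const std::string& text) {
    for (unsigned char c : text) {
        if (std::isdigit(c)) return true;
    }
    return false;
}
```

Under this model, `AnyCharIn("a123", ":alph")` is true, `AnyCharIn("a123", ":digt")` is false, and `AnyDigit("a123")` is true, which lines up with the three results in the original report.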

GoogleCodeExporter commented 9 years ago
I haven't tried RE2::PartialMatch; I use RE2 mainly for its ability to match 
multiple expressions at the same time with re2::RE2::Set.

Going by the http://code.google.com/p/re2/wiki/Syntax page, I'm a bit 
confused about whether I should use [:digit:] or [[:digit:]] as a character 
class. I tried [:digt] as you mentioned, without much success. I also tried 
[:digit] (assuming a typo), again without success. Any help there would be 
much appreciated.

I worked around this by using Perl and Unicode character classes \\pL, \\d and 
\\w which seem to work fine on my test cases.

Cheers,
Ben

Original comment by benens...@gmail.com on 2 Jul 2014 at 4:38

GoogleCodeExporter commented 9 years ago
Erratum:

Please read \pL, \d and \w in my previous comment.

Ben

Original comment by benens...@gmail.com on 2 Jul 2014 at 4:40

GoogleCodeExporter commented 9 years ago

Original comment by rsc@golang.org on 9 Jul 2014 at 4:18

GoogleCodeExporter commented 9 years ago
This issue was closed by revision 67e0bfcd78e9.

Original comment by rsc@swtch.com on 6 Oct 2014 at 6:56