kokizzu / re2

Automatically exported from code.google.com/p/re2
BSD 3-Clause "New" or "Revised" License

any character match problem #64

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi there,
my question might be silly, but I can't reproduce in RE2 a Perl regular
expression that matches any character between two delimiters.

In Perl I'm doing
perl -e '$string="AC=2;AF=0.020;AN=100;DP=2575;Dels=0.00;HRun=0;MQ0=0;SB=-378.92;set=Intersection"; $string=~/SB=(.*);/g; print $1."\n";'
-378.92

So I expect to extract the negative float -378.92.

This is my C++ code:

-----------------------
/*
 * infostring.cpp
 *
 *  Created on: May 29, 2012
 *      Author: Francesco
 */

#include <iostream>
#include <string>
#include <vector>
#include <re2/re2.h>
#include <re2/filtered_re2.h>
#include <stdio.h>

using namespace std;
using namespace re2;

int main()
{
    string infoline = "AC=2;AF=0.020;AN=100;DP=2575;Dels=0.00;HRun=0;MQ0=0;SB=-378.92;set=Intersection";

    RE2 pattern("SB=(.*);");
    float match;
    RE2::PartialMatch(infoline, pattern, &match);
    cout << "the match here has been = " << match << endl;

    return 0;

}

-----------------

the output here is:
"the match here has been = 0"

So it returns 0.
If I change the pattern to
RE2 pattern("SB=(-\\d+.\\d+);");

then the output is:
"the match here has been = -378.92"

which is what I expect.
The problem is that I really need to match any character, because I don't know
in advance whether the number will be negative, and it must include the decimal
point of a float.

I'm on Mac OS X Lion, using the Mac g++ compiler, and I don't get any
compilation errors.

Thanks very much for any help you might be able to provide.
cheers

Original issue reported on code.google.com by francesc...@gmail.com on 29 May 2012 at 1:43

GoogleCodeExporter commented 9 years ago
The problem seems to lie in the ";" delimiter.
The code
    string infoline = "AC=2;AF=0.020;AN=100;DP=2575;Dels=0.00;HRun=0;MQ0=0;SB=-378.92;set=Intersection";

    RE2 pattern("DP=(.*);");
    string match;
    RE2::PartialMatch(infoline, pattern, &match);

takes everything from "DP=", which is unique in the string, to the last ";",
which is not unique in the string, resulting in the output

the match here has been = 2575;Dels=0.00;HRun=0;MQ0=0;SB=-378.92;set=Intersection

I also tried
    RE2 pattern("DP=(.*);");
    StringPiece input(infoline);
    string match;
    RE2::Consume(&input, pattern, &match);

but without success.
What is the best way to match only up to the first ";" delimiter?

Original comment by francesc...@gmail.com on 29 May 2012 at 2:38

GoogleCodeExporter commented 9 years ago
I was going to suggest the ; as the problem but in your first example SB= only 
has one ; after it, so that didn't seem like it could be the problem.

In any event, you should probably write "DP=([^;]*)" as the expression. That 
will match as much as possible after the = but stop at the first ; or at the 
end of the string, so it will handle the last field too. Also, if a field name 
could be a suffix of another field name (for example if you had XDP and DP) you 
might want to use (?:^|;)DP=([^;]*) to constrain the field name to begin at the 
beginning of the string or after a previous semicolon. 

Please let me know if this fixes your SB= problem too. Thanks.
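
A minimal sketch of the anchored pattern suggested above. It uses std::regex (ECMAScript syntax) rather than RE2 so that it builds with the standard library alone; the helper name `field_value` is hypothetical, and the key is assumed to contain no regex metacharacters.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: looks up `key` in a ';'-separated "key=value" line.
// Returns true and stores the value in *value if the key is present.
// Assumes `key` contains no regex metacharacters.
bool field_value(const std::string& line, const std::string& key,
                 std::string* value) {
    // (?:^|;) requires the key to start the line or follow a ';',
    // so "DP" does not match inside "XDP".
    // [^;]* stops at the first ';' or at the end of the line.
    const std::regex pattern("(?:^|;)" + key + "=([^;]*)");
    std::smatch m;
    if (!std::regex_search(line, m, pattern)) {
        return false;
    }
    *value = m[1].str();
    return true;
}
```

Called with the line from this thread and the key "DP", it stores "2575"; with "set", the last field, it stores "Intersection" even though no ";" follows it.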

Original comment by rsc@swtch.com on 29 May 2012 at 2:53

GoogleCodeExporter commented 9 years ago
Yes, your solution solves both.

RE2 pattern("SB=([^;]*)");
gives
the match here has been = -378.92

and
RE2 pattern("AF=([^;]*)");
float match;
gives
the match here has been = 0.02

Just to understand the syntax you suggested: [^;] is a negated character class
that matches anything except ";", and * takes zero or more of those. That is
sufficient to get the first match only, because the match stops as soon as it
finds ";".
Also, in this syntax you don't need to specify the any-character symbol ".".

Is that correct?
Thanks very much for your help, very useful!

Francesco

Original comment by francesc...@gmail.com on 29 May 2012 at 3:09
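
The difference between the greedy `.*` and the negated class `[^;]*` can be checked side by side. The sketch below again uses std::regex as a stand-in for RE2 (both treat `.*` as greedy), and `first_capture` is a hypothetical helper, not part of either library.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns capture group 1 of the first match of
// `pattern` in `line`, or an empty string if nothing matches.
// Uses std::regex (ECMAScript syntax) as a stand-in for RE2.
std::string first_capture(const std::string& line,
                          const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_search(line, m, re)) {
        return std::string();
    }
    return m[1].str();
}
```

With the line from this thread, `first_capture(line, "DP=([^;]*)")` yields `2575`, while `"DP=(.*);"` captures a longer span that runs through later fields, and `"set=(.*);"` finds nothing because no ";" follows the last field.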

GoogleCodeExporter commented 9 years ago
Yes, that's exactly right. The full set of supported syntax is described at 
code.google.com/p/re2/wiki/Syntax

Original comment by rsc@golang.org on 30 May 2012 at 12:38
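
One detail the thread leaves implicit: RE2::PartialMatch returns a bool, and it returns false when the captured text cannot be converted to the requested type, so an unchecked, uninitialized `float match` can print a meaningless value. The sketch below shows the same kind of check with the standard library only; `to_float` is a hypothetical helper, not an RE2 function.

```cpp
#include <cstddef>
#include <exception>
#include <string>

// Hypothetical helper: converts `text` to float only if the WHOLE string
// is a valid number. Returns false and leaves *out untouched otherwise.
bool to_float(const std::string& text, float* out) {
    try {
        std::size_t used = 0;
        const float value = std::stof(text, &used);
        if (used != text.size()) {
            return false;  // trailing characters, e.g. ";set=Intersection"
        }
        *out = value;
        return true;
    } catch (const std::exception&) {
        return false;  // empty, non-numeric, or out-of-range input
    }
}
```

A capture such as `-378.92` converts cleanly, whereas an over-long capture such as `-378.92;set=Intersection` is rejected instead of silently leaving the output variable unset.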