Open gianmarialari opened 5 years ago
Thank you for reporting this, @gianmarialari.
This looks like a xeger issue to me, and it'd be better if we could ping the maintainer(s) of the current upstream/Java version. I believe they can be found at https://github.com/bluezio/xeger.
I'd be really interested in hearing any thoughts on this. Perhaps this has already been improved upstream, in which case we can just update the C#/.NET fork with what has changed over there.
Thank you for your answer!
I also ran some tests with the Java library. This is the code I used:
import nl.flotsam.xeger.Xeger;

class ExampleProgram {
    public static void main(String[] args) {
        String regex = "a{0,100}";
        Xeger generator = new Xeger(regex);
        // Generate 30 samples and print each one
        for (int i = 0; i < 30; ++i) {
            String result = generator.generate();
            System.out.println(result);
        }
    }
}
But the longest string generated is always about 10 characters.
I will try to contact them at https://github.com/bluezio/xeger.
Thank you, Gianmaria
Great, just saw it, https://github.com/bluezio/xeger/issues/3. Let's see what we get back.
Ciao Moodmosaic,
Here is a small program that transforms a string like "a{n1 ,m1} bill{ n2, m2} carl{3}" into "a{r1} bill{r2} carl{3}", where each r is a random integer between n and m (inclusive).
using System;
using System.Text.RegularExpressions;

namespace RegexQuantifier
{
    class Program
    {
        // Converts every occurrence of "{n,m}" in the input to "{r}", with r = rnd(n, m).
        static string ConvertQuantifier(string input)
        {
            string result = input;
            foreach (Match match in Regex.Matches(input, pattern: $@"\{{\s*\d+\s*,\s*\d+\s*\}}"))
            {
                string quantifier = match.Groups[0].Value;
                int min = int.Parse(Regex.Match(input: quantifier, pattern: $@"\d+").Value);
                int max = int.Parse(Regex.Match(input: quantifier, pattern: $@"\d+").NextMatch().Value);
                int r = new Random().Next(min, max + 1);
                result = Regex.Replace(input: result, pattern: quantifier, replacement: "{" + r.ToString() + "}");
            }
            return result;
        }

        static void Main(string[] args)
        {
            string input = "a{10 ,20} bill{ 0, 20} carl{3}";
            Console.WriteLine("Source string: " + input);
            Console.WriteLine("Output string: " + ConvertQuantifier(input));
        }
    }
}
My program probably contains a few errors; it is surely not efficient and could surely be better written, but I hope others can enjoy it.
Thank you Moodmosaic. G.
Thank you, @gianmarialari :+1:
@moodmosaic, here is a new version of the previous program.
The ConvertQuantifiers function is written in a more modular way and is hopefully a bit clearer. More importantly, it fixes a bug. Unfortunately I'm not a regex expert, so I can't say whether it works with every regex string, but if I understood the regex quantifier syntax correctly, it should :)
I hope others will find it useful.
using System;
using System.Text.RegularExpressions;

namespace RegexQuantifier
{
    class Program
    {
        // One shared generator: creating a new Random per call can repeat seeds on some runtimes.
        static readonly Random Rng = new Random();

        // Converts every occurrence of {n,m} in the input to {r}, with r = rnd(n, m).
        static string ConvertQuantifiers(string input)
        {
            // Escapes the braces so the quantifier text can be used as a literal regex pattern.
            string EscapeQuantifiers(string inputQ) => inputQ.Replace("{", @"\{").Replace("}", @"\}");

            // Transforms {n,m} to {r}, with r = rnd(n, m), both bounds inclusive.
            string TransformMinMaxToR(string inputMM)
            {
                Match first = Regex.Match(input: inputMM, pattern: @"\d+");
                int min = int.Parse(first.Value);
                int max = int.Parse(first.NextMatch().Value);
                int r = Rng.Next(min, max + 1);
                return "{" + r.ToString() + "}";
            }

            string result = input;
            foreach (Match match in Regex.Matches(input, pattern: @"\{\s*\d+\s*,\s*\d+\s*\}"))
            {
                string minMax = match.Groups[0].Value;
                string r = TransformMinMaxToR(minMax);
                string minMaxEscaped = EscapeQuantifiers(minMax);
                result = Regex.Replace(input: result, pattern: minMaxEscaped, replacement: r);
            }
            return result;
        }

        static void Main(string[] args)
        {
            Console.WriteLine("Given a regex pattern, replaces each quantifier {n,m} with {r}, where r=rnd(n,m)");
            Console.WriteLine("Example:");
            string input = "a{10 ,20} bill{ 0, 20} carl{3} (a[bc]{3,40})?xyz|ghi{0,10}.*hello";
            Console.WriteLine("Input : " + input);
            Console.WriteLine("Output: " + ConvertQuantifiers(input));
        }
    }
}
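One limitation of replacing by text, as the program above does, is that two identical quantifiers in the same pattern end up with the same r. A possible variant, sketched here in Java (the language of the upstream xeger), draws a fresh value for each occurrence by rebuilding the string match by match. The class name QuantifierCollapser and the method collapse are made up for this illustration and are not part of Fare or xeger. The sketch assumes n <= m in every {n,m}.

```java
import java.util.Random;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierCollapser {
    // Matches {n,m} with optional whitespace; groups 1 and 2 capture n and m.
    private static final Pattern RANGE =
            Pattern.compile("\\{\\s*(\\d+)\\s*,\\s*(\\d+)\\s*\\}");

    /** Replaces every {n,m} with {r}, where r is drawn uniformly from [n, m] per occurrence. */
    public static String collapse(String regex, Random random) {
        Matcher matcher = RANGE.matcher(regex);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            int min = Integer.parseInt(matcher.group(1));
            int max = Integer.parseInt(matcher.group(2));
            // Assumes min <= max; nextInt throws IllegalArgumentException otherwise.
            int r = min + random.nextInt(max - min + 1);
            // Copies the text before this match, then the replacement for this match only.
            matcher.appendReplacement(out, Matcher.quoteReplacement("{" + r + "}"));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "a{10 ,20} bill{ 0, 20} carl{3} (a[bc]{3,40})?xyz|ghi{0,10}.*hello";
        System.out.println("Input : " + input);
        System.out.println("Output: " + collapse(input, new Random()));
    }
}
```

The collapsed pattern can then be handed to the generator, so the length of each repeated part is chosen uniformly before generation starts. Fixed quantifiers such as {3} are left untouched because they do not match the {n,m} pattern.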
That's great! Perhaps we can add some examples in the library!
If you think I can help please let me know, I will be glad to help. Ciao, g.
If I write a regular expression like a{0,100}, I expect Xeger.Generate() to generate one of the possible matching sequences ("a", "aa", "aaa", "aaaa", ..., "aaaaaaaaaaaaa[....]aaaaaaaaaaaaaa"). But Xeger.Generate() almost never generates a sequence longer than 15 characters. I asked about this on Stack Overflow ("Issue generating multiple occurrence with Fare/Xeger"). If I understood the answer correctly, the probability of getting a long string is tremendously low.
Is there any simple way to make a{0,100} really generate sequences between 0 and 100 characters long (I mean, with similar frequency :))? Thank you, g.
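To make the "tremendously low" probability concrete, here is a minimal Java sketch of one plausible model. It assumes, without confirmation from this thread, that the generator walks the pattern one character at a time and, at every point where it may stop, chooses between stopping and continuing with equal probability. Under that assumption the length of a string for a{0,100} follows a geometric distribution, and reaching at least k characters has probability (1/2)^k. The class and method names are hypothetical and do not come from Fare or xeger.

```java
import java.util.Random;

public class StopOrContinueModel {
    /**
     * Simulates a{0,max} under the assumed model: before each character,
     * stop with probability 1/2, otherwise emit one more 'a' (up to max).
     */
    public static int sampleLength(int max, Random random) {
        int length = 0;
        while (length < max && random.nextBoolean()) {
            length++;
        }
        return length;
    }

    /** Probability of reaching at least `length` characters (for length <= max) in this model. */
    public static double probabilityAtLeast(int length) {
        return Math.pow(0.5, length);
    }

    public static void main(String[] args) {
        Random random = new Random();
        int longest = 0;
        // Same sample size as the Java test earlier in the thread.
        for (int i = 0; i < 30; i++) {
            longest = Math.max(longest, sampleLength(100, random));
        }
        System.out.println("Longest of 30 samples: " + longest);
        System.out.println("P(length >= 15)  = " + probabilityAtLeast(15));
        System.out.println("P(length == 100) = " + probabilityAtLeast(100));
    }
}
```

If the real generator behaves roughly like this model, short strings dominate regardless of the upper bound, which is why fixing the repetition count up front (replacing {n,m} with a randomly chosen {r}) gives a much flatter length distribution.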