moodmosaic / Fare

Port of Java dk.brics.automaton and xeger, mostly used for generating strings that match a specific regular expression.
http://www.brics.dk/automaton/
MIT License

Attempt to guarantee a uniform distribution #27

Closed moodmosaic closed 6 years ago

moodmosaic commented 6 years ago

Related to https://github.com/moodmosaic/Fare/issues/26. Based on https://github.com/bluezio/xeger/blob/3098732bc920194a5bf5378310d24dc32f1e6cf2/src/main/java/nl/flotsam/xeger/Xeger.java#L106-L110


Microsoft (R) F# Interactive version 4.1
Copyright (c) Microsoft Corporation. All Rights Reserved.

For help type #help;;

> #r "Z:\\Mukatsuku\\Notes\\Fare\\Src\\Fare\\bin\\Debug\\Fare.dll";;
--> Referenced 'Z:\Mukatsuku\Notes\Fare\Src\Fare\bin\Debug\Fare.dll'

> let rnd = System.Random ();;
val rnd : System.Random

> open Fare;;
> let xeger = Xeger (@"^\[\<([A-Z][a-zA-Z0-9]*)*\>\]$", rnd);;
val xeger : Xeger

> for _ in 1..10 do
        xeger.Generate () |> printfn "%s";;
[<D6>]
[<W8p>]
[<Jy>]
[<>]
[<I4z8YK5>]
[<O2DX1L>]
[<Y>]
[<>]
[<>]
[<Nh3Uqp>]
val it : unit = ()

> for _ in 1..10 do
        xeger.Generate () |> printfn "%s";;
[<>]
[<FlN>]
[<W>]
[<>]
[<D6>]
[<ZzLtlv>]
[<>]
[<Q36Tct89>]
[<W4>]
[<I>]
val it : unit = ()

> for _ in 1..10 do
        xeger.Generate () |> printfn "%s";;
[<>]
[<>]
[<>]
[<>]
[<>]
[<J3G>]
[<PU03pRiOZ>]
[<GP>]
[<PD39K>]
[<>]
val it : unit = ()

> for _ in 1..10 do
        xeger.Generate () |> printfn "%s";;
[<>]
[<J5LQ>]
[<IKE>]
[<>]
[<>]
[<>]
[<D9>]
[<ESR9N>]
[<N>]
[<>]
val it : unit = ()

> 
moodmosaic commented 6 years ago

/cc @vasily-kirichenko

I'm not that excited by the results, but they look better than what we currently have(?)

vasily-kirichenko commented 6 years ago

Thanks, but it still generated the same result ([<>]) in 5 of 10, or even 6 of 10, cases...

moodmosaic commented 6 years ago

That's true. (That's also the reason I didn't merge the pull request.)

The difference here is that we're trying to increase the number of distinct elements. In #26 there were always only about 2 to 3 distinct elements per 10 generated strings; in this pull request there are 5 to 8 (with fewer [<>]).

Xeger doesn't have the properties of a true PRNG (it's not SplitMix, for example). If there's a way to fix this inside Xeger, I can't remember it, and it'd take me a while to find the time to investigate further. 😢
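
For anyone who wants to compare runs by something other than eyeballing the output, here is a minimal sketch of how the distinct-element count discussed above could be measured. The `summarize` helper is hypothetical (it is not part of Fare); it takes any `unit -> string` generator, so the only Fare calls it relies on are the `Xeger (pattern, rnd)` constructor and `Generate ()` already shown in the session above.

```fsharp
// Hypothetical helper, not part of Fare: summarises a batch of generated strings.
// `generate` is any unit -> string function; `n` must be at least 1,
// because List.maxBy throws on an empty list.
let summarize (generate : unit -> string) (n : int) =
    // Draw n samples from the generator.
    let samples = List.init n (fun _ -> generate ())
    // Number of different strings seen in this batch.
    let distinct = samples |> List.distinct |> List.length
    // The most frequent string and how many times it appeared.
    let mostCommon, count =
        samples
        |> List.countBy id
        |> List.maxBy snd
    distinct, mostCommon, count
```

In the F# Interactive session above it could be called as `summarize (fun () -> xeger.Generate ()) 10`. A larger `n` should give a steadier picture of how often `[<>]` dominates than a batch of 10 does.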

vasily-kirichenko commented 6 years ago

@moodmosaic I see. Thanks, I think this PR is an improvement anyway.

moodmosaic commented 6 years ago

I think so, too. However, I opened https://github.com/moodmosaic/Fare/issues/28 because I still think we can do better if we rebase Xeger.cs onto the latest version of the Java one.

About to merge this and release a new NuGet package. @vasily-kirichenko, thanks for the feedback 💯

moodmosaic commented 6 years ago

@vasily-kirichenko, I just pushed version 1.0.4 to NuGet.


Pushing Fare.1.0.4.nupkg to 'https://www.nuget.org/api/v2/package'...
  PUT https://www.nuget.org/api/v2/package/
  Created https://www.nuget.org/api/v2/package/ 13189ms
Your package was pushed.