moodmosaic / Fare

Port of Java dk.brics.automaton and xeger, mostly used for generating strings that match a specific regular expression.
http://www.brics.dk/automaton/
MIT License
183 stars 43 forks source link

It generates the same string being called in a sequence #26

Open vasily-kirichenko opened 6 years ago

vasily-kirichenko commented 6 years ago
for _ in 1..10 do
        (Xeger @"^\[\<([A-Z][a-zA-Z0-9]*)*\>\]$").Generate() |> printfn "%s"

run 1

[<>]
[<>]
[<>]
[<>]
[<>]
[<>]
[<>]
[<>]
[<>]
[<>]

run 2

[<V5N42Eu>]
[<>]
[<>]
[<>]
[<>]
[<>]
[<>]
[<>]
[<>]
[<>]

run 3

[<Fl>]
[<>]
[<>]
[<IFELv0We6427oTb4116>]
[<IFELv0We6427oTb4116>]
[<IFELv0We6427oTb4116>]
[<IFELv0We6427oTb4116>]
[<IFELv0We6427oTb4116>]
[<IFELv0We6427oTb4116>]
[<IFELv0We6427oTb4116>]
moodmosaic commented 6 years ago

This looks like a bug in Xeger.cs which has been ported from here. It's been a long-long time since I did all this, and I currently don't have Java installed on my local to compare with the Java version.

I'd be curious to see if the problem persists if one could port the new Xeger stuff from this repo instead.

moodmosaic commented 6 years ago

Also, I can't remember if Xeger guarantees that the generated strings should be unique, but from what I see in the source code, the issue might be here, and I'd start by comparing it with the this one.

(These are just notes for me, so I can remember where to start looking at. I'm not asking you to do the work―pull requests are always welcome, though.)

moodmosaic commented 6 years ago

For one, this should be the only available constructor for Xeger―this one should be deprecated as it always creates a new Random instance.

moodmosaic commented 6 years ago

Based on what discussed in #27, I'm not 100% sure there's much we can do to generate totally different results. At least now we can generate some more distinct elements.

I think it boils down to #28, and perhaps we can close this one, but I'll let you decide. 🥇

vasily-kirichenko commented 6 years ago

Thanks you, it's ok for now.

MagnusMikkelsen commented 4 years ago

This issue remains in .NET framework. The underlying cause for this is the behaviour of new Random(). In .Net framework it uses tick count - which remains the same in 1/10000 ms.

Because of this, if you rapidly do new Xeger() you might get the same underlying random sequence, and in turns the same random strings.

new Random() in .Net Core internally generates a unique seed, and therefore we don't see the issue in .Net Core.

I've created an issue in the Autofixture Repo: https://github.com/AutoFixture/AutoFixture/issues/1183

moodmosaic commented 4 years ago

Perhaps we can fix this in .NET Framework by specifying the seed value.

MagnusMikkelsen commented 4 years ago

Perhaps we can fix this in .NET Framework by specifying the seed value.

Maybe by shamelessly using the implementation in System.Random in .net core?

The implementation can be seen here: Random.cs

... and the PR with the improvement and considerations here: https://github.com/dotnet/coreclr/pull/2192

In short it uses an internal Random object to generate seed. The seed for this internal Random object comes from Interop.GetRandomBytes. In the original PR it was generated by generating a Guid.

Would this implementation be overkill? It even ensures that different appdomains do not get the same sequences.

moodmosaic commented 4 years ago

Thank you for investigating :+1: :100:

Random.cs is licensed under MIT license, so I think it should be OK to copy some parts from that file and use them here.

Options

→ One option would be to change new Random() with new Random(GenerateGlobalSeed()) where GenerateGlobalSeed is defined as

private static unsafe int GenerateGlobalSeed()
{
    int result;
    Interop.GetRandomBytes((byte*)&result, sizeof(int));
    return result;
}

(Note that GetRandomBytes uses the RNGCryptoServiceProvider internally.)

→ Another option would be to do it exactly as they're doing it, having new Random(GenerateSeed()) where GenerateSeed is defined here. This one looks like the thread-safe version.

(Even if making the seed creation thread-safe, I'm not entirely sure that all the other Xeger/Fare parts are thread-safe...)


Feel free to pick any option you want though. Happy to look into your PR :rocket:

MagnusMikkelsen commented 4 years ago

Feel free to pick any option you want though. Happy to look into your PR 🚀

Will do. I've started off with the failing test, and will push an implementation to my fork later.

moodmosaic commented 4 years ago

One improvement is to use the following (already released) overload: https://github.com/moodmosaic/Fare/blob/703c5e24b9b4e352663f0b4acf8ab832c75cffbc/Src/Fare/Xeger.cs#L42

It's then up to the caller to supply a Random instance, but even a default (shared) instance should be good enough for the test provided by @vasily-kirichenko, and also for the AutoFixture internals in https://github.com/AutoFixture/AutoFixture/issues/1183.