nickdodd79 / AutoBogus

A C# library complementing the Bogus generator by adding auto creation and population capabilities.
MIT License
438 stars 50 forks

Autobogus does not respect calls to Randomizer.Seed or UseSeed #51

Open zejji opened 3 years ago

zejji commented 3 years ago

Unit and integration tests should generally be deterministic (see here), with the potential exception of cases where one is specifically implementing randomized tests as a form of exploratory testing.

The base Bogus library provides for this as described here, namely either by setting the global Randomizer.Seed or by calling UseSeed on an individual faker.

However, Autobogus does not appear to respect either of these methods of deterministically creating fake data. I have tried both of the approaches without success, as exemplified below:

// Author class
public class Author
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }

    // Navigation properties
    public ICollection<Book> Books { get; set; }
}

// Global seed configuration
Randomizer.Seed = new Random(8675309);

// Local seed configuration
const int seed = 1234;
var authorFaker = new AutoFaker<Author>()
    .Configure(builder => builder
        .WithSkip<Author>(a => a.Id)
        .WithSkip<Author>(a => a.Books))
    .UseSeed(seed);

// Check output using LINQPad
authorFaker.Generate().Dump();

In both cases, the data output differs on each run. If I use Bogus without Autobogus, I am able to create deterministic data without any issues.
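For comparison, here is a minimal sketch of the plain Bogus behaviour described above. It assumes the Bogus NuGet package is referenced. The trimmed Author class and the two RuleFor rules are illustrative and not taken from the original project.

```csharp
using System;
using Bogus;

public class Author
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class Program
{
    public static void Main()
    {
        const int seed = 1234;

        // Build a fresh, locally seeded Faker<Author> each time (illustrative rules).
        Author Make() => new Faker<Author>()
            .UseSeed(seed)
            .RuleFor(a => a.FirstName, f => f.Name.FirstName())
            .RuleFor(a => a.LastName, f => f.Name.LastName())
            .Generate();

        var first = Make();
        var second = Make();

        // With the same local seed, both fakers are expected to produce the same author.
        Console.WriteLine(first.FirstName == second.FirstName
            && first.LastName == second.LastName);
    }
}
```

The expectation in this issue is that swapping Faker<Author> for AutoFaker<Author> would keep the same property.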

Is it currently possible to generate deterministic data with Autobogus or, if not, is it possible to add this feature? Despite the library offering a lot of other useful features, currently we cannot use it because of a requirement to create deterministic data.

zejji commented 3 years ago

I note there is another open issue here which is also related to the question of deterministic data, but I think that issue is just one facet of the general issue described above.

nickdodd79 commented 3 years ago

Hey @zejji

I have updated the issue you reference in your comment. Currently UseSeed is not hooked up to the underlying Faker used by the AutoFaker class. I will get a change in place to make that so and let you know when it has been released.

Nick.

zejji commented 3 years ago

@nickdodd79 - Thanks, that will be a massive help!

logiclrd commented 3 years ago

It is also possible to create your own Faker object and assign it as the FakerHub for an AutoFaker. This allows the same Faker to be shared by multiple different instances of AutoFaker. See the .WithFakerHub method on IAutoConfigBuilder.

zejji commented 3 years ago

@logiclrd - Thanks

Unless I've misunderstood, isn't WithFakerHub a global configuration setting? What if you need multiple different Faker configurations (e.g. with different rules) and still need to be able to call UseSeed on each AutoFaker? Is that possible?

The ultimate aim is to have each of our tests be both deterministic and individually configurable.

logiclrd commented 3 years ago

Whether it's global or not depends on how you're obtaining AutoFaker and/or AutoFaker<T> instances. It's not inherently global.

The following test program illustrates one way that the same underlying Faker can be shared amongst AutoFaker instances created on-demand:

  class Program
  {
    static void Main(string[] args)
    {
      var orderRandomizer = new Randomizer();

      for (int i = 0; i < 50; i++)
      {
        var faker = new Faker() { Random = new Randomizer(12345) };

        var autoFaker1 = AutoFaker.Create(builder => builder.WithFakerHub(faker));
        var autoFaker2 = AutoFaker.Create(builder => builder.WithFakerHub(faker));

        var order = new bool[5];

        for (int j = 0; j < 5; j++)
        {
          order[j] = orderRandomizer.Bool();
          Console.Write(order[j] ? 2 : 1);
        }

        for (int j = 0; j < 5; j++)
        {
          Console.Write('\t');

          if (order[j] == false)
            Console.Write(autoFaker1.Generate<int>());
          else
            Console.Write(autoFaker2.Generate<int>());
        }

        Console.WriteLine();
      }
    }
  }

Typical output:

12211   -1960183044     729902338       338421052       1042495879      1321583812
12111   -1960183044     729902338       338421052       1042495879      1321583812
21222   -1960183044     729902338       338421052       1042495879      1321583812
...

The first number indicates, with each digit, which AutoFaker object the corresponding subsequent number will be generated with. As you can see, randomly switching back and forth between the AutoFaker objects, a consistent sequence of int values is produced. Each time through the for i loop, it creates a new Faker with a new Randomizer, reusing the same seed each time.

You can use this pattern to ensure that every instance of AutoFaker or AutoFaker<T> you're using draws from a particular underlying Faker with a particular underlying Randomizer, and you can control that Randomizer's seed.
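Applied to the per-test scenario raised earlier, the same pattern might look like the sketch below. It uses only the calls shown in the test program above (AutoFaker.Create, WithFakerHub, Generate<int>) and assumes the AutoBogus and Bogus packages are referenced. The two "tests" are simulated inline, and the variable names are hypothetical.

```csharp
using System;
using AutoBogus;
using Bogus;

public static class Program
{
    public static void Main()
    {
        // Each simulated test builds its own Faker, so no random state is shared.
        var fakerA = new Faker { Random = new Randomizer(1234) };
        var fakerB = new Faker { Random = new Randomizer(1234) };

        // The hub is set per AutoFaker instance here, not globally.
        var autoFakerA = AutoFaker.Create(builder => builder.WithFakerHub(fakerA));
        var autoFakerB = AutoFaker.Create(builder => builder.WithFakerHub(fakerB));

        var fromA = autoFakerA.Generate<int>();
        var fromB = autoFakerB.Generate<int>();

        // Going by the output shown above, equal seeds should give equal values.
        Console.WriteLine(fromA == fromB);
    }
}
```

Because each AutoFaker gets its own hub, different tests could use different seeds or additional configuration without affecting one another.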

nickdodd79 commented 3 years ago

Hey @zejji

Following on from the example above from @logiclrd, I have not been able to replicate the issue you describe. The configured seed is proxied through to the underlying Faker instance and the generated values are deterministic. I copied your test above into the AutoBogus test suite (with the inclusion of some assertions) and it is passing.

// Local seed configuration
const int seed = 1234;
var authorFaker = new AutoFaker<Author>()
  .Configure(builder => builder
    .WithSkip<Author>(a => a.Id)
    .WithSkip<Author>(a => a.Books))
  .UseSeed(seed);

// Check output using LINQPad
var author = authorFaker.Generate();

author.FirstName.Should().Be("functionalities");
author.LastName.Should().Be("throughput");

I have just released v2.12.0, which included a package upgrade for Bogus, so maybe there are some changes there that have fixed what you describe. Otherwise, as the seed is proxied to Bogus, it could be an issue to raise with that project if you are still experiencing it.

Nick.

zejji commented 3 years ago

@logiclrd - Thanks for this explanation - I will do some experimentation!

@nickdodd79 - I'm still seeing the issue with 2.12.0. I've attached a self-contained LINQPad 6 example project to demonstrate this - note that the autoBogusBook's Author name changes on each run (although the Book title remains consistent).

Please note it uses .NET 5 Release Candidate 2 which needs to be enabled in LINQPad via Edit > Preferences > Query > Default Framework Version.

Unless I'm doing something wrong, the seed doesn't appear to be propagating to Bogus?

Example project here: BogusAndAutoBogusTest_Github_NET5RC2.zip

cdarrigo commented 2 years ago

I'm having a similar issue. I'm using AutoFaker in my xUnit tests. In the class constructor I'm setting the Randomizer.Seed value

    // Constructor of the generic base class DomainUnitTest<T>
    public DomainUnitTest(ITestOutputHelper output)
    {
        Output = output;
        Randomizer.Seed = new Random(420);
    }

and my test

    [Fact]
    public Task Can_Read_Write_Properties()
    {
        var item = AutoFaker.Generate<T>();
        return Verify(item);
    }

I'm using AutoFaker in conjunction with Verify. This test is a base class and it has multiple subclasses (one for each domain type I want to test).

When these tests run in parallel they will fail the first time. This is expected Verify behavior, as I need to accept the faker-generated data. I can manually iterate through each failed test, accepting the generated data and re-running the failed test. This time, the generated data matches the expected data (AutoFaker has returned deterministic data) and the test passes. However, if I revisit a previously passing test, the faker is generating non-deterministic data, even though the test uses the same hard-coded seed value. This non-deterministic data causes the test to fail because the generated data has changed from the previously accepted values.

cdarrigo commented 2 years ago

Update: I am able to reproduce this issue now using xUnit tests. Set the static seed value to a constant. Randomizer.Seed = new Random(420);

Create two tests that spin up a fake using the same seed.

When the tests run sequentially, the faker returns deterministic data. When the tests run in parallel, the faker returns non-deterministic data.

logiclrd commented 2 years ago

Am I misunderstanding -- by running the tests in parallel, aren't they pulling numbers concurrently from the same underlying source??
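The effect being asked about can be illustrated without Bogus at all. The sketch below assumes that a global seed behaves like one shared System.Random, and it models parallel execution as different draw orders rather than real threads. DrawsSeenByA and the schedule strings are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SharedSeedDemo
{
    // Replays a schedule such as "ABAB" (who draws next) and returns the values A received.
    public static List<int> DrawsSeenByA(string schedule, Random sourceForA, Random sourceForB)
    {
        var seenByA = new List<int>();
        foreach (var who in schedule)
        {
            var value = (who == 'A' ? sourceForA : sourceForB).Next();
            if (who == 'A') seenByA.Add(value);
        }
        return seenByA;
    }

    public static void Main()
    {
        // One shared source: A's second value depends on whether B drew in between.
        var shared = new Random(420);
        var sharedRun1 = DrawsSeenByA("ABAB", shared, shared);
        shared = new Random(420);
        var sharedRun2 = DrawsSeenByA("AABB", shared, shared);
        // Equal only if two consecutive draws happen to collide.
        Console.WriteLine(sharedRun1.SequenceEqual(sharedRun2));

        // One source per consumer: A's values no longer depend on the schedule.
        var ownRun1 = DrawsSeenByA("ABAB", new Random(420), new Random(420));
        var ownRun2 = DrawsSeenByA("AABB", new Random(420), new Random(420));
        Console.WriteLine(ownRun1.SequenceEqual(ownRun2)); // True
    }
}
```

If that assumption holds for Randomizer.Seed, a fixed global seed only fixes the overall sequence, not which test receives which value, which would match the parallel-only failures reported above.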