rwnx / pynonymizer

A universal tool for translating sensitive production database dumps into anonymized copies.
https://pypi.org/project/pynonymizer/
MIT License

Setting to choose amount of Seed Data (more Fake Data) #113

Closed davidshiu3 closed 2 years ago

davidshiu3 commented 2 years ago

Is your feature request related to a problem? Please describe.

Currently we have a table with 100k rows and the seed data set only has 1000 entries (or however much, something much less than 100k). This causes many things to have the same repeated data.

Describe the solution you'd like

Addition of a setting to be able to choose how much fake data we get of each type as well as more seed data.

Describe alternatives you've considered

Additional context

We use Pynonymizer to anonymize our production data. This causes many repeated names throughout the scrubbed data.

rwnx commented 2 years ago

Hi, the seed data set has 150 rows by default.
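To put the default in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes each anonymized row picks its fake value uniformly at random (with replacement) from the seed table; that sampling model is an assumption made for illustration, not something confirmed in this thread. The function names are hypothetical helpers, not part of pynonymizer.

```python
def average_repeats(table_rows: int, seed_rows: int) -> float:
    """Average number of table rows that share each seed value."""
    if table_rows < 0 or seed_rows < 1:
        raise ValueError("table_rows must be >= 0 and seed_rows must be >= 1")
    return table_rows / seed_rows


def expected_distinct(table_rows: int, seed_rows: int) -> float:
    """Expected number of distinct seed values that appear in the table,
    assuming uniform sampling with replacement (an illustrative assumption)."""
    if table_rows < 0 or seed_rows < 1:
        raise ValueError("table_rows must be >= 0 and seed_rows must be >= 1")
    return seed_rows * (1 - (1 - 1 / seed_rows) ** table_rows)


if __name__ == "__main__":
    table_rows = 100_000  # table size mentioned in the issue
    for seed_rows in (150, 1_000, 100_000):
        print(
            f"seed_rows={seed_rows:>7}: "
            f"~{average_repeats(table_rows, seed_rows):.1f} rows per value, "
            f"~{expected_distinct(table_rows, seed_rows):.0f} distinct values"
        )
```

Under that assumption, a seed table much smaller than the target table caps the number of distinct values at the seed size, which matches the repetition described in the issue.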

You can change the number of seed rows with the --seed-rows option.

  --seed-rows SEED_ROWS
                        Specify a number of rows to populate the
                        fake data table used during
                        anonymization. [$PYNONYMIZER_SEED_ROWS]
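As a hedged illustration of the help text above, the following Python sketch builds (but does not run) a pynonymizer command line with --seed-rows, and shows the equivalent environment variable. Only --seed-rows and PYNONYMIZER_SEED_ROWS come from this thread; the --input, --strategy and --output arguments and all file names are assumptions for the example, so check `pynonymizer --help` for your installed version. The helper functions are hypothetical.

```python
import os
import shlex


def build_pynonymizer_command(seed_rows, extra_args=()):
    """Return an argv list that passes --seed-rows to pynonymizer."""
    if seed_rows < 1:
        raise ValueError("seed_rows must be a positive integer")
    return ["pynonymizer", "--seed-rows", str(seed_rows), *extra_args]


def build_pynonymizer_env(seed_rows, base_env=None):
    """Return an environment mapping that sets PYNONYMIZER_SEED_ROWS instead."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYNONYMIZER_SEED_ROWS"] = str(seed_rows)
    return env


# Assumed arguments and file names, for illustration only.
command = build_pynonymizer_command(
    100_000,
    extra_args=[
        "--input", "dump.sql",
        "--strategy", "strategy.yml",
        "--output", "anonymized.sql",
    ],
)

if __name__ == "__main__":
    # Print the command rather than executing it.
    print(shlex.join(command))
```

To actually run it, the list could be passed to `subprocess.run(command, check=True)`, or the environment from `build_pynonymizer_env` could be supplied via the `env` argument when the flag is omitted.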

I have some outstanding work to improve the documentation on this one - I think it's easily missed.

rwnx commented 2 years ago

Closing this issue because of inactivity. Please reply or open another issue if there's more to say 😇