rwnx / pynonymizer

A universal tool for translating sensitive production database dumps into anonymized copies.
https://pypi.org/project/pynonymizer/
MIT License
101 stars 38 forks source link
anonymization anonymized-data anonymized-database database dbanonymizer gdpr hacktoberfest python

pynonymizer pynonymizer on PyPI Downloads License

pynonymizer

pynonymizer is a tool for anonymizing sensitive production database dumps, allowing you to create realistic test datasets while maintaining GDPR/Data Protection compliance. It replaces personally identifiable information (PII) in your database with random, yet realistic data, using the Faker library and other functions.

Key features:

With pynonymizer, you can safely share production database copies with developers and testers, enabling better staging environments, integration tests, and database migration simulations, without compromising user privacy.

How does it work?

pynonymizer replaces personally identifiable data in your database with realistic pseudorandom data, from the Faker library or from other functions. There are a wide variety of data types available which should suit the column in question, for example:

Pynonymizer's main data replacement mechanism fake_update is a random selection from a small pool of data (--seed-rows controls the available Faker data). This process is chosen for compatibility and speed of operation, but does not guarantee uniqueness. This may or may not suit your exact use-case. For a full list of data generation strategies, see the docs on strategyfiles

Examples

You can see strategyfile examples for existing databases, in the the examples folder.

Process outline

  1. Restore from dumpfile to temporary database.
  2. Anonymize temporary database with strategy.
  3. Dump resulting data to file.
  4. Drop temporary database.

If this workflow doesnt work for you, see process control to see if it can be adjusted to suit your needs.

mysql

mssql

postgres

Getting Started

Usage

CLI

  1. Write a strategyfile for your database
  2. Check out the help for a description of options pynonymizer --help
  3. Start Anonymizing!

Docker

Docker Image Version

pynonymizer is available as a docker image so that you dont have to install the client tools for your database.

See https://hub.docker.com/repository/docker/rwnxt/pynonymizer

# As pynonymizer depends on strategyfiles, you'll need to create a file mount so the file can be read.
docker run --mount type=bind,source=./strategyfile.yml,target=/tmp/strategyfile.yml rwnxt/pynonymizer -s /tmp/strategyfile.yml --db-host [...]

Package

Pynonymizer can also be invoked programmatically / from other python code. See the module entrypoint pynonymizer or pynonymizer/pynonymize.py

import pynonymizer

pynonymizer.run(input_path="./backup.sql", strategyfile_path="./strategy.yml" [...] )