rheinwerk-verlag / pganonymize

A commandline tool for anonymizing PostgreSQL databases
http://pganonymize.readthedocs.io/

#47: Adding support for localized faker provider #48

Closed hkage closed 1 year ago

hkage commented 1 year ago

This PR adds support for Faker's localized providers. It allows the use of generator methods that are localized for one or more locales, as well as generator methods that are only available in localized providers, such as VAT IDs.

The Faker instance needs to be initialized with one or more locales, and initializing it on data row level would cause a massive performance problem. The locale therefore has to be set for the whole Faker context, which means it is used for the entire anonymization process. It is added as a separate option within the YAML schema, like this:

tables:
  - address:
      fields:
        - first_name:
            provider:
              name: faker.first_name
        - vat_id:
            provider:
              name: faker.vat_id
options:
  faker:
    locales:
      - de_DE

BuddhaOhneHals commented 1 year ago

Locales on field level can be supported without initializing the Faker instance on data row level. According to the Faker documentation, each locale passed at initialization can be accessed like this: fake['de_DE'].name().

So it would be possible to support something like this:

tables:
  - address:
      fields:
        - first_name:
            provider:
              name: faker.en_US.first_name
        - vat_id:
            provider:
              name: faker.de_DE.vat_id

options:
  faker:
    locales:
      - de_DE
      - en_US

or

tables:
  - address:
      fields:
        - first_name:
            provider:
              name: faker.first_name
              locale: en_US
        - vat_id:
            provider:
              name: faker.vat_id

options:
  faker:
    locales:
      - de_DE
      - en_US

What do you think?

BuddhaOhneHals commented 1 year ago

I added support for defining locales on field level and introduced a default_locale option.

Full example:

    tables:
      - user:
          primary_key: id
          fields:
            - name:
                provider:
                  # No locale entry at all, use configured default_locale "de_DE"
                  name: fake.name
            - city:
                provider:
                  # Use "en_US"
                  name: fake.city
                  locale: en_US
            - street:
                provider:
                  # Use "cs_CZ"
                  name: fake.street_address
                  locale: cs_CZ
            - zipcode:
                provider:
                  # Use an empty locale to ignore default_locale and select a locale at random
                  name: fake.postcode
                  locale:

    options:
      faker:
        locales:
          - de_DE
          - en_US
          - cs_CZ
        default_locale: de_DE
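
The lookup rules described in the comments of this example can be sketched as follows. This is an assumption about the behaviour, not the code from this PR: `resolve_locale` is a hypothetical function, and the dictionaries stand in for the parsed YAML, where an empty `locale:` value is loaded as `None`.

```python
def resolve_locale(provider_config, faker_options):
    """Return the locale for one field, or None for "any configured locale".

    Hypothetical helper illustrating the three cases from the example schema.
    """
    if "locale" in provider_config:
        # Key present: use the given locale. An empty YAML value arrives as
        # None and means "ignore default_locale, pick a locale at random".
        return provider_config["locale"] or None
    # Key absent: fall back to the configured default_locale (may be None).
    return faker_options.get("default_locale")


# Stand-ins for the parsed "options.faker" and "fields" sections above.
faker_options = {
    "locales": ["de_DE", "en_US", "cs_CZ"],
    "default_locale": "de_DE",
}
fields = {
    "name": {"name": "fake.name"},
    "city": {"name": "fake.city", "locale": "en_US"},
    "street": {"name": "fake.street_address", "locale": "cs_CZ"},
    "zipcode": {"name": "fake.postcode", "locale": None},
}

resolved = {
    field: resolve_locale(config, faker_options)
    for field, config in fields.items()
}
```

The sketch distinguishes a missing `locale` key from an empty one by checking key presence first, which is what lets an empty entry override `default_locale`.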

hkage commented 1 year ago

LGTM :+1: