nixys / nxs-data-anonymizer

A tool for anonymizing PostgreSQL and MySQL database dumps
Apache License 2.0

Linking cells #27

Closed borisershov closed 4 months ago

borisershov commented 5 months ago

Issue discussion

Summary:

I think the best solution for your case is to add a new block, link, to the column filter. This block stores links to other columns across tables in the whole database, i.e. cells in the linked columns that had the same value before anonymization will have equal values after it.

The following config file illustrates the description above (see the example database below):

filters:
  public.users:
    columns:
      accountID:
        value: "{{- uuidv4 -}}"
        unique: true
        link:
          public.statistic:
          - userID
      username:
        value: "{{ if eq .Values.username \"admin\" }}{{ .Values.username }}{{ else }}user_{{ .Values.id }}{{ end }}"
      password:
        type: command
        value: /path/to/script.sh
        unique: true

In other words, the values generated for the accountID column in the public.users table (subject to the unique requirement) will also be set in the corresponding cells of the userID column in the public.statistic table.

Example database:

Before

Table users:

| accountID | username | password |
| --- | --- | --- |
| bf88ba99-448c-46a5-9f51-df734ff26d28 | admin | ZjCX6wUxtXIMtip |
| 8d0bf732-b5ce-4385-853c-4e6d082f5871 | alice | tuhjLkgwwetiwf8 |
| aa2cedda-a941-4f8b-aa8a-4f3947c0458c | bob | AjRzvRp3DWo6VbA |

Table statistic:

| userID | views | likes |
| --- | --- | --- |
| 8d0bf732-b5ce-4385-853c-4e6d082f5871 | 152 | 5 |
| aa2cedda-a941-4f8b-aa8a-4f3947c0458c | 1000 | 321 |
| bf88ba99-448c-46a5-9f51-df734ff26d28 | 500 | 25 |

After

Table users:

| accountID | username | password |
| --- | --- | --- |
| 2ebba890-a4e6-4dfb-a4d2-ebc5b88d04a0 | admin | preset_admin_password |
| d287d708-f6c5-4d4f-af7e-1d1996cd4fba | user_2 | Pp4HY |
| 01f19b52-2782-4d6c-9c82-55fe6d29853f | user_3 | vu5TW |

Table statistic:

| userID | views | likes |
| --- | --- | --- |
| d287d708-f6c5-4d4f-af7e-1d1996cd4fba | 152 | 5 |
| 01f19b52-2782-4d6c-9c82-55fe6d29853f | 1000 | 321 |
| 2ebba890-a4e6-4dfb-a4d2-ebc5b88d04a0 | 500 | 25 |

borisershov commented 4 months ago

Now that this feature has been released, the config file to anonymize the example data (see above) is as follows:

link:
- rule:
    value: "{{- uuidv4 -}}"
    unique: true
  with:
    public.users:
    - accountID
    public.statistic:
    - userID

Pay attention to the unique property: it ensures that the generated data is unique across all linked cells specified in the config.

P.S. Each element of the link slice may specify multiple tables, and multiple columns for each table.
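
As a rough illustration of the P.S. above, the released format could be extended as in the sketch below. The public.users and public.statistic entries come from the example database. The public.sessions and public.audit tables and the ownerID, createdBy and sessionID columns are hypothetical names invented for this sketch, and reusing uuidv4 for the second rule is only an assumption made to keep the example short.

```yaml
link:
# First element: one rule shared by three tables.
- rule:
    value: "{{- uuidv4 -}}"
    unique: true
  with:
    public.users:
    - accountID
    public.statistic:
    - userID
    # Hypothetical table with two linked columns.
    public.sessions:
    - ownerID
    - createdBy
# Second element: an independent group of linked columns (hypothetical names).
- rule:
    value: "{{- uuidv4 -}}"
    unique: true
  with:
    public.sessions:
    - sessionID
    public.audit:
    - sessionID
```

Under this sketch, cells in accountID, userID, ownerID and createdBy that held the same value before anonymization would get the same generated value afterwards, while the sessionID columns would form a separate linked group.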