rheinwerk-verlag / pganonymize

A commandline tool for anonymizing PostgreSQL databases
http://pganonymize.readthedocs.io/

Anonymizing error if there is a JSONB column in a table #12

Closed koptelovav closed 3 years ago

koptelovav commented 4 years ago

I have a strange error:

pganonymizer.exceptions.BadDataFormat: invalid input syntax for type json
DETAIL:  Token "'" is invalid.
CONTEXT:  JSON data, line 1: {'...
COPY source, line 29, column ui_settings: "{'firstTime': True}"

YAML file:

tables:
  - accounts:
      fields:
        - name:
            provider:
              name: fake.name
        - email:
            provider:
              name: fake.email
        - phone:
            provider:
              name: fake.phone_number
        - title:
            provider:
              name: choice
              values:
                - "Mr"
                - "Mrs"
                - "Dr"
                - "Prof"
                - "Ms"

truncate:
  - django_session

ui_settings column values: {"firstTime": true, "licenseBannerHasBeenShown": true}, {"firstTime": true}, {}

What am I doing wrong?

hkage commented 4 years ago

Thank you for your feedback.

I haven't actually tested the anonymization on JSON-based fields. I need to investigate further with a test setup first.

The error itself is thrown when the anonymizer writes the content of a table into a binary CSV stream and then copies the data into a temporary table, using psycopg2's cursor.copy_from method. Maybe the JSON syntax breaks the streamed data.

I will take a look at this.
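
The error message quotes the offending value as {'firstTime': True}, which is what Python's default string conversion of a dict looks like, not JSON. The following is a minimal sketch of that difference. It assumes the JSONB value reached the CSV stream as a Python dict (psycopg2 decodes json/jsonb columns to Python objects by default) and was written out with str(). The helper is_valid_json is hypothetical and is not part of pganonymize.

```python
import json

# Assumed value of the ui_settings column after the driver decoded it to a dict.
value = {"firstTime": True}

# Default string conversion yields a Python repr: single quotes, capital True.
as_repr = str(value)

# json.dumps yields valid JSON: double quotes, lowercase true.
as_json = json.dumps(value)


def is_valid_json(text):
    """Hypothetical helper: report whether text parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(as_repr, is_valid_json(as_repr))
print(as_json, is_valid_json(as_json))
```

If this is indeed the cause, serializing dict and list values with json.dumps before they are written to the stream would be one possible direction for a fix. That is a guess based on the error text, not a confirmed diagnosis.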

hkage commented 3 years ago

Hi, I am sorry I haven't had the time to fix this yet, mainly because we could not use pganonymizer for our production databases yet. Currently our PostgreSQL versions don't support JSON columns natively, but I will try to take a closer look at the issue.

Otherwise, if you have the time and an idea of how to fix it, I would appreciate a contribution at any time.