rwnx / pynonymizer

A universal tool for translating sensitive production database dumps into anonymized copies.
https://pypi.org/project/pynonymizer/
MIT License
102 stars 38 forks source link

MySQL: column strategy "empty" does not working with json columns #112

Closed r2omvavra closed 2 years ago

r2omvavra commented 2 years ago

Describe the bug The empty strategy fails on MySQL JSON columns with the error ERROR 3140 (22032) at line 1: Invalid JSON text: "The document is empty." at position 0 in value for column 'tbl_printJob.pj_content'.

The problems seems to be that pynonymizer tries to add an empty string (''), but the JSON column requires at least some valid JSON.

To Reproduce

The query that is run is: "UPDATE tbl_table SET json = ('');"

Expected behavior I was expecting an empty minimal JSON ({}) or a more descriptive error message, as it took me a while to figure out what happened there.

Additional context I'm currently using the work around to set the literal myself, but maybe there is a good way for the tool to handle it automatically or at least to update the documentation/error message where it makes sense.

I also want to say thank you for the great tool!

rwnx commented 2 years ago

Hi:

pynonymizer has very little knowledge of the database's schema - setting the literal is the expected solution here, in my mind. JSON columns not supported (except by literal: ) at the current time as they'd require a lot of knowledge about the provider's implementation. Pynonymizer fits into a sweet spot where it "just runs sql" to anonymize the database. I'm reluctant to make pynonymizer do this.

I was thinking of removing the 'empty' option, because in situations like this it doesn't make any sense. What it does is set empty string '' and hopes that coercion works (i.e. does setting a INT field to '' make sense? )

It also has different behaviour in different providers (postgres / mssql)

This could definitely be clearer in the documentation, it's an easy footgun - sorry i don't have a better answer for you here.

r2omvavra commented 2 years ago

It totally makes sense, thank you for the taking the time.