rwnx / pynonymizer

A universal tool for translating sensitive production database dumps into anonymized copies.
https://pypi.org/project/pynonymizer/
MIT License
102 stars 38 forks

Pynonymizer overwrites my original database from dump #114

Closed sjklawy closed 2 years ago

sjklawy commented 2 years ago

Pynonymizer overwrites my original database from the dump. It shouldn't work like that, right? When I test with a dump of only one table, everything works. So I decided to anonymize the entire database, and unfortunately that didn't work.

First I dump the whole database and run pynonymizer:

mysqldump --user=xxx -p --databases institution -R > institution.sql
pynonymizer -i institution.sql -s institution.yml -o output.sql

and there is an error: ERROR 1146 (42S02) at line 1: Table 'institution_e18f24daf8e342ca8d553748246b93c5.institutionUser' doesn't exist

So I decided to use process control to see what's going on:

pynonymizer --stop-at RESTORE_DB --db-name test -i institution.sql -s institution.yml -o output.sql

[CREATE_DB]
[RESTORE_DB]
Restoring: 100%|███████████████████████████████████████████| 117k/117k [00:00<00:00, 6.10MB/s]
Skipped [ANONYMIZE_DB]: (Stopped at [RESTORE_DB])
Skipped [DUMP_DB]: (Stopped at [RESTORE_DB])
Skipped [DROP_DB]: (Stopped at [RESTORE_DB])
Process complete!

The run completes without errors, but there are no tables in the 'test' database, where they should appear. So I did another test: I changed a value in the original institution database and ran pynonymizer again. The value I had changed was restored from the dump.

rwnx commented 2 years ago

Hi! This has very little to do with pynonymizer and everything to do with how the dump is created. We run the dumped SQL through the mysql command with the database name we expect to be used, but if your database dump contains references to a specific database name, the data will be restored there instead.
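A quick way to tell whether a given dump will behave like this is to look for those statements before handing it to pynonymizer. The following is only a sketch: `dump_pins_database` is a hypothetical helper name, and it assumes the usual mysqldump layout in which `CREATE DATABASE` and `USE` each start a line.

```shell
# Hypothetical helper (not part of pynonymizer): exits 0 if the dump file
# contains a CREATE DATABASE or USE statement at the start of a line, which
# would direct the restore to a named database instead of the one selected
# on the mysql command line.
dump_pins_database() {
  grep -Eq '^(CREATE DATABASE|USE) ' "$1"
}

# Example usage (institution.sql is the dump from the report above):
#   dump_pins_database institution.sql && echo "dump targets a fixed database"
```

If the check succeeds, the restore will go to the database named in the dump, whatever name pynonymizer passes.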

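When run with the --databases option, as in your command, mysqldump writes CREATE DATABASE and USE statements for the database name. If you want to be able to change the database name, you need to keep these statements out of the dump. Note that the --no-create-db option suppresses only the CREATE DATABASE statement, not USE; passing the database name as a plain argument without --databases leaves out both.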

More reading on the --databases option: https://dev.mysql.com/doc/refman/8.0/en/mysqldump.html#option_mysqldump_databases
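As a rough sketch of a dump that can be restored under any name, the command from the report can be rerun with the database passed as a plain argument rather than through --databases. The wrapper name `dump_single_db` is hypothetical, and `--user=xxx` is the placeholder from the original command.

```shell
# Hypothetical wrapper: dump one database by positional name. Without
# --databases, mysqldump does not emit CREATE DATABASE or USE statements,
# so the restore goes to whichever database the mysql client selects.
dump_single_db() {
  local db="$1" out="$2"
  mysqldump --user=xxx -p -R "$db" > "$out"
}

# Example usage, followed by the same pynonymizer call as before:
#   dump_single_db institution institution.sql
#   pynonymizer -i institution.sql -s institution.yml -o output.sql
```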

This could probably be clearer in the documentation. If you have any suggestions for that, I'd really appreciate it!