rwnx / pynonymizer

A universal tool for translating sensitive production database dumps into anonymized copies.
https://pypi.org/project/pynonymizer/
MIT License

Unable to complete execution of default process control #92

Closed antonysavio-sol closed 2 years ago

antonysavio-sol commented 2 years ago

**Describe the bug**
The default process control fails at the anonymizing step with an error saying that the table `db_schema.table` does not exist. However, if I create the database manually and then run each step individually, i.e. if I call `pynonymizer.run` separately for each step:

    pynonymizer.run(input_path="core_db.sql", strategyfile_path="strat.yaml", output_path="anonimize.sql", db_name="core_db", db_user="test", db_password="test", verbose=True, only_step="RESTORE_DB")

    pynonymizer.run(input_path="core_db.sql", strategyfile_path="strat.yaml", output_path="anonimize.sql", db_name="core_db", db_user="test", db_password="test", verbose=True, only_step="ANONYMIZE_DB")

there are no errors and the respective columns get anonymized.
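
The two separate calls differ only in `only_step`, so they can be driven from one shared set of arguments. The following is a minimal sketch, not part of pynonymizer: `run_steps` and `COMMON_ARGS` are hypothetical names, and the runner is passed in as a parameter so the caller supplies `pynonymizer.run` themselves.

```python
from typing import Any, Callable, Iterable, List

# Keyword arguments shared by every call, copied from the report.
COMMON_ARGS = {
    "input_path": "core_db.sql",
    "strategyfile_path": "strat.yaml",
    "output_path": "anonimize.sql",
    "db_name": "core_db",
    "db_user": "test",
    "db_password": "test",
    "verbose": True,
}


def run_steps(run: Callable[..., Any], steps: Iterable[str], **common: Any) -> List[str]:
    """Call `run` once per step with only_step set; return the steps that completed.

    Hypothetical helper: an exception raised by any step propagates
    immediately, so later steps are not attempted.
    """
    completed = []
    for step in steps:
        run(only_step=step, **common)
        completed.append(step)
    return completed
```

With pynonymizer installed, this would be invoked as `run_steps(pynonymizer.run, ["RESTORE_DB", "ANONYMIZE_DB"], **COMMON_ARGS)`, which mirrors the two manual calls and stops at the first step that raises.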

**To Reproduce**

1. Create a Python file with the following contents:

        import pynonymizer

        pynonymizer.run(input_path="core_db.sql", strategyfile_path="strat.yaml", output_path="anonimize.sql", db_name="core_db", db_user="test", db_password="test", verbose=True)

   Contents of `strat.yaml`:

        tables:
          user:
            columns:
              first_name: ( RAND() )
              last_name: ( RAND() )

2. Execute the Python file.

**Actual behavior**

    mysql: [Warning] Using a password on the command line interface can be insecure.
    Restoring: 100%|██████████| 233k/233k [00:00<00:00, 658kB/s]
    ["UPDATE user SET first_name = (''),last_name = ('');"]
    ["UPDATE user SET first_name = (''),last_name = ('');"]
    Anonymizing user: 0%| | 0/1 [00:00<?, ?it/s]
    mysql: [Warning] Using a password on the command line interface can be insecure.
    ERROR 1146 (42S02) at line 1: Table 'core_db.user' doesn't exist
    Anonymizing user: 0%| | 0/1 [00:00<?, ?it/s]
    Traceback (most recent call last):
      File "/Users/test/Documents/DataProcessor/tools/pnonymizer/anonimize.py", line 3, in <module>
        pynonymizer.run(input_path="core_db.sql", strategyfile_path="strat.yaml",
      File "/Users/test/Documents/DataProcessor/venv/lib/python3.9/site-packages/pynonymizer/pynonymize.py", line 147, in pynonymize
        db_provider.anonymize_database(strategy)
      File "/Users/test/Documents/DataProcessor/venv/lib/python3.9/site-packages/pynonymizer/database/mysql/__init__.py", line 159, in anonymize_database
        self.runner.db_execute(statements)
      File "/Users/test/Documents/DataProcessor/venv/lib/python3.9/site-packages/pynonymizer/database/mysql/execution.py", line 131, in db_execute
        self.__mask_subprocess_error(error)
      File "/Users/test/Documents/DataProcessor/venv/lib/python3.9/site-packages/pynonymizer/database/mysql/execution.py", line 81, in __mask_subprocess_error
        raise error from None
      File "/Users/test/Documents/DataProcessor/venv/lib/python3.9/site-packages/pynonymizer/database/mysql/execution.py", line 124, in db_execute
        subprocess.check_output(
      File "/usr/local/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 424, in check_output
        return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
      File "/usr/local/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 528, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '['mysql', '-h', '127.0.0.1', '-P', '3306', '-u', 'test', '-p**']' returned non-zero exit status 1.



**Expected behavior**
All the steps are executed without any error
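
One thing worth ruling out when a restore appears to succeed but the table is missing from the target database is whether the dump itself names a database. This is an assumption about a possible cause, not something the report confirms: dumps produced with `mysqldump --databases` contain `CREATE DATABASE` and `USE` statements, which could direct the restored tables into a schema other than the one given as `db_name`. The sketch below scans dump lines for such statements; `find_database_statements` is a hypothetical helper, not part of pynonymizer.

```python
import re
from typing import Iterable, List, Tuple

# Matches lines such as:
#   USE `prod_db`;
#   CREATE DATABASE /*!32312 IF NOT EXISTS*/ `prod_db` ...;
#   CREATE DATABASE IF NOT EXISTS prod_db;
_DB_STATEMENT = re.compile(
    r"^\s*(USE|CREATE\s+DATABASE)\s+"
    r"(?:/\*!\d+\s+IF\s+NOT\s+EXISTS\s*\*/\s*|IF\s+NOT\s+EXISTS\s+)?"
    r"`?([\w$]+)`?",
    re.IGNORECASE,
)


def find_database_statements(dump_lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (line_number, keyword, database_name) for each USE / CREATE DATABASE line."""
    hits = []
    for number, line in enumerate(dump_lines, start=1):
        match = _DB_STATEMENT.match(line)
        if match:
            # Normalize internal whitespace and case, e.g. "create  database" -> "CREATE DATABASE".
            keyword = " ".join(match.group(1).upper().split())
            hits.append((number, keyword, match.group(2)))
    return hits
```

Passing an open file works because the function only iterates over lines, for example `find_database_statements(open("core_db.sql", encoding="utf-8", errors="replace"))`. An empty result means the dump does not select a database by itself; any reported name that differs from `core_db` would be worth a closer look.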

antonysavio-sol commented 2 years ago

Resolved issue

rwnx commented 2 years ago

Hi, can I ask what happened here? Would it be of use to anyone else to know?

antonysavio-sol commented 2 years ago

Hi, when I ran the tool, I assumed that the default process control, described as

Restore from dumpfile to temporary database.
Anonymize temporary database with strategy.
Dump resulting data to file.
Drop temporary database.

would just work, i.e. it would restore, anonymize, dump and then drop the database. But it appears not to restore: although the command-line progress shows that a restore was attempted, the anonymize step then fails because the `core_db.user` table is not found. The workaround I am considering is to anonymize the data in the existing database only, i.e. to run just the `ANONYMIZE_DB` step. I am hoping that will resolve my issue.
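
Before running only `ANONYMIZE_DB` against an existing database, it may help to confirm that the expected table is actually there. The sketch below lists the tables through the `mysql` command-line client, using the same host, port and credentials that appear in the traceback. The helper names are hypothetical, and `list_tables` assumes a reachable MySQL server and an installed `mysql` client.

```python
import subprocess
from typing import List


def show_tables_command(db_name: str, user: str, password: str,
                        host: str = "127.0.0.1", port: str = "3306") -> List[str]:
    """Build a mysql CLI invocation that prints the tables in db_name, one per line."""
    return [
        "mysql", "-h", host, "-P", port, "-u", user, f"-p{password}",
        "-N",                 # --skip-column-names: omit the header row
        "-e", "SHOW TABLES",  # statement to execute
        db_name,              # database to run the statement in
    ]


def parse_table_names(output: str) -> List[str]:
    """Split the CLI output into table names, ignoring blank lines."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def list_tables(db_name: str, user: str, password: str) -> List[str]:
    """Run the command and return the table names; raises CalledProcessError on failure."""
    output = subprocess.check_output(
        show_tables_command(db_name, user, password), text=True
    )
    return parse_table_names(output)
```

A check such as `"user" in list_tables("core_db", "test", "test")` would show whether the table named in the error exists before the anonymize step is attempted.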