rwnx / pynonymizer

A universal tool for translating sensitive production database dumps into anonymized copies.
https://pypi.org/project/pynonymizer/
MIT License
101 stars 38 forks source link

Mysql - RESTORE_DB not working with remote host. #127

Closed Dylan-Duault closed 1 year ago

Dylan-Duault commented 1 year ago

Describe the bug RESTORE_DB can't connect to remote mysql server with Mysql DB_TYPE. Pynonymizer tries to connect to a localhost mysql server instead of the one given with DB_HOST.

To Reproduce

pynonymizer.run(
            db_host=db_host, # This is a remote host
            db_port=db_port,
            db_user=db_user,
            db_password=db_password,
            input_path=f"./{export_file_name}",
            strategyfile_path="./utils/hidden-name_strategy.yml",
            output_path=f"./{anonymized_file_name}",
            seed_rows=1
        )

Pynonymizer can connect to the Mysql server, create the database, but fails at the RESTORE_DB step :

Restoring: 0%| | 0.00/942M [00:00<?, ?B/s]ERROR 1044 (42000) at line 34: Access denied for user 'hidden-user'@'hidden-ip' to database 'hidden-name_strategy_dbd750d76cc048a991f3c7fa549412dc'

Additional context On the Readme file, we can read the following warning for the mssql database type:

For RESTORE_DB/DUMP_DB operations, the database server must be running locally with pynonymizer. This is because MSSQL RESTORE and BACKUP instructions are received by the database, so piping a local backup to a remote server is not possible.

It seems the same warning should be issued for the Mysql database type as well, as it doesn't seem to work.

rwnx commented 1 year ago

Hi, can you help me understand how you have arrived at this conclusion:

Pynonymizer tries to connect to a localhost mysql server instead of the one given with DB_HOST

The error message seems to indicate that it is attempting to connect to the server but failing. you redacted an ip from the message, hidden-ip - is that the correct (remote) IP?

Could this be a permissions issue? Restoring a dump file often requires a lot of grants. what permissions does this user have on the remote server?

Dylan-Duault commented 1 year ago

Hello !

Let's say DB_HOST is 111.111.111.111 (remote mysql server), and the dedicated server IP from which I run the script is 222.222.222.222.

When I run the script, pynonymizer connects to 111.111.111.111:3306 and creates the database successfully.

At the beginning of RESTORE_DB, the script fails with an error like this : Restoring: 0%| | 0.00/942M [00:00<?, ?B/s]ERROR 1044 (42000) at line 34: Access denied for user 'hidden-user'@'222.222.222.222' to database 'hidden-name_strategy_dbd750d76cc048a991f3c7fa549412dc'

Whats seems weird to me is that it doesn't try to connect to localhost or 127.0.0.1, it tries to connect to the server IP instead of the database IP.

Here are the database user grants :

Thank you for your help, sorry if it's not a pynonymizer issue !

Dylan-Duault commented 1 year ago

I have reproduced the same Mysql user on my local environnement, and get a similar error. It may indeed be caused by permissions issues, I'll share what grants the user is missing when I'll find out.

rwnx commented 1 year ago

I think the error message is just confusing, and it sure confused me. the message is from the server, identifying you! from the server's perspective, you are 'hidden-user'@'222.222.222.222', because 222.222.222.222 is your external IP.

the backup dump files often contain statements for restoring other schema. you might need: ALTER, TRIGGER,

If it's a data-only dump you should only need the ones you mentioned.

Here's a DBA post about this that i think might help: https://dba.stackexchange.com/questions/119673/what-are-the-minimum-persmission-required-to-restore-mysql-dump

That will be specific to how your backup was created - and you should be able to work out which grants you need.

Dylan-Duault commented 1 year ago

You are right, the missing grants were "LOCK TABLE" and "ALTER".

from the server's perspective, you are 'hidden-user'@'222.222.222.222', because 222.222.222.222 is your external IP.

Exactly, that was confusing !

Thank you very much for your help and your amazing package.