rwnx / pynonymizer

A universal tool for translating sensitive production database dumps into anonymized copies.
https://pypi.org/project/pynonymizer/
MIT License

Restore dump synchronization #98

Closed DocLM closed 2 years ago

DocLM commented 2 years ago

Hi @rwnx, I found out while restoring a PostgreSQL dump of ~1.5GB that pynonymizer does not wait for psql to exit before starting to anonymize data.

If the dump contains ALTER TABLE statements that touch the same fields that are about to be anonymized, a race condition can occur, and the process may slow down or get stuck in an unpredictable manner.

Instead of just closing STDIN while restoring the dump, I've added some statements that close STDIN and then wait for the command process to complete. I think this issue can also arise on MySQL/MariaDB, so I've adapted the command runner for that DBMS as well.
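
To illustrate the idea, here is a minimal sketch (not the actual patch): after the dump is streamed to the client, STDIN is closed and the caller blocks on the process exit before doing anything else. The function name `restore_and_wait` and the `STAND_IN` command are hypothetical; a small Python child process stands in for `psql` or `mysql` so the example runs without a database.

```python
import subprocess
import sys


def restore_and_wait(command, chunks):
    """Stream chunks to the command's stdin, then block until it exits."""
    proc = subprocess.Popen(command, stdin=subprocess.PIPE)
    try:
        for chunk in chunks:
            proc.stdin.write(chunk)
    finally:
        # Closing stdin only signals end of input to the client.
        proc.stdin.close()
    # The client may still be executing statements, so wait for it to exit
    # before any later step (such as anonymization) is allowed to start.
    returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, command)
    return returncode


# Hypothetical stand-in for the database client: reads all of stdin, then exits.
STAND_IN = [sys.executable, "-c", "import sys; sys.stdin.buffer.read()"]

if __name__ == "__main__":
    restore_and_wait(STAND_IN, [b"ALTER TABLE example ADD COLUMN x int;\n"])
```

Without the `proc.wait()` call, the function would return as soon as the input was handed over, which is the window in which the restore and the anonymization queries can overlap.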

rwnx commented 2 years ago

Hi, Thanks for your contribution! This looks really valuable.

Can you add a line to CHANGELOG.md under the unreleased section? It would help a lot. There's some more info in the CONTRIBUTING.md doc.

DocLM commented 2 years ago

Of course, I've added a description in CHANGELOG.md

rwnx commented 2 years ago

Great! Let's get this merged! Thank you. I can't speak to the next release date but it should be over the next week or so. 💁‍♀️✨