rwnx / pynonymizer

A universal tool for translating sensitive production database dumps into anonymized copies.
https://pypi.org/project/pynonymizer/
MIT License

Is there a way to specify the charset and collation for anonymized dumps in the pynonymizer Python module? #110

Closed armorKing11 closed 6 months ago

armorKing11 commented 2 years ago

Is your feature request related to a problem? Please describe.
Currently, if the source database's charset and collation are utf8 and utf8_general_ci, then after anonymizing, dumping the file, and restoring it, the charset becomes latin1 and the collation becomes latin1_swedish_ci, which I assume comes from the target database server's default configuration. But even if the target databases are created with charset utf8mb4 and collation utf8mb4_unicode_ci, the anonymized dumps still carry charset utf8 and collation utf8_general_ci, so the dumps still have to be updated.
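Until something like this exists in pynonymizer, one stopgap is to post-process the dump text before restoring it. The following is a minimal sketch, assuming an uncompressed, plain-text MySQL dump in the usual mysqldump style (`SET NAMES ...`, `DEFAULT CHARSET=... COLLATE=...`, column-level `CHARACTER SET ... COLLATE ...`). `rewrite_dump_charset` is a hypothetical helper, not part of pynonymizer, and it only relabels the declarations without re-encoding any bytes. That is fine for utf8 to utf8mb4, since MySQL's utf8 (utf8mb3) is a subset of utf8mb4, but it would not be correct for other conversions such as latin1 to utf8mb4.

```python
import re


def rewrite_dump_charset(
    sql: str,
    old_charset: str = "utf8",
    new_charset: str = "utf8mb4",
    old_collation: str = "utf8_general_ci",
    new_collation: str = "utf8mb4_unicode_ci",
) -> str:
    """Relabel charset/collation declarations in MySQL dump text.

    Hypothetical helper: the data bytes are NOT re-encoded, only the names change.
    """
    # Replace the collation first; \b anchors the match to whole collation names.
    sql = re.sub(rf"\b{re.escape(old_collation)}\b", lambda m: new_collation, sql)
    # Charset names follow CHARSET=, CHARACTER SET, SET NAMES or character_set_client =.
    # The trailing \b stops "utf8" from matching inside "utf8mb4", so re-running is a no-op.
    pattern = (
        r"(CHARSET\s*=\s*|CHARACTER SET\s+|SET NAMES\s+|character_set_client\s*=\s*)"
        + re.escape(old_charset)
        + r"\b"
    )
    return re.sub(pattern, lambda m: m.group(1) + new_charset, sql, flags=re.IGNORECASE)
```

The same function can be applied to a file read with `open(path, encoding="utf-8")`. Note that switching from utf8_general_ci to utf8mb4_unicode_ci can change how some strings compare, so it is worth restoring into a scratch database first.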

Is there a way, via parameters or **kwargs, to set the charset and collation of a database to specific values, so that once the data from the source (production) database is anonymized and the dumps are generated, the charset and collation match those of the target databases, i.e., charset utf8mb4 and collation utf8mb4_unicode_ci?

Describe the solution you'd like
Have pynonymizer support executing ALTER DATABASE and ALTER TABLE commands during the DUMP_DB step (if this is not already supported) to update the charset and collation of the database, its tables, and its columns, by passing parameters such as charset and collation to the pynonymizer Python module. Specific details can be found in the Stack Overflow question below: https://stackoverflow.com/questions/5906585/how-to-change-the-character-set-and-collation-throughout-a-database
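The statements such a step would need to run can be sketched as below, following the approach in the linked Stack Overflow question: `ALTER DATABASE` changes only the default for tables created later, while `ALTER TABLE ... CONVERT TO CHARACTER SET` converts an existing table and its character columns. This is an illustration under assumptions, not pynonymizer code; `charset_conversion_statements` and `quote_ident` are hypothetical names, and the table list is assumed to be known in advance.

```python
import re

# Charset and collation names are interpolated into SQL, so only allow
# plain names such as "utf8mb4" or "utf8mb4_unicode_ci".
_NAME = re.compile(r"[A-Za-z0-9_]+")


def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def charset_conversion_statements(database, tables, charset="utf8mb4",
                                  collation="utf8mb4_unicode_ci"):
    """Build ALTER statements that move a database and its tables to a new charset/collation."""
    for name in (charset, collation):
        if not _NAME.fullmatch(name):
            raise ValueError(f"unexpected charset/collation name: {name!r}")
    db = quote_ident(database)
    # Changes the default for tables created later; existing tables are untouched.
    statements = [f"ALTER DATABASE {db} CHARACTER SET = {charset} COLLATE = {collation};"]
    for table in tables:
        # CONVERT TO changes the table default and converts existing character columns.
        statements.append(
            f"ALTER TABLE {db}.{quote_ident(table)} "
            f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements
```

Generating the statements as plain strings keeps them easy to review or log before anything is executed against the restored copy.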

Describe alternatives you've considered
Manually updating each database's charset and collation with ALTER commands, but that is a cumbersome process that is also error-prone when done by hand for multiple databases.
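To make the manual alternative less error-prone across several databases, the tables can be discovered from `information_schema.TABLES` instead of being listed by hand. This is a minimal sketch, assuming a DB-API cursor that uses `%s` placeholders and returns rows as tuples (as PyMySQL's and mysqlclient's default cursors do); `convert_databases` and its `dry_run` flag are hypothetical, and by default the function only returns the planned statements.

```python
import re

# Charset/collation names are interpolated into SQL, so restrict them to plain names.
_NAME = re.compile(r"[A-Za-z0-9_]+")


def _quote(name: str) -> str:
    # MySQL identifier quoting: backticks, with embedded backticks doubled.
    return "`" + name.replace("`", "``") + "`"


def convert_databases(cursor, databases, charset="utf8mb4",
                      collation="utf8mb4_unicode_ci", dry_run=True):
    """Plan (and optionally run) charset/collation conversion for several databases."""
    for name in (charset, collation):
        if not _NAME.fullmatch(name):
            raise ValueError(f"unexpected charset/collation name: {name!r}")
    planned = []
    for database in databases:
        db = _quote(database)
        planned.append(f"ALTER DATABASE {db} CHARACTER SET = {charset} COLLATE = {collation};")
        # Discover base tables (views are skipped) instead of listing them by hand.
        cursor.execute(
            "SELECT TABLE_NAME FROM information_schema.TABLES "
            "WHERE TABLE_SCHEMA = %s AND TABLE_TYPE = 'BASE TABLE'",
            (database,),
        )
        for (table,) in cursor.fetchall():
            planned.append(
                f"ALTER TABLE {db}.{_quote(table)} "
                f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
            )
    if not dry_run:
        # In dry-run mode only the discovery queries above are executed.
        for statement in planned:
            cursor.execute(statement)
    return planned
```

A cautious workflow is to print the dry-run plan, review it, and only then call the function again with `dry_run=False` against the restored copies.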

rwnx commented 6 months ago

Closing as stale and out of scope. Please open a discussion if you want to talk more and gather community support. https://github.com/rwnx/pynonymizer/discussions