Is your feature request related to a problem? Please describe.
Currently, if the source database's charset and collation are utf8 and utf8_general_ci, then after anonymization, dumping, and restoring the dump, the charset becomes latin1 and the collation becomes latin1_swedish_ci, which I assume is based on the target database's default configuration.
But even if the target databases are created with charset utf8mb4 and collation utf8mb4_unicode_ci, the anonymized dumps are still written with the source database's charset and collation (utf8 and utf8_general_ci respectively), so the dumps still need to be updated.
Is there a way, via parameters or **kwargs, to set the charset and collation of a database to a specific value, so that once the data from the source (production) database is anonymized and the dumps are generated, the charset and collation match those of the target databases, i.e. charset utf8mb4 and collation utf8mb4_unicode_ci?
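For reference, the mismatch described above can be confirmed after a restore with a query against information_schema. This is only a minimal sketch, not pynonymizer functionality; it assumes PyMySQL is installed, and the connection details and database name are placeholders.

```python
# Minimal check of a restored database's charset/collation (not a pynonymizer
# feature); connection details and the database name are placeholders.
import pymysql

conn = pymysql.connect(host="localhost", user="root", password="secret")
with conn.cursor() as cur:
    cur.execute(
        "SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME "
        "FROM information_schema.SCHEMATA WHERE SCHEMA_NAME = %s",
        ("restored_db",),  # hypothetical database name
    )
    charset, collation = cur.fetchone()
    print(charset, collation)  # e.g. latin1 latin1_swedish_ci after restore
conn.close()
```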
Describe the solution you'd like
Have pynonymizer support executing ALTER DATABASE and ALTER TABLE commands during the DUMP_DB step (if this is not already supported) to update the charset and collation of the database, its tables, and their columns, driven by parameters such as charset and collation passed to the pynonymizer Python module.
Specific details can be found in the Stack Overflow question below; a rough sketch of the statements such a step could run follows the link.
https://stackoverflow.com/questions/5906585/how-to-change-the-character-set-and-collation-throughout-a-database
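To make the proposal concrete, below is a rough sketch of the plain MySQL statements such an option could emit during DUMP_DB. The target charset/collation values and the database and table names are placeholder assumptions, and nothing here is existing pynonymizer behaviour.

```python
# Hypothetical illustration of the ALTER statements the requested option could
# emit; database/table names and target charset/collation are placeholders.
charset, collation = "utf8mb4", "utf8mb4_unicode_ci"
database, table = "app_db", "users"

alter_database = (
    f"ALTER DATABASE `{database}` "
    f"CHARACTER SET {charset} COLLATE {collation}"
)
# CONVERT TO also rewrites the text columns, covering the column-level case
alter_table = (
    f"ALTER TABLE `{database}`.`{table}` "
    f"CONVERT TO CHARACTER SET {charset} COLLATE {collation}"
)
print(alter_database)
print(alter_table)
```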
Describe alternatives you've considered
Manually updating the databases' charsets and collations using ALTER commands, but that is a cumbersome process which is also error prone when done by hand across multiple databases; a rough script of that workaround is sketched after this paragraph.
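For completeness, here is a rough sketch of that manual workaround scripted across several databases, following the Stack Overflow approach linked above; it assumes PyMySQL, and the connection details and database names are placeholders.

```python
# Manual workaround sketch: convert several databases (and all their base
# tables) to utf8mb4/utf8mb4_unicode_ci. Names and credentials are placeholders.
import pymysql

CHARSET, COLLATION = "utf8mb4", "utf8mb4_unicode_ci"
databases = ["app_db", "reporting_db"]  # hypothetical database names

conn = pymysql.connect(host="localhost", user="root", password="secret")
with conn.cursor() as cur:
    for db in databases:
        cur.execute(
            f"ALTER DATABASE `{db}` CHARACTER SET {CHARSET} COLLATE {COLLATION}"
        )
        cur.execute(
            "SELECT TABLE_NAME FROM information_schema.TABLES "
            "WHERE TABLE_SCHEMA = %s AND TABLE_TYPE = 'BASE TABLE'",
            (db,),
        )
        for (table,) in cur.fetchall():
            cur.execute(
                f"ALTER TABLE `{db}`.`{table}` "
                f"CONVERT TO CHARACTER SET {CHARSET} COLLATE {COLLATION}"
            )
conn.close()
```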
Additional context
Add any other context or screenshots about the feature request here.