Closed BronsCharlson closed 8 years ago
Can you see '--anonymize' option in the output of mydumper --help
?
Wow - thx for the fast reply. Yes I can see it:
--skip-tz-utc --use-savepoints Use savepoints to reduce metadata locking issues, needs SUPER privilege --success-on-1146 Not increment error count and Warning instead of Critical in case of table doesn't exist --lock-all-tables Use LOCK TABLE for all, instead of FTWRL -U, --updated-since Use Update_time to dump only tables updated in the last U days --trx-consistency-only Transactional consistency only --anonymize Data anonymization configuration file -h, --host The host to connect to -u, --user Username with privileges to run the dump -p, --password User password -P, --port TCP/IP port to connect to -S, --socket UNIX domain socket file to use for connection --defaults-file read mysql connection options from this file -t, --threads Number of threads to use, default 4 -C, --compress-protocol Use compression on the MySQL connection -V, --version Show the program version and exit -v, --verbose Verbosity of output, 0 = silent, 1 = errors, 2 = warnings, 3 = info, default 2
Do you have a working exapmle of a simple yaml-config file? Just for randomizing names for example?
I played around with the paths and filenames of --anonymize and the wordlists inside the yaml file. If I change paths to non existing files I recieve an error (which is good). So probably I have an error in my yaml configuration. My database is named database1 and I want to randomize first_name from this table:
mysql> desc user_masterdata;
+----------------------------+------------------+------+-----+---------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------------+------------------+------+-----+---------------------+-----------------------------+
| id | int(10) | NO | PRI | NULL | auto_increment |
| user_id | int(10) | NO | UNI | 0 | |
| first_name | varchar(100) | NO | MUL | | |
| last_name | varchar(100) | NO | MUL | | |
hey @BronsCharlson
try the below
databasename:
mytable2:
truncate: yes
myTable:
edit:
- when: {column_name: '.+'}
set: {column_name: ''}
careful of the yaml spacing.
for the file randomization you need to have the file in the same dir a mydumper and separate general config from database config. example
anonymizer_settings:
randomize:
# wordlist_id: file_with_words_one_per_line.txt
firstname: firstnames.txt
databasename:
mytable2:
truncate: yes
myTable:
edit:
- when: {column_name: '.+'}
set: {column_name: ''}
mytable3:
randomize:
column_name:
type: 'firstname'
Thank you for your efforts. Still I don't have any success with randomizing (which I need). I stripped down anonymize.yaml to this:
`anonymizer_settings: randomize: firstname: firstnames.txt
db_name: user_masterdata: randomize: first_name: type: 'firstname' `
mydumper, anonymize.yaml and firstnames.txt are in the same directory, mydumper creates a sql-dump - but without any randomization of the values of column _firstname:
`root@testdb:~/mydumper-anon-master# ./mydumper --threads 1 --verbose 3 --database db_name --tables-list user_masterdata --outputdir /mnt/backup/testdb/ --anonymize anonymize.yaml -u root -p password * Message: Connected to a MySQL server * Message: Started dump at: 2016-07-21 11:50:30
* Message: Thread 1 connected using MySQL connection ID 6412
* Message: Non-InnoDB dump complete, unlocking tables
* Message: Thread 1 dumping data for db_name
.user_masterdata
* Message: Thread 1 dumping schema for db_name
.user_masterdata
* Message: Thread 1 shutting down
* Message: Finished dump at: 2016-07-21 11:50:34
`
What about general config from database config. example you mentioned? Do I need an additional config for the database? I couldn't find anything in the documentation.
@BronsCharlson, can you try type: datetime and see if that works? There might be bug with randomizer lists, it's a feature I haven't tested very well. Did you test if any feature works at all (like truncate)?
Now I changed anonymize.yaml to:
`anonymizer_settings: randomize: firstname: firstnames.txt
db_name: user_masterdata: truncate: yes `
But the sql-dump is created anyhow.
I also tried type: datetime:
`anonymizer_settings: randomize: firstname: firstnames.txt
db_name: user_masterdata: randomize: first_name: type: 'datetime' ` but this didn't anonymize anything (also using it on a datetime field didn't help)
I'm clueless...
try this out
anonymizer_settings:
randomize:
firstname: firstnames.txt
db_name:
user_masterdata:
randomize:
first_name:
type: 'firstname'
Yaml syndax is important
Ha! You motivated me to try out writing YAML without tabs - but just with single spaces to ident. Now it worked!!! I will try to improve my script and then I will post an update. Just for the record - afaik YAML spaces need to be consistent - but can be tabs or single spaces. Is that correct?
Why does YAML forbid tabs?
Tabs have been outlawed since they are treated differently by different editors and tools. And since indentation is so critical to proper interpretation of YAML, this issue is just too tricky to even attempt. Indeed Guido van Rossum of Python has acknowledged that allowing TABs in Python source is a headache for many people and that were he to design Python again, he would forbid them.
brave new world - thanks a lot!
Hi,
I'm trying to anonymize data while dumping. But the dump always has the original data without any modifications. Compiling looks good - I can't see any error:
`root@testdb:~/mydumper-anon-master# cmake . -DRUN_CPPCHECK Parse error in command line argument: -DRUN_CPPCHECK Should be: VAR:type=value CMake Error: No cmake script provided. CMake Error: Problem processing arguments. Aborting.
root@testdb:~/mydumper-anon-master# cmake . -DRUN_CPPCHECK=yes -- Using mysql-config: /usr/bin/mysql_config -- Found MySQL: /usr/include/mysql, /usr/lib/x86_64-linux-gnu/libmysqlclient_r.so;/usr/lib/x86_64-linux-gnu/libpthread.so;/usr/lib/x86_64-linux-gnu/libz.so;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libdl.so
CMake Warning at docs/CMakeLists.txt:9 (message): Unable to find Sphinx documentation generator
-- MYSQL_CONFIG = /usr/bin/mysql_config -- CMAKE_INSTALL_PREFIX = /usr/local -- BUILD_DOCS = ON -- WITH_BINLOG = OFF -- RUN_CPPCHECK = yes -- Change a values with: cmake -D=
-- Configuring done -- Generating done -- Build files have been written to: /root/mydumper-anon-master root@testdb:~/mydumper-anon-master# make Scanning dependencies of target mydumper [ 20%] Building C object CMakeFiles/mydumper.dir/mydumper.c.o [ 40%] Building C object CMakeFiles/mydumper.dir/server_detect.c.o [ 60%] Building C object CMakeFiles/mydumper.dir/g_unix_signal.c.o [ 80%] Building C object CMakeFiles/mydumper.dir/anonymizer.c.o Linking C executable mydumper [ 80%] Built target mydumper Scanning dependencies of target myloader [100%] Building C object CMakeFiles/myloader.dir/myloader.c.o Linking C executable myloader [100%] Built target myloader `
My YAML file looks like this:
`anonymizer_settings: randomize:
wordlist_id: file_with_words_one_per_line.txt
database1: user_masterdata: randomize: first_name: type: 'firstname'
I call mydumper like this:
`/root/mydumper-anon-master/mydumper --threads 1 --verbose 3 --database database1 --tables-list user_masterdata --outputdir /mnt/backup/testdb/ --anonymize /root/anonymize.yaml -u user -p password
root@testdb:~/mydumper-anon-master# /root/run_mydumper.sh * Message: Connected to a MySQL server * Message: Started dump at: 2016-07-20 12:14:44
* Message: Thread 1 connected using MySQL connection ID 4389 * Message: Non-InnoDB dump complete, unlocking tables * Message: Thread 1 dumping data for
database1
.user_masterdata
* Message: Thread 1 dumping schema fordatabase1
.user_masterdata
* Message: Thread 1 shutting down * Message: Finished dump at: 2016-07-20 12:14:47 `Any idea what I'm missing?