Open schlessera opened 6 years ago
This is really good stuff.
One thing I've been talking about with another dev is anonymisation of data when taking a copy of a site. The use case for this is where I'm taking a copy of a client's site for local development work, and I don't want all their user data on my dev copy.
In this case, being able to run wp db export --anonymize to get a dump of the database, but with usernames and email addresses replaced by fake names (perhaps using something like Faker, if it's quick enough), would be awesome. The flag could take a list of column names, like search-replace does for excluding columns.
I'm sure this needs more thought.
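To make the column-list idea a bit more concrete, here is a minimal sketch of how a list of columns could be turned into anonymizing SQL. The helper name build_anonymize_sql and the placeholder scheme are assumptions made for illustration; no --anonymize flag exists in WP-CLI, and the generated statements are meant for a throwaway copy of the database, never the live one.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of WP-CLI): print one UPDATE statement per
# column, overwriting the value with a placeholder derived from the row ID.
# Usage: build_anonymize_sql <table> <id-column> <col1,col2,...>
build_anonymize_sql() {
  table=$1
  id_col=$2
  # Split the comma-separated column list into one column per line.
  printf '%s\n' "$3" | tr ',' '\n' | while IFS= read -r col; do
    [ -n "$col" ] || continue
    if [ "$col" = "user_email" ]; then
      # Keep a valid email shape; .invalid is a reserved, non-routable TLD.
      printf "UPDATE %s SET %s = CONCAT('user', %s, '@example.invalid');\n" "$table" "$col" "$id_col"
    else
      # Unique per row, because the ID is part of the value.
      printf "UPDATE %s SET %s = CONCAT('user', %s);\n" "$table" "$col" "$id_col"
    fi
  done
}

build_anonymize_sql wp_users ID user_login,user_email
```

Deriving the placeholder from the row ID keeps user_login unique without needing a fake-data library, at the cost of less realistic test data.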
Related: https://github.com/10up/wp-hammer
WordPress 4.9.6 introduces a couple of new anonymisation helper functions: wp_privacy_anonymize_data and wp_privacy_anonymize_ip, the results of which pass through the wp_privacy_anonymize_data filter.
These privacy-related commands should of course make use of the new Core functionality.
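For a feel of what this kind of anonymisation does, here is a rough shell imitation of the IPv4 case. It rests on the assumption that wp_privacy_anonymize_ip zeroes the last octet of an IPv4 address; IPv6 and invalid input are not handled the way Core handles them, and the function name anonymize_ipv4 is made up for this sketch. A real command would call the Core function rather than reimplement it.

```bash
#!/usr/bin/env bash
# Illustrative only: mimics the assumed IPv4 behaviour of
# wp_privacy_anonymize_ip by replacing the final octet with 0.
# Anything that does not look like a dotted IPv4 address passes through unchanged.
anonymize_ipv4() {
  printf '%s\n' "$1" |
    sed -E 's/^([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\.[0-9]{1,3}$/\1.0/'
}

anonymize_ipv4 198.51.100.77
```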
@swissspidy I may be misunderstanding, but it seems to me that wp-hammer applies the changes to the database, so would be run on the local development database rather than the production database. Which is nice, but seems like a just-too-late solution to me. It would be nice if the developer never needs to obtain the private information, and could just have an anonymized DB export to load into the local environment instead.
Is there any progress on this?
I've been fiddling around with WP-CLI and https://roots.io/plugins/sync-script/, which would be a great step towards a fast and effective way to pull databases from production to development. But I am missing a layer of anonymization.
It would be amazing to have something like wp db export --anonymized.
This would likely require a short configuration.
wp-migrate-db-anonymization (https://github.com/deliciousbrains/wp-migrate-db-anonymization) is a great step to make that happen for their script.
Oh, wow, the mentioned https://github.com/nullvariable/wpcli-gdpr-sanitizer sounds great - but the sanitizing would need to happen on the production site and only to the exported SQL, not the live database.
I agree with @Japh - the anonymization should happen before sensitive data ever touches my local setup or the staging site.
As soon as the SQL is exported, the file has no knowledge about WP anymore. So core functions will not be able to do the trick, right?
Just running some SQL commands against the SQL file might be possible, but would only manage the base work - e.g. resetting passwords and email addresses.
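As a sketch of that "base work", a dump can be filtered on its way out of the production server so that raw email addresses never reach the file. The function name scrub_emails and the placeholder address are assumptions for illustration. This is a blunt text filter: it also rewrites addresses inside post content and options, and it changes string lengths, which breaks any PHP-serialized value that contains an email.

```bash
#!/usr/bin/env bash
# Illustrative filter: replace anything that looks like an email address
# on STDIN with a fixed placeholder and write the result to STDOUT.
scrub_emails() {
  sed -E 's/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/redacted@example.invalid/g'
}

# Example with a single dump line:
printf '%s\n' "INSERT INTO wp_users VALUES (2,'jane','jane.doe@client.example');" | scrub_emails
```

On the server this could be combined with an export to STDOUT, for example wp db export - | scrub_emails > anonymized.sql. Every user ends up with the same address, which is enough for a local copy but not for testing anything that relies on distinct emails.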
Is there a way I can support a potential implementation of this feature?
Having an option like wp db export --truncate-tables=wp_users,wp_usermeta would already be a huge help (although you would still need to create a new user afterwards).
In case it helps someone, this is how I got the same result (an export without user data but keeping the table structures), using bash and mysqldump.
#!/usr/bin/env bash
# Full dump of every table except the two user tables.
MYSQLDUMP_IGNORE_FLAGS="--ignore-table=$DB_NAME.wp_users --ignore-table=$DB_NAME.wp_usermeta"
# Tables that are dumped as structure only (--no-data).
MYSQLDUMP_NO_DATA_FLAGS="wp_users wp_usermeta"
# The password must be in double quotes so that the variable is expanded.
{ mysqldump -h "$DB_HOST" -u "$DB_USER" -p"$DB_PASSWORD" --no-tablespaces $MYSQLDUMP_IGNORE_FLAGS "$DB_NAME" && mysqldump -h "$DB_HOST" -u "$DB_USER" -p"$DB_PASSWORD" --no-tablespaces --no-data "$DB_NAME" $MYSQLDUMP_NO_DATA_FLAGS; } > "$DB_EXPORT_SAVE_PATH"
It would be nice to just be able to use WP-CLI to do this.
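@theodejager You could also try something like this:
#!/usr/bin/env bash
# Full export of every table except the two user tables.
MYSQLDUMP_IGNORE_FLAGS="--ignore-table=$DB_NAME.wp_users --ignore-table=$DB_NAME.wp_usermeta"
# Tables to export as structure only; --tables expects a comma-separated list.
MYSQLDUMP_NO_DATA_TABLES="wp_users,wp_usermeta"
# Passing - as the file name makes wp db export write to STDOUT instead of a generated file.
{ wp db export - $MYSQLDUMP_IGNORE_FLAGS && wp db export - --no-data --tables="$MYSQLDUMP_NO_DATA_TABLES"; } > "$DB_EXPORT_SAVE_PATH"
as wp db export uses mysqldump and passes extra arguments through to it - see the documentation.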
WordPress is facing big issues with the rapidly approaching General Data Protection Regulation that will take effect starting from May 25th 2018.
There's a lot we could do using WP-CLI commands to give website owners the tools to comply with some of the regulations.
Some preliminary thoughts:
- wp user erase could make sure that a user is deleted together with all of the privacy-related data that is attributed to this user. It could trigger a wp_erase_user hook to let plugins add their own data subsets to be erased.
- wp user anonymize could render all data that belongs to a user into an anonymized form, like stripping part of the IP, replacing emails with a placeholder, etc. It could trigger a wp_anonymize_user hook to let plugins add their own data subsets to anonymize.
- wp user list-privacy-data could generate a list (in several different formats, like CSV or JSON) of all the privacy-related information on a given user. It could trigger the wp_user_privacy_data filter so that plugins can add whatever personally identifiable information they have on a user.
- wp <entity> anonymize could be used for specific entities like a comment or a post type, to remove all personally identifiable information from that entity. It could trigger a wp_anonymize_$entity hook to let plugins add their own data subsets to anonymize.
- wp db search --type=ip|email could be used to search the database for specific personal information.
(The above is only a collection of my very first thoughts; lots to discuss here.)
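The search idea can be approximated today against an exported dump. The sketch below is only an illustration of what a --type=ip|email search might report; the function name search_pii and both patterns are assumptions, and the IPv4 pattern will also match things like four-part version numbers.

```bash
#!/usr/bin/env bash
# Illustrative search: list the distinct email addresses or IPv4-looking
# strings found on STDIN. Usage: search_pii ip|email < dump.sql
search_pii() {
  case "$1" in
    email) pattern='[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' ;;
    ip)    pattern='([0-9]{1,3}\.){3}[0-9]{1,3}' ;;
    *)     echo "usage: search_pii ip|email" >&2; return 1 ;;
  esac
  # -o prints each match on its own line; sort -u removes duplicates.
  grep -oE "$pattern" | sort -u
}

# Example with two lines of sample data:
printf '%s\n' "comment by a@b.example from 203.0.113.9" "reply by a@b.example" | search_pii email
```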