makinacorpus / DbToolsBundle

A Symfony bundle to backup, restore and anonymize your data
https://dbtoolsbundle.readthedocs.io
MIT License

Anonymize a single entity #136

Open SebastienTainon opened 5 months ago

SebastienTainon commented 5 months ago

Hello,

Following our discussion on https://github.com/superbrave/gdpr-bundle/issues/209, I'm creating this issue to suggest a new feature for this bundle: anonymizing a single entity.

Here's an example use case: a user deletes his account from the website. But his account is deeply linked to other users' accounts, so you can't just remove him completely from the database. Instead, you have to anonymize him to remove all his personal data, and mark him as deleted in the database.

Currently this use case is not covered by the bundle, as it can only anonymize whole tables. But it would be nice to have an API to anonymize a single row (entity).

Thanks!

pounard commented 4 months ago

We discussed this topic further in person with @SimonMellerin and @Lonnytunes, and there are a few caveats to doing that.

First, you need to understand that the anonymization process is not intended to be used at production runtime: the whole goal is to run it on a copy of production data, outside of production. It was designed with this in mind: there is no caching of the configuration or of known anonymizers, which means that every time you run it, the code parses your configuration, validates it, looks up anonymizer packs in your vendor packages, etc. This makes it very inefficient for use at production runtime.

This is by design: we definitely optimized the SQL queries to run very fast on big databases, but we didn't optimize them to run very fast on a small filtered dataset.

Moreover, we didn't make it API-friendly for this kind of use case. We do intend to improve the API in upcoming versions, because we have many plans around it, but your specific use case is not on the roadmap (at least not for now).

If your goal is to anonymize a single database entry, and if that very specific anonymization process is part of your business specification, part of your application domain, and a recurring need, then you should probably implement it as such, by your own means, as you would for any other business function.

We will keep this in mind and may find a way to do it, but as the bundle stands right now, I must warn you: attempting to run the anonymization procedure on real production data would be a very dangerous thing to do; one mixed-up parameter or method call and you can say goodbye to your production data! We do not plan to support this kind of usage any time soon.

Nevertheless, as stated in https://github.com/makinacorpus/DbToolsBundle/issues/138, being able to anonymize a filtered dataset seems to be a recurring need. So we are going to investigate and try to find an elegant, usable solution for this. But be warned: until we find a way, implement it, test it, and declare it no longer experimental, any custom attempt to do this would be dangerous for your data.

SebastienTainon commented 4 months ago

Thank you @pounard for the detailed explanation and for discussing the implications of this issue! I understand that this use case is a bit far from the initial goal of this bundle. I am not intending to develop a custom behaviour on top of the bundle (yet) 😃

pounard commented 4 months ago

"I am not intending to develop a custom behaviour on top of the bundle (yet) 😃"

Whenever you do, please file an issue to discuss it. We won't accept everything, but any clever idea will be thoroughly evaluated for inclusion!