kreuzwerker / kreuzlaker

Research GDPR/BDSG compliant deletion/anonymization of user records #14

Open fabdy opened 1 year ago

fabdy commented 1 year ago

The German Federal Data Protection Act (Bundesdatenschutzgesetz, BDSG) and the General Data Protection Regulation (GDPR) set rules for the processing of user data. One example is §75 BDSG, which requires user data to be deleted once it is no longer necessary for the purpose of the tasks. In other cases, the law considers anonymization of user data sufficient, meaning that the information can no longer be traced back to a specific person.

While it is comparatively easy to delete records from a transactional database, it is more complicated in a data lake setup, where records are typically stored in files on object storage rather than in individually addressable rows. We have to research the possible approaches, such as using table formats that support deletes, inserts and updates (Apache Iceberg, Apache Hudi, Delta Lake or Lake Formation Governed Tables) or using S3 Lifecycle Policies.
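To make the S3 Lifecycle option more concrete, here is a minimal sketch of an expiration rule in the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts. The prefix `raw/users/`, the rule ID and the 30-day retention are assumptions for illustration only, not values from this project, and the helper function names are hypothetical. Lifecycle rules select objects by prefix, tag or size rather than by content, so this approach expires whole objects after a retention period and cannot remove a single user's rows from a file.

```python
import json


def build_expiration_rule(prefix: str, days: int, rule_id: str,
                          noncurrent_days: int = 1) -> dict:
    """Build one S3 lifecycle rule that expires objects under `prefix`.

    Hypothetical helper: it only builds the dictionary and does not call AWS.
    """
    if days <= 0 or noncurrent_days <= 0:
        raise ValueError("days and noncurrent_days must be positive integers")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        # Current object versions expire `days` days after creation.
        "Expiration": {"Days": days},
        # On versioned buckets, expiration only adds a delete marker, so
        # noncurrent versions need their own expiration to be removed.
        "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
    }


def build_lifecycle_configuration(rules: list) -> dict:
    """Wrap rules in the top-level structure expected by the S3 API."""
    return {"Rules": list(rules)}


# Assumed example values; adjust prefix and retention to the actual lake layout.
config = build_lifecycle_configuration([
    build_expiration_rule("raw/users/", days=30, rule_id="expire-raw-user-data"),
])
print(json.dumps(config, indent=2))

# Applying it would look roughly like this (needs boto3 and AWS credentials):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="<bucket-name>", LifecycleConfiguration=config)
```

Whether a time-based expiry like this is sufficient, or whether row-level deletes through one of the table formats are needed, is part of what this issue should answer.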

Tasks:

- Research the possible approaches to delete user records in a Data Lake setup
- Discuss findings with the team