humlab-sead / sead_browser_client

Online browser client for the SEAD database

Accounts/viewstates & GDPR #85

Closed (johanvonboer closed this 4 years ago)

johanvonboer commented 4 years ago

GDPR includes the right to be forgotten. This means we need functionality that deletes an account, along with the corresponding data in the backups.

johanvonboer commented 4 years ago

Did some reading on this. It seems that if we don't store the user's email, but rather a hashed representation of it, then it's no longer considered personally identifiable data, and thus we don't have to delete it from backups. This would work fine for us, since we only use the email to identify the account and for no other purpose.
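A minimal sketch of the idea, assuming SHA-256 as the hash and a trim-and-lowercase normalization step (the thread does not say which hash or normalization SEAD actually uses). The account store and the `email_key`, `register` and `lookup` names are hypothetical; the point is that only the digest is stored, and the raw email is used solely to recompute it at login.

```python
import hashlib
from typing import Dict, Optional


def email_key(email: str) -> str:
    """Return the hex SHA-256 digest of a normalized email address.

    Assumption: normalization trims whitespace and lower-cases, so
    "Alice@Example.org " and "alice@example.org" map to the same key.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical account store keyed by the digest, never by the raw email.
accounts: Dict[str, dict] = {}


def register(email: str, viewstate: dict) -> None:
    """Create an account (with one saved viewstate) under the email's digest."""
    accounts[email_key(email)] = {"viewstates": [viewstate]}


def lookup(email: str) -> Optional[dict]:
    """Find an account by recomputing the digest from the supplied email."""
    return accounts.get(email_key(email))
```

For example, `register("Alice@Example.org", {"filters": []})` followed by `lookup("alice@example.org")` finds the same account, while the stored keys contain no readable address.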

johanvonboer commented 4 years ago

Actually, it seems to be a grey zone, but so is how to handle personal data in backups. Most organizations' policy seems to be not to try to delete this data from backups, since it's simply too hard; they just let it expire after 3 months or so as the backups are overwritten.

As a reasonable-effort measure, I intend to hash the emails, set a limited retention period for the backups (probably 3 months), and inform users of both in the GDPR notice on the site. This seems to be the level of effort (perhaps a little more) that other organizations are putting into complying with this.

johanvonboer commented 4 years ago

Assigned @visead in case you have any comments on this

visead commented 4 years ago

Seems a reasonable approach. I'm a little concerned about potential scientific traceability issues with deleting backups, though. Ideally I'd like to archive instances indefinitely (could do that on MAL's server) to ensure research can be reproduced.

johanvonboer commented 4 years ago

Yes, that's a good point, so let's not delete backups (I'll have to look over the current backup routines; I think backups are auto-deleted after a while). At least I have now implemented hashing of the emails, so that in itself should put us in a much better position regarding GDPR. The problem is how you define personally identifiable data. Is a hash of an email personally identifiable data? Some would argue yes...
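The "some would argue yes" position can be illustrated with a short sketch: an unsalted hash is deterministic, so anyone holding a list of candidate addresses can recompute the digests and match them against stored keys. The addresses and the `reidentify` helper are hypothetical. The keyed variant (HMAC with a secret kept outside the database and its backups) is a commonly suggested mitigation that the thread itself does not discuss; it is shown only for contrast, and data hashed this way may still count as pseudonymised rather than anonymous.

```python
import hashlib
import hmac
from typing import Iterable, Optional


def plain_key(email: str) -> str:
    """Unsalted SHA-256 of the normalized email (same idea as the hashing above)."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def reidentify(stored_key: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the candidate email whose unsalted hash matches stored_key, if any."""
    for candidate in candidates:
        if plain_key(candidate) == stored_key:
            return candidate
    return None


def keyed_key(email: str, secret: bytes) -> str:
    """HMAC-SHA256 of the normalized email; matching requires knowing `secret`."""
    message = email.strip().lower().encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()
```

With `plain_key`, a leaked backup plus a guessed address list is enough to recover who an account belongs to; with `keyed_key`, the same attack also needs the secret.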

johanvonboer commented 4 years ago

Updated the SEAD backup config to never remove old full backups. I think we've probably done enough regarding this for now, so closing this.
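A hypothetical pruning rule that combines the two decisions in this thread: a retention window of roughly 3 months, with full backups exempt and kept indefinitely. The thread does not say which backup tool SEAD uses or whether non-full (e.g. incremental) backups exist, so the `Backup` type, its `kind` values and the 90-day window are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# Assumption: "probably 3 months", approximated as 90 days.
RETENTION = timedelta(days=90)


@dataclass
class Backup:
    name: str
    kind: str  # hypothetical values: "full" or "incremental"
    created: datetime


def backups_to_delete(backups: List[Backup], now: datetime) -> List[Backup]:
    """Return backups older than RETENTION, never including full backups."""
    return [
        b for b in backups
        if b.kind != "full" and now - b.created > RETENTION
    ]
```

Keeping the exemption in the selection rule, rather than deleting first and restoring later, means a full backup can never be pruned by accident, which matches the traceability concern raised above.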