me-box / databox

Databox container manager and dashboard server
MIT License

support for delete and/or redact of values in store API(s) #198

Open cgreenhalgh opened 6 years ago

cgreenhalgh commented 6 years ago

Currently the store APIs (esp. timeseries) don't have a delete or redact operation.

But, for example, if we cache tweets in a store then the EULA (and the future GDPR) requires that we delete/redact the user content of a tweet in a timely manner if they delete the tweet in Twitter. Similar principles apply to other drivers which cache data from external services (e.g. Facebook, email), and the principle (GDPR, user control/right to be forgotten) also applies to any of our own drivers/services/apps which take content from users.

In some cases at least, like Twitter, it might be better to preserve a stub of the record without the user content (i.e. a record that there was a message that was deleted) rather than delete it entirely. This will presumably need to be relatively efficient, and enforced within the store.
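To make the delete vs. redact distinction concrete, here is a minimal in-memory sketch. All names (`Record`, `TimeSeriesStore`, `redact`, `delete`) are hypothetical and are not the actual Databox store API; it only illustrates the idea of keeping a stub record while discarding the user content.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Record:
    timestamp: int            # e.g. milliseconds since epoch
    data: Optional[Any]       # set to None once the record is redacted
    redacted: bool = False


class TimeSeriesStore:
    """Toy in-memory store (hypothetical, not the Databox store)."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def write(self, timestamp: int, data: Any) -> None:
        self._records.append(Record(timestamp, data))

    def read(self, start: int, end: int) -> List[Record]:
        return [r for r in self._records if start <= r.timestamp <= end]

    def redact(self, start: int, end: int) -> int:
        """Drop the payload of records in [start, end] but keep a stub.

        Returns the number of records newly redacted.
        """
        count = 0
        for rec in self._records:
            if start <= rec.timestamp <= end and not rec.redacted:
                rec.data = None
                rec.redacted = True
                count += 1
        return count

    def delete(self, start: int, end: int) -> int:
        """Remove records in [start, end] entirely. Returns how many were removed."""
        before = len(self._records)
        self._records = [
            r for r in self._records if not (start <= r.timestamp <= end)
        ]
        return before - len(self._records)
```

In this sketch the redaction happens inside the store, so a reader of the timeseries still sees that a value existed at that timestamp but can no longer recover its content. A real store would also need to handle persistence and any indexes or backups that hold copies of the data.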

mor1 commented 6 years ago

Does the GDPR actually require this? Given that we aren't storing anything at all -- the user (data subject) is responsible, surely?

cgreenhalgh commented 6 years ago

Yes, maybe you can sidestep GDPR in some deployment cases (the default?!).

But the user can't sidestep the EULA requirements of Twitter etc., e.g. if they are caching other users' tweets, so the technical facility should still be present IMO.

And if we have research deployments (i.e. a research project deploying databoxes to a group of participants to support a specific research activity) then I think this is also an important facility to have.

Toshbrown commented 6 years ago

@jptmoore is looking into deleting/redacting data for the new store

mor1 commented 6 years ago

@cgreenhalgh I don't think it's "sidestepping". IANAL but I don't see how the GDPR is relevant here. Or we'd all have to register as data controllers for storing our own bank statements and suchlike nonsense surely.

What sort of EULA requirements are you thinking of here specifically?

cgreenhalgh commented 6 years ago

e.g. twitter developer terms

If Content is deleted, gains protected status, or is otherwise suspended, withheld, modified, or removed from the Twitter Service (including removal of location information), you will make all reasonable efforts to delete or modify such Content (as applicable) as soon as reasonably possible, and in any case within 24 hours after a request to do so by Twitter or by a Twitter user with regard to their Content, unless otherwise prohibited by applicable law or regulation, and with the express written permission of Twitter.

mor1 commented 6 years ago

Reading those terms, they all seem to me to be framed as if providing a service consuming the Twitter API, not providing an application that the user runs against their own account. Wouldn't Twitter have to issue a request to do so to every Databox operator to invoke that? (Not to the author of the driver.)

cgreenhalgh commented 6 years ago

My recollection is that the developer guidelines say that you can cache tweets, but you should revalidate the information in the cache, i.e. check if it has been deleted, before presentation to the user if you cache it for longer than that timespan. So the onus is on the user of the API. So logically I think a Twitter driver should do something like that automatically (whichever legal entity is responsible for running it).
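A rough sketch of that revalidate-before-presentation idea, under stated assumptions: `CachedTweet`, `get_for_display` and the `still_exists` callback are hypothetical names, and the 24-hour threshold is borrowed from the developer terms quoted above rather than from any caching guideline I can point to. The callback stands in for whatever lookup a real driver would perform against the service.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Assumption: reuse the 24-hour figure from the quoted developer terms.
MAX_AGE_SECONDS = 24 * 60 * 60


@dataclass
class CachedTweet:
    tweet_id: str
    text: Optional[str]   # None once the content has been redacted
    checked_at: float     # when the entry was last confirmed to exist


def get_for_display(
    cache: Dict[str, CachedTweet],
    tweet_id: str,
    still_exists: Callable[[str], bool],
    now: Optional[float] = None,
    max_age: float = MAX_AGE_SECONDS,
) -> Optional[CachedTweet]:
    """Return the cached entry, revalidating it first if it is stale.

    `still_exists` is a hypothetical callback that asks the upstream
    service whether the tweet is still available.
    """
    now = time.time() if now is None else now
    entry = cache.get(tweet_id)
    if entry is None:
        return None
    if entry.text is not None and now - entry.checked_at > max_age:
        if still_exists(tweet_id):
            entry.checked_at = now   # still valid, restart the clock
        else:
            entry.text = None        # redact the content, keep the stub
    return entry
```

A driver could call something like this on every read, or run the same check periodically over the whole cache, so that deleted content is redacted even if nobody displays it.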

haddadi commented 6 years ago

It's basically our job to do it! "you will make all reasonable efforts to delete ..." if we have an app that collects and caches tweets...

But in reality no one does it. Come GDPR, things might change.

AFAIK Twitter does not have a notification ("informing") service API for now.

cheers

== Hamed https://haddadi.github.io

(Sent in reply to Richard Mortier's comment of 18 April 2018, 17:54, quoted in full above.)
