meteor / meteor-feature-requests

A tracker for Meteor issues that are requests for new functionality, not bugs.

Helpers for GDPR compliance #246

Open StorytellerCZ opened 6 years ago

StorytellerCZ commented 6 years ago

In May, a new data/privacy protection law comes into effect in the EU. The General Data Protection Regulation (GDPR) is a huge regulation which applies to anyone in the EU, but also to anyone who collects and processes data from EU residents. So if you have EU users, this affects you.

Why should you really care? The sanctions are destructive, especially for small businesses (businesses of all sizes are included in this, no exceptions).

Some sources:

  • https://www.eugdpr.org/
  • https://melv1n.com/gdpr-guide-product-managers/
  • https://en.wikipedia.org/wiki/General_Data_Protection_Regulation
  • https://ico.org.uk/media/for-organisations/documents/1624219/preparing-for-the-gdpr-12-steps.pdf
  • https://techblog.bozho.net/gdpr-practical-guide-developers/

This does not affect Meteor directly, but I think it would be nice if Meteor had functionality in its accounts packages that would help with compliance with GDPR since pretty much anyone who does not block EU users will have to comply with it.

I'm still learning about GDPR, but so far I think the following should be investigated (yes, most of it can be done in the app, but since all apps are going to be needing it soon I think it should be in the framework):

There are many other things, but those are application specific and probably should be handled by the app or some packages. I'm specifically thinking about:

This is what I have so far. Thoughts?

smeijer commented 6 years ago

Delete user functionality (Right of Erasure)

Whereby the personal data of the user should be unset in the users collection, but not the general account data.

If a user decides to be deleted, I don't want information like "${user.username} wrote:" to become "undefined wrote:" (nor "anonymous wrote:").

How does this work? We are developing a corporate platform to share information within the organization. It's very important for us that log messages like user x uploaded file y and user x wrote ..., or user x approved action y are not lost in time.

Do you know how this works on the internet? It's also impossible to ask your employer to remove all your data after you decided to switch jobs.

I can imagine a situation where accounts can be deactivated, but I'm not in favor of removal, because of the situations written above.

StorytellerCZ commented 6 years ago

@smeijer That is one of my concerns as well. In our app we are currently leaning towards a solution where the user details in documents we need to keep will be switched to a dummy account or something similar. That way we fulfill the GDPR requirement while keeping the data our app logic needs. This, though, requires that we change our TOS to allow us to keep the anonymized data. That is where an onDeleteUserHook comes in.

mitar commented 6 years ago

Whereby the personal data of the user should be unset in the users collection, but not the general account data.

One can request from you to delete all data associated with their user. Yes, this means that you have to make your logic support that. It is not simple.

smeijer commented 6 years ago

One can request from you to delete all data associated with their user.

But what does that mean? Let's have a user named "John Doe" with username "j.do". He has uploaded 3 documents, so the system created a message somewhere:

const user = { username: 'j.do', profile: { firstName: 'John', lastName: 'Doe' } }; // db.users.findOne({ ....
const log = { createdBy: 'j.do', message: '${user} uploaded ${count} files: ', fileIds: [''] }; // db.logs.find();
const files = { uploadedBy: '', name: '', blob: '' }; // db.files.find();

console.log(render({ log, user, files }));

John Doe uploaded 3 files:

  • img-1.jpg
  • img-2.jpg
  • doc-1.docx

Does "removal" mean that John Doe becomes j.do, or do we even need to rename him to anonymous132? And in the latter case, is an incrementing number allowed, so that we can still group the messages by the "anonymous" user? Or should we make it truly anonymous and transfer all messages to a single "anonymous" user?

Are the messages themselves even still allowed? Or do we need to drop all user generated content? In this case, remove the log statements? And also remove all uploaded binary files?

In the example above, should all collections be cleared with remove statements, or can we use some anonymizing updates?

// remove profile, now we'll see 'j.do uploaded x'
db.users.update({ username: 'j.do' }, { $unset: { profile: true, emails: true } });

  // OR rename to anonymous, we'll see 'anonymous123 uploaded x'
  db.users.update({ username: 'j.do' }, { $set: { username: `anonymous${inc}` } });

  // OR remove user? @#$%
  db.users.remove({ username: 'j.do' });

// also move log statements?
db.logs.update({ createdBy: 'j.do' }, { $set: { createdBy: 'anonymous' } });

  // OR remove them?
  db.logs.remove({ createdBy: 'j.do' });

// and what about uploaded documents?
db.files.update({ uploadedBy: 'j.do' }, { $set: { uploadedBy: 'anonymous' } }); 

  // OR remove them
  db.files.remove({ uploadedBy: 'j.do' });

If we truly have to delete information, a lot of corporate platforms are really screwed. Imagine you'll quit your job after 5 years, and request your employer to delete all information you generated.

Say, for example, you request GitHub to delete all your information from a repo. All comments, but also commit logs, and thereby the code they contain?

mitar commented 6 years ago

And also remove all uploaded binary files?

You should definitely remove all uploaded binary files.

But I am not a lawyer, I am not an expert in this area, and I think this is probably no longer related to this issue.

Imagine you'll quit your job after 5 years, and request your employer to delete all information you generated.

I do not think this applies here.

dr-dimitru commented 6 years ago

What about "depersonalization" or "depersonification"? Not sure what to call it, but the result of this action should be: "Data cannot be associated with a particular person, nor can a person be identified by the data."

Removing (or replacing with some random string) all data which can be used to identify a person (names, nicknames, location, photos, numbers, payment info, etc.) could do the trick.
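
A minimal sketch of such a depersonalization step, written as plain Node.js rather than against any Meteor API. The field list and the `depersonalize` helper are assumptions for illustration; which fields count as identifying is something each app has to decide for itself.

```javascript
const crypto = require('crypto');

// Fields treated as personally identifying in this sketch (an assumption;
// a real app has to decide this per collection).
const PERSONAL_FIELDS = ['username', 'emails', 'profile'];

// Returns a copy of `user` in which identifying fields are dropped and the
// username is replaced by a random placeholder.
function depersonalize(user) {
  const clean = { ...user };
  for (const field of PERSONAL_FIELDS) {
    delete clean[field];
  }
  clean.username = `anonymous-${crypto.randomBytes(4).toString('hex')}`;
  return clean;
}

const user = {
  _id: 'abc123',
  username: 'j.do',
  emails: [{ address: 'john@example.com' }],
  profile: { firstName: 'John', lastName: 'Doe' },
  createdAt: new Date('2018-01-01'),
};

const anonymous = depersonalize(user);
console.log(anonymous._id);      // the _id is kept, so references stay intact
console.log(anonymous.username); // random placeholder such as 'anonymous-9f3c2a1b'
```

Keeping the `_id` while replacing everything else means log entries and files can still point at a user record without revealing who it was.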

StorytellerCZ commented 6 years ago

@dr-dimitru That should probably be enough (again, we need a lawyer to confirm this), but that is out of the scope of this issue. The point of the issue is what should/can be added to Meteor to make all the other stuff around GDPR a bit easier (i.e. a common way to delete the user data that Meteor uses, plus a hook that lets the app do its own deletion/anonymization). The Right of Erasure is the most contentious as it can break most of the stuff, but there are other things that can be done as well. This is even more concerning since Meteor has a UI package for accounts.

dr-dimitru commented 6 years ago

+1 for next:

smeijer commented 6 years ago

onRemoveUser hook - Full user's Object (e.g. MongoDB record) should be passed into this hook, so we can manipulate user's data before it will be erased, like depersonalization

Perhaps prevent removal when the hook returns false? Then we can change it to an update statement inside the hook. Or even use update by default, but then by default simply keep the _id property stored and drop everything else.

Then the developer has the choice to drop the whole user, or turn it into an 'anonymous' user so user._id references are still intact.
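
A rough sketch of that hook behaviour, using an in-memory Map as a stand-in for the users collection. `removeUser` and the hook signature are hypothetical and not part of Meteor; the sketch only shows the "return false to keep an _id stub" idea.

```javascript
// In-memory stand-in for the users collection (not a Meteor API).
const users = new Map([
  ['u1', { _id: 'u1', username: 'j.do', profile: { firstName: 'John' } }],
  ['u2', { _id: 'u2', username: 'a.smith', profile: { firstName: 'Ann' } }],
]);

// Hypothetical removal routine: the hook receives the full user object.
// If the hook returns false, the record is not removed; it is reduced to
// its _id so that references from other collections keep working.
function removeUser(userId, onRemoveUser = () => true) {
  const user = users.get(userId);
  if (!user) return false;
  if (onRemoveUser(user) === false) {
    users.set(userId, { _id: user._id });
  } else {
    users.delete(userId);
  }
  return true;
}

removeUser('u1', () => false); // hook vetoes removal, a stub is kept
removeUser('u2');              // default hook: the whole user is dropped

console.log(users.get('u1')); // { _id: 'u1' }
console.log(users.has('u2')); // false
```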

paulincai commented 6 years ago

Just some thoughts on this.

I worked with corporates up until 3 years ago. Under SOX standards, for instance, you need to maintain a Trail Log and a Financials Audit Trail. Basically you need to know what any user has done in the financial systems...forever. A report of user access was printed out and assessed by the Financial Controller quarterly. The people no longer with the company were still showing on the audit report but, of course, with no access. I believe the history of a user in a system needs to be maintained depending on the compliance context. For us as project builders, I think we need to clearly distinguish between "employee" and "consumer".

If anyone works in a corporation which uses SAP, Sun Systems, Oracle etc., they might know better what the view is on the "employee" side of the users.

Also

Article 25 calls for controllers to hold and process only the data absolutely necessary for the completion of its duties (data minimisation), as well as limiting the access to personal data to those needing to act out the processing.

accounts-facebook, accounts-google etc. should possibly be adapted to nudge the developer to limit the data pulled from the social networks to what is really of use. For instance, avoid requesting Location, Gender, Friends if they are not really used. Some developers might have the mentality "let me pull as much as possible just in case I need it one day".

mitar commented 6 years ago

accounts-facebook, accounts-google etc. should possibly be adapted to nudge the developer to limit the data pulled from the social networks to what is really of use. For instance, avoid requesting Location, Gender, Friends if they are not really used. Some developers might have the mentality "let me pull as much as possible just in case I need it one day".

To make this easier, I think those packages should be updated so that you can ask for a small amount of data at first, but then also request more later and re-authenticate. I do not know if it is currently even possible to maintain different levels of access for different users.

For example, on GitHub there is a common use case where users could decide to give you access to only public repos or to all repos.
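
The "ask for little, request more later" idea could be sketched as below. This is plain JavaScript with a hypothetical `missingScopes` helper; the scope names are only examples. In Meteor, the resulting list would presumably be passed as the `requestPermissions` option of a login call such as `Meteor.loginWithFacebook` when the user re-authenticates.

```javascript
// Scopes the app asks for at first login (kept deliberately small).
const BASE_SCOPES = ['email'];

// Returns the scopes that still have to be requested before a feature
// can be used, given what the user has already granted.
function missingScopes(granted, required) {
  const have = new Set(granted);
  return required.filter((scope) => !have.has(scope));
}

// A feature that needs more than the base scopes triggers re-authentication
// with only the missing ones.
const needed = missingScopes(BASE_SCOPES, ['email', 'user_friends']);
console.log(needed); // [ 'user_friends' ]
```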

paulincai commented 6 years ago

Right to Access: Part of the expanded rights of data subjects outlined by the GDPR is the right for data subjects to obtain from the data controller confirmation as to whether or not personal data concerning them is being processed, where and for what purpose. Further, the controller shall provide a copy of the personal data, free of charge, in an electronic format. This change is a dramatic shift to data transparency and empowerment of data subjects.

Data Portability: GDPR introduces data portability - the right for a data subject to receive the personal data concerning them, which they have previously provided, in a 'commonly used and machine readable format' and to transmit that data to another controller.

https://themeteorchef.com/tutorials/exporting-data-from-your-meteor-application

@dr-dimitru how do you see this as far as remote media or other files are concerned? I am just curious now: say I am a European Kardashian ... how can I get my data from Instagram Europe :)))))))

dr-dimitru commented 6 years ago

@paulincai

Right way:

  1. Your service should have servers (data storage) in all regions it operates;
  2. European users get an experience that complies with European law, and their data is stored in the EU (the same applies to other regions);

Wrong way:

  1. Keep servers in a country with weak data-protection law;
  2. Register domain name in a zone with weak abuse rules;
  3. Register legal entity... should I continue?

Consumer/user trust - requires a lot of work.

StorytellerCZ commented 6 years ago

With GDPR putting your server outside of the EU won't help you as it applies to you the moment you get an EU user.

paulincai commented 6 years ago

@dr-dimitru

Interesting context: user data stored in Amazon with mirroring in multiple regions for high availability. Same for MongoDB.

Architecture design is now being driven towards regional hosts that hold data for one region only, without spilling over to servers in other regions. The data host closely follows the administrative borders, at least in Europe.

StorytellerCZ commented 6 years ago

I think once the modernization of the accounts packages (https://github.com/meteor/meteor/pull/9558) is finished we can start working on this issue. Hopefully within a month or so, so that there is enough time for inclusion before GDPR comes into effect in May.

dr-dimitru commented 6 years ago

@StorytellerCZ meanwhile we can prepare requirements to satisfy GDPR, including suggestions from comments above.

StorytellerCZ commented 6 years ago

@dr-dimitru Agreed. So far I see it like this:

Delete functionality: Something like Accounts.deleteUser or Accounts.deleteAccount. It will take in the userId. It will log out the user and add a deletedAt field with the current date to the user document. This would disallow any future logins. Any additional handling should be done in the delete hook so that it can be adjusted according to the needs of the app.

onUserDeleteHook: Will receive the user object. Any changes that the app needs to happen can be done here. Returning true will delete the user object from the collections that Meteor controls, while returning false will skip this, assuming that the hook has taken care of it.
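
A minimal sketch of how the two pieces above might fit together, with an in-memory Map standing in for Meteor.users. `deleteUser`, `onUserDelete` and `findUserForLogin` are hypothetical names for illustration, not existing Accounts functions; only the `services.resume.loginTokens` shape mirrors how Meteor stores login tokens.

```javascript
// In-memory stand-in for Meteor.users (illustration only).
const users = new Map([
  ['u1', { _id: 'u1', username: 'j.do', services: { resume: { loginTokens: ['t1'] } } }],
]);

// Hypothetical app-provided hook; the default asks for full removal.
let onUserDelete = () => true;

function deleteUser(userId, now = new Date()) {
  const user = users.get(userId);
  if (!user) throw new Error('User not found');

  // Log the user out and mark the account as deleted.
  user.services.resume.loginTokens = [];
  user.deletedAt = now;

  // true: remove the record; false: the hook has handled the data itself.
  if (onUserDelete(user) === true) {
    users.delete(userId);
  }
}

// Login check: a deleted account behaves as if it did not exist.
function findUserForLogin(userId) {
  const user = users.get(userId);
  if (!user || user.deletedAt) throw new Error('User not found');
  return user;
}

// Example app hook: anonymize instead of removing the record.
onUserDelete = (user) => {
  user.username = 'anonymous';
  return false;
};
deleteUser('u1');
console.log(users.get('u1').username); // 'anonymous'
```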

Export user data (Right of Access & Data portability): Something like Accounts.exportUserData will take the user object without the password and other login info (maybe Facebook token?), send it to an export hook, and that hook should then return a unified object with all the data related to the user.
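
As a sketch under assumptions, the export could look roughly like this in plain JavaScript. `exportUserData` and `exportHook` are hypothetical names; the only Meteor-specific assumption is that credentials live under the `services` field of the user document.

```javascript
// Strips credentials from a user document and lets the app add its own data.
// `exportHook` is a hypothetical app-provided function.
function exportUserData(user, exportHook = (data) => data) {
  // Drop login secrets (password hashes, OAuth tokens, resume tokens).
  const { services, ...safeUser } = user;
  return exportHook({ user: safeUser });
}

const user = {
  _id: 'u1',
  username: 'j.do',
  emails: [{ address: 'john@example.com', verified: true }],
  services: {
    password: { bcrypt: 'hash-placeholder' },
    facebook: { accessToken: 'token-placeholder' },
  },
};

// The app's hook appends whatever else it stores about the user.
const exported = exportUserData(user, (data) => ({
  ...data,
  files: ['img-1.jpg', 'img-2.jpg', 'doc-1.docx'],
}));

console.log(JSON.stringify(exported, null, 2));
```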

Citizenship country or check for EU citizens: We haven't really gotten this far, but I think this is needed as the EU can request the number of EU users and other EU-related information, so either a country string param or a boolean param that would indicate EU citizens. I personally lean towards a string param with a country code as that is more versatile. An additional function to count the number of users from EU countries could also be added.
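
A small sketch of the country-code variant. The `country` field on the user and the `countEuUsers` helper are assumptions, and the country list is deliberately just an illustrative subset.

```javascript
// Illustrative subset of ISO 3166-1 alpha-2 codes; a real app would need
// the complete, current list of member states.
const EU_COUNTRIES = new Set(['CZ', 'DE', 'FR', 'NL', 'RO']);

// Counts users whose (hypothetical) `country` field is an EU country code.
function countEuUsers(users) {
  return users.filter((u) => EU_COUNTRIES.has(u.country)).length;
}

const users = [
  { _id: 'u1', country: 'NL' },
  { _id: 'u2', country: 'US' },
  { _id: 'u3', country: 'CZ' },
  { _id: 'u4' }, // country unknown
];
console.log(countEuUsers(users)); // 2
```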

accounts-ui package and tutorials: The above changes will have to be reflected in the accounts-ui package. The question is how much this should be reflected in tutorials, or whether this new functionality should be restricted to a new chapter in the guide (or an expansion of the existing one on accounts).

Encryption: Is there any encryption or other security features that we can turn on by default?

dr-dimitru commented 6 years ago

@StorytellerCZ

Agree on most of it; perhaps method and hook names should be standardized somehow. I would comment on these three points:

Export user data (Right of Access & Data portability):

This can be moved to the tutorial, as its precise requirements differ from app to app. Does GDPR define a format for exported data, or can it be represented as a record from the DB in plain JSON?

Citizenship country or check for EU citizens:

This can be moved to the tutorial. I assume it should be implemented as middleware via WebApp.connectHandlers.use; again, each app has its own very specific requirements.
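
A sketch of what such a middleware might look like. It is a plain connect-style function, so it runs outside Meteor too. The `x-country-code` header is a made-up name, assuming some proxy or geo-IP layer in front of the app sets it, and the country list is only an illustrative subset.

```javascript
// Illustrative subset only; a real list needs every member state.
const EU_COUNTRIES = new Set(['CZ', 'DE', 'FR', 'NL', 'RO']);

// Connect-style middleware: flags the request when the (hypothetical)
// country header points at an EU country, then hands over to the next handler.
function euFlag(req, res, next) {
  const country = (req.headers['x-country-code'] || '').toUpperCase();
  req.isEuVisitor = EU_COUNTRIES.has(country);
  next();
}

// In a Meteor app this would presumably be registered with:
//   WebApp.connectHandlers.use(euFlag);

// Stand-alone check with a fake request object:
const fakeReq = { headers: { 'x-country-code': 'nl' } };
euFlag(fakeReq, {}, () => console.log(fakeReq.isEuVisitor)); // true
```

Note that a header like this only says where the request came from, not where the user is a citizen or resident, so it can only ever be a hint.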

Encryption:

Can't provide expertise on this one.

StorytellerCZ commented 6 years ago

@dr-dimitru In regards to the data exports, I haven't seen anything mentioning a common format. Though there was a mention that the theory behind this rule is that you can take data from one service and give it to another. I would put this down to classical bureaucrats thinking about tech and, as usual, not knowing a thing about it. For myself, I will just export a JSON object with the user's data. So I think having a hook that allows each app to handle it according to its own data makes the most sense. Or we can just add a sample implementation to the guide.

As for encryption by default and other security features, I think @benjamn or @abernix will know more.

abernix commented 6 years ago

I'm not familiar with what the exact requirements are or would be so I'll just provide pointers to what I know off the top of my head:

I think you'll want to explore the customizability introduced in https://github.com/meteor/meteor/pull/9044, though #55 may also be of interest to you.

StorytellerCZ commented 6 years ago

@abernix That is helpful. The thing is that from my reading of GDPR any security/encryption options have to be on by default, so this goes beyond the accounts package as well. But that is probably something we will have to put into a section in the guide.

smeijer commented 6 years ago

Though there was a mention that the theory behind this rule is that you can take data from one service and give it to another.

I'm not sure if that's the real idea. I don't know how it's defined in the GDPR. But I do know we have a law like this here in the Netherlands.

The law here exists so that I as a citizen can request all the data from an organization that somehow relates to me.

This is to check what they are collecting about me, and to give me grounds to protest against unwanted data collection or to stop doing business with them.

The companies / organizations are obligated by law to provide me this data, when I request it.

This law is named "Wet bescherming persoonsgegevens", WBP for short (law for the protection of personal information).

The Dutch WBP is based on European regulations (95/46/EG).

The information they are obligated to provide me:

StorytellerCZ commented 6 years ago

@smeijer Probably the same idea with GDPR as well then, but that is an irrelevant point. The question is how much and how the export functionality should be in Meteor and in what style.

paulincai commented 6 years ago

A very detailed insight on the matter: https://www.smashingmagazine.com/2018/02/gdpr-for-web-developers/

StorytellerCZ commented 6 years ago

Found another article that spells out all the technical aspects: https://techblog.bozho.net/gdpr-practical-guide-developers/

StorytellerCZ commented 6 years ago

So after an official governmental workshop, things are a bit more difficult in that it really depends on how TOS and other legal documents are written. Which makes this issue a bit easier. For example, the delete user functionality should just add a deletedAt flag to the user object and expose a hook that can do more if needed by the application (also if the field is present it should prevent any login attempts to that user showing user not found error). As such I have reduced the OP to just list what I think is the minimum.

dr-dimitru commented 6 years ago

@StorytellerCZ

the delete user functionality should just add a deletedAt flag to the user object and expose a hook that can do more if needed by the application (also if the field is present it should prevent any login attempts to that user showing user not found error).

While this will act as if the user doesn't exist, the user wouldn't be able to sign up again using the same credentials (email address, for example).

StorytellerCZ commented 6 years ago

@dr-dimitru Yes. This can be taken further by deleting e-mails, service connections and the password. This is where the delete hook will be essential; for example, in my app I will need to keep the e-mail address for archival purposes in some cases due to copyright law.

StorytellerCZ commented 4 years ago

I would like to bring this topic back into the spotlight, as today I noticed that Facebook has created a Data Deletion Request callback, following the longstanding deauthorization callback (I also remember seeing a data request callback, but can't find it now). Hence I think that the accounts package could be made more robust if it were extended with the capability to handle these callbacks.

By default, all services would have the deauthorization/logout callback, which would just remove the authorization token from the DB. Additional callbacks could be defined on a service-by-service basis so that all the use cases can be covered. Obviously, some of them would need additional callbacks into the app in order to be effective. For example, a request for data would by default return (in JSON) basic info about the user and the data stored from that service, and then there would be a hook/callback where each app could append additional information or completely override what is being returned. The deletion callback would fail by default unless the app provides a function to call for it.
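
To make the proposed defaults concrete, here is a minimal sketch in plain JavaScript. The registry, `registerServiceCallback`, `handleServiceCallback` and the callback type names are all hypothetical; nothing like this exists in the accounts packages today, and the real Facebook callbacks carry signed payloads that are not modelled here.

```javascript
// Hypothetical registry of app-provided callbacks, keyed by service and type.
const handlers = {};

function registerServiceCallback(service, type, fn) {
  handlers[`${service}:${type}`] = fn;
}

function handleServiceCallback(service, type, user) {
  // An app-provided handler overrides the default behaviour completely.
  const custom = handlers[`${service}:${type}`];
  if (custom) return custom(user);

  switch (type) {
    case 'deauthorize':
      // Default: just forget the stored authorization token.
      if (user.services[service]) delete user.services[service].accessToken;
      return { ok: true };
    case 'dataRequest': {
      // Default: basic info plus what is stored from that service, minus the token.
      const { accessToken, ...stored } = user.services[service] || {};
      return { ok: true, data: { _id: user._id, [service]: stored } };
    }
    case 'delete':
      // Default: fail unless the app has registered its own deletion logic.
      return { ok: false, reason: 'No deletion handler registered' };
    default:
      return { ok: false, reason: `Unknown callback type: ${type}` };
  }
}

const user = { _id: 'u1', services: { facebook: { id: 'fb-1', accessToken: 'secret' } } };

console.log(handleServiceCallback('facebook', 'delete', user).ok); // false
console.log(handleServiceCallback('facebook', 'dataRequest', user).data);
handleServiceCallback('facebook', 'deauthorize', user);
console.log(user.services.facebook.accessToken); // undefined
```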