Closed roland-d closed 5 years ago
What a shame that the compliance team didn't test the privacy component during the very extensive period that it was under development and/or while it was in beta.
User data can also be rebuilt if you review the server logs or if you have backups of your data.
As for clearing the action logs: if that is a concern for the site owner, the action logs plugin already offers the functionality to delete the logs after x days.
As for whether it is legal to keep or remove certain data - that can only ever be up to the site owner to determine, and any software that claims to be able to do everything automatically is simply not telling the truth.
"What a shame that the compliance team didn't test the privacy component during the very extensive period that it was under development and/or while it was in beta."
What a shame that assumption is made. I have tested the privacy component when it was in beta. More than once. This just never occurred to me until people started asking questions about it.
So what is the use of the deletion function if the data is kept in the custom fields?
Blindly deleting all action logs that match a user ID is not a good suggestion. Depending on who hooks into the system and what is being logged, some logged actions may be deemed information that needs to be retained (e.g. if ecommerce extensions suddenly decided to use core tools instead of their own systems and logged messages indicating that a user made a transaction at X time). So log deletion should be delegated to plugins.
And I'm going to have to agree with Brian that this is the type of in-depth feedback and case study we were trying to get 6 months ago, before I got burned out working on the project. It appeared there was little community support beyond the usual "+1 this is awesome" feedback, so I all but washed my hands of it and left it in a "take it or leave it" state.
@roland-d I referred to the team, not the individual.
There is an RFC, #25849, for the user custom fields. Maybe the compliance team can have a look?
Thank you @alikon. We will test it and share our feedback on RFC #25849 asap.
Introduction
This issue is twofold. First, it is meant as a discussion on how deletion should work in the privacy tool suite. Second, it should give a clear guideline on what needs to be implemented in a possible pull request. It came out of a check of sensitive data by the compliance team, which found that not all sensitive data is currently removed.
The problem
User information can be reverse engineered from data retained in the logs and in user-related custom fields, because neither is covered by the deletion or anonymization functions that run after a removal request.
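To make the problem concrete, here is a minimal Python sketch of how records left behind after a deletion can be joined on the user ID. The record layout (`removal_requests`, `action_logs`, the `item_id` key, and the admin ID 42) is hypothetical and does not mirror Joomla's actual tables. Only the user ID, email address and username are taken from the walkthrough in this issue.

```python
# Hypothetical leftover records after user 826 has been deleted.
# The structures are illustrative only, not Joomla's real schema.
removal_requests = [
    {"user_id": 826, "email": "zohermakika@mailinator.net"},
]
action_logs = [
    # "user_id" is the acting user, "item_id" the record that was touched.
    {"user_id": 826, "item_id": 826, "message": "User zerukuduwe logged in"},
    {"user_id": 42, "item_id": 826, "message": "Admin updated a user"},
    {"user_id": 42, "item_id": 7, "message": "Admin updated a user"},
]


def rebuild_user(user_id, requests, logs):
    """Collect every leftover record that still references user_id."""
    profile = {"user_id": user_id, "emails": [], "log_messages": []}
    for request in requests:
        if request["user_id"] == user_id:
            profile["emails"].append(request["email"])
    for entry in logs:
        # Match entries written by the user and entries about the user.
        if entry["user_id"] == user_id or entry["item_id"] == user_id:
            profile["log_messages"].append(entry["message"])
    return profile


rebuilt = rebuild_user(826, removal_requests, action_logs)
print(rebuilt)
```

The point of the sketch is that nothing more than an equality match on the ID is needed: as long as any retained record carries the ID next to identifying text, the deleted account can be pieced back together.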
Prerequisite
The Action Logs and the Privacy User suite are in use.
Rebuilding a user account
A user has requested to be removed and the user is deleted as a result.
Now we can still rebuild the user after deletion using the logs.
Step 1: Open the request for removal. This has the following information:
This tells us that the user with ID 826 used to have the email address zohermakika@mailinator.net
Step 2: Open the Users Action Logs. Here we can filter on the user, in this case User ID 826
Now we know that the user used to have username zerukuduwe.
Step 3: We analyze the logs further and see that just before the user logged in, the admin user updated a user. We can hover over the name to see which user URL it opens.
We see that the name Whitney Cameron links to the URL /administrator/index.php?option=com_users&task=user.edit&id=826
This is ID 826, exactly the ID we are looking for. So by opening the user page we now have the following information:
Use cases based on the problem
For many websites, especially those that let users create their own profile, user custom fields are the way to associate users with their unique characteristics, additional personal data and/or other sensitive data. Examples could be:
A clinical website, where users register their profiles and also fill in several custom fields required by the scope of the website, e.g. their health condition, symptoms, location, their doctor's name or their associated clinic, in order to receive useful services or consultation.
An educational website, where users register additional profile information through the custom fields, e.g. location, school names, educational preferences, scores etc., in order to get more personalized services and support.
Additional examples could be job/recruiting websites, caregiver websites etc. In general, these are websites where users register more personal data than the usual profile data.
Proposed solution
When a user requests removal and this leads to the deletion or anonymization of their data, one would expect that not only their profile (the Joomla! user account data) will be deleted or anonymized, but also all personal data associated with them in the custom fields. In addition, any logging data that relates to this user should be removed, so that the user account cannot be reverse engineered as shown above.
The need for more efficient deletion/anonymization plans and actions
In some cases, not every custom field should or legally may be included in the deletion or anonymization functions, because an organization may need some of them for anonymous future reporting or for other processing that has a legal basis. An example could be a portal where users purchase gifts based on their activities and/or their points. There, besides the custom fields that hold unique personal data, e.g. a home address (not just the city), there will most likely be custom fields that hold, for example, ratings of the service, how the user found the service (through a friend, a web search etc.) or other useful data that can stay in anonymized form, no longer associated with personal data, for statistical analysis.
The final proposal
It would be more efficient to be able to map the custom fields and include in the deletion or anonymization requests/functions only those that are aligned with the Privacy Policy of each website. A new option would be added to each custom field to indicate whether it contains information that is considered personal data and that, based on the website’s privacy policy, should be removed when a user submits a removal request that leads to the deletion of their associated information. In this way we know which fields should be removed and which ones can be kept.
During deletion we also empty the action log entries for the given user, ensuring full deletion of the user's data that is not mandatory to keep.
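The following Python sketch illustrates one way the proposed per-field option could drive a removal request. It is not the Joomla implementation: the `CustomField` class, the `is_personal_data` flag, the `process_removal` function and the row layout are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CustomField:
    name: str
    is_personal_data: bool  # hypothetical per-field option from the proposal


def process_removal(user_id, fields, field_values, action_logs):
    """Return (kept_values, kept_logs) after a removal request for user_id.

    field_values: list of dicts with keys "user_id", "field", "value".
    action_logs: list of dicts with at least the key "user_id".
    """
    personal = {f.name for f in fields if f.is_personal_data}
    kept_values = []
    for row in field_values:
        if row["user_id"] != user_id:
            kept_values.append(row)  # other users are untouched
        elif row["field"] not in personal:
            # Keep the value for statistics, but drop the link to the user.
            kept_values.append({**row, "user_id": None})
        # Values of fields flagged as personal data are dropped entirely.
    kept_logs = [entry for entry in action_logs if entry["user_id"] != user_id]
    return kept_values, kept_logs


fields = [
    CustomField("home_address", is_personal_data=True),
    CustomField("service_rating", is_personal_data=False),
]
field_values = [
    {"user_id": 826, "field": "home_address", "value": "1 Example Street"},
    {"user_id": 826, "field": "service_rating", "value": "5"},
    {"user_id": 7, "field": "home_address", "value": "2 Example Street"},
]
action_logs = [
    {"user_id": 826, "message": "logged in"},
    {"user_id": 7, "message": "logged in"},
]

values, logs = process_removal(826, fields, field_values, action_logs)
print(values)
print(logs)
```

In this sketch the flag is the only decision point, so a Super User controls retention purely through field configuration. The blanket removal of the user's log entries follows the wording of the proposal; a real implementation could just as well leave that decision to the plugins that wrote the entries.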
The PR is the result of a collaboration between: Achilleas Papageorgiou, Alkaios Anagnostopoulos, Roland Dalmulder and Sandra Decoux.
Background work and motivation
The motivation for this proposal was a number of tests that Alkaios Anagnostopoulos performed while preparing a presentation with Achilleas Papageorgiou titled “Privacy: a fundamental feature in web application development” for JoomlaTalks 2019 (Athens, Greece). At the latest meeting of the Compliance team, Achilleas shared the results of Alkaios's tests with Roland Dalmulder, and the results were validated. Achilleas, Roland and Sandra Decoux discussed the potential impact of the deletion or anonymization. Based on this discussion, Achilleas proposed a solution for more flexible management: mapping the custom fields and letting Super Users include in or exclude from the deletion/anonymization functions the custom fields they need, based on the privacy policy they follow.