joomla / joomla-cms

Home of the Joomla! Content Management System
https://www.joomla.org
GNU General Public License v2.0

[com_privacy] A user is removed, but his consents still have Personal Data in them #22720

Closed PhilETaylor closed 5 years ago

PhilETaylor commented 5 years ago

Steps to reproduce the issue

Joomla 3.9 with all new features enabled

User "me@phil-taylor.com" registers
User "me@phil-taylor.com" consents to the privacy policy in his profile
User "me@phil-taylor.com" requests a REMOVE request
He confirms it with a token from the email
The admin sees this, and clicks the X to delete all his data
The admin sets the status to COMPLETED

Expected result

All Personal Identifying data is removed from the consent

Actual result

The username is obfuscated
The user id remains
The IP address and the user agent of the browser are left in the body of the consents

brianteeman commented 5 years ago

It's a necessity of the regulations. Otherwise you won't be able to confirm that a user requested to be removed, when they requested it, etc.

PhilETaylor commented 5 years ago

Rubbish.

A request under GDPR to delete your personal data should be exactly that. DELETION of your data.

This is why people in the real world are using Blind Indexes. No one stores personal data to prove they have deleted personal data! That's plain stupid!

We have been using blind indexes for years for proving to authorities that we have taken actions requested.

For example.

When a GDPR Remove request comes into myJoomla.com we DELETE ALL DATA for that user, EVERYTHING. But then a record is created in the blind hash table. An identifier (say their email address me@phil-taylor.com) is hashed (and encrypted) and stored alongside notes (which are also encrypted), where the notes contain sentences, minus personal data, like

"This user requested on 2018-01-01 10:00 for us to remove their data, we honoured this request on 2018-02-02, this was a long term customer with over 1000 sites in their account. German. etc... "

Then in the future we can prove we have deleted their data because they will give us information like "prove you deleted all data for me@phil-taylor.com"

We enter me@phil-taylor.com into the blind hasher, it gives us a value, we can then look up the notes in the database which tell me who, when, where, why and prove our actions - we are still not storing any personal data in the notes and the hash of their email address is done in a one way cryptographic method.

This is a fork of the idea described as Blind Indexing by Paragonie here https://paragonie.com/blog/2017/05/building-searchable-encrypted-databases-with-php-and-sql
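The lookup described above can be sketched roughly as follows. This is an illustration in Python, not the myJoomla.com implementation: the key, the function names and the in-memory dictionary are hypothetical, and HMAC-SHA-256 is an assumption, since the comment only says "one way cryptographic method". Encrypting the notes is left out.

```python
import hashlib
import hmac
from typing import Dict, Optional

# Hypothetical key; it would have to be stored somewhere other than the database.
INDEX_KEY = b"example-key-kept-outside-the-database"


def blind_index(email: str, key: bytes = INDEX_KEY) -> str:
    """Return a keyed one-way digest of the normalised email address."""
    normalised = email.strip().lower().encode("utf-8")
    return hmac.new(key, normalised, hashlib.sha256).hexdigest()


# Stand-in for the "blind hash table": digest -> note without personal data.
removal_log: Dict[str, str] = {}


def record_removal(email: str, note: str) -> None:
    """Store a note under the digest of the email, never the email itself."""
    removal_log[blind_index(email)] = note


def lookup_removal(email: str) -> Optional[str]:
    """Re-hash the email supplied by the requester and fetch the note, if any."""
    return removal_log.get(blind_index(email))


record_removal(
    "me@phil-taylor.com",
    "Removal requested 2018-01-01 10:00, honoured 2018-02-02.",
)
```

The note can only be found again by someone who supplies the original email address, because the table holds the digest and not the address.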

brianteeman commented 5 years ago

Don't like it, then change it but don't just sit there and call people's work "rubbish"

ReLater commented 5 years ago

we are still not storing any personal data in the notes

I don't understand. Is this not personal data from your point of view? A combination of

?

PhilETaylor commented 5 years ago

What I'm calling "rubbish", Brian, is your assumption that you need to "retain personal data" to "prove you have deleted personal data". There is nothing in the GDPR that agrees with that: when it says delete, it means complete and utter destruction of the data. There are OTHER reasons to retain data lawfully, but not for "proving" you deleted the data. I'm not calling people's work "rubbish" and that is very clear in my post. Once again you make this about personal attacks.

Personal data should, wherever possible, be encrypted at rest under GDPR. Nothing in Joomla is ever going to implement that (from what Babker has already said), and so all this talk about Joomla complying with GDPR is rubbish anyway: a database compromise is always going to leak personal data. Joomla could encrypt personal data, it chooses not to, because, well, crap servers all the way back to PHP 5.3, and the technical level of its "Super Admins"...

What we have in Joomla is just scratching the surface of compliance to the GDPR.

What you are referring to (proving you deleted data) is from outdated UK Data protection legislation.

@relater - I consider email personal data. In my solution I am NOT storing the email, but a one way, cryptographic hash of the email. This cannot be reversed back to the email, but if someone comes back in 5 years and gives me the email address, I can hash it the same way to get the same hash I have stored, this allows me to look up the notes.

The notes have ZERO PERSONAL IDENTIFYING DATA (As defined under GDPR) and is very generic. From the notes alone it would be impossible to connect the notes to any person on earth.

In this way, I can prove that I have processed a request for a certain email, when, and some basic information, I can do this at any time in the future ONLY IF someone gives me the keys - which is the users email address. The email address is NOT stored in the notes and is only used to generate the hash which acts as the row key.

We choose to further encrypt the notes, so if our db is compromised the notes are in an encrypted state at rest (another GDPR requirement for personal data, but we apply that to the non-personal data in our db wherever possible).

Other random people agree with me

I'm not going to argue on legal points. I'm not a lawyer, but I have read the full GDPR regulation (what a weekend that was) and the UK Data Protection Acts (and the Jersey ones for that matter) and read extensively on the subject. I have received and processed GDPR Removal Requests. I have proved that the system I have implemented allows me to prove I deleted data in the future.

The fact is that we have literally no reason to retain personal data after a user has requested their personal data is removed. Leaving an IP address and User Agent in plain text, unencrypted in the database, linked with a user id number, allows you to track and trace that user's actions and, in the event of a database breach, leaks personally identifiable information (an IP address/User Agent) that can be linked to many other things in the database (Action logs).

"Delete means Delete"

mbabker commented 5 years ago

Joomla could encrypt personal data, it chooses not to, because, well, crap servers all the way back to PHP 5.3, and the technical level of its "Super Admins"...

Not just that...

All of Joomla is in the web space.

You would have to store all data regarding data encryption (and decryption) in the same space as everything else related to the site's configuration (i.e. database credentials).

I believe you yourself at one point (or someone on JSST if not you) said something along the lines of doing this is as good as not encrypting the data at all because if you find rogue filesystem access you still get the keys to everything.

PhilETaylor commented 5 years ago

Yup, I said that. Joomla Super Admins don't have the technical knowledge to implement encryption to a threat model that is actually useful. Storing the secret key for encryption is the limitation here. And as many of the hacks in Joomla involve SQL injection, this kind of gets instantly defeated anyway.

But when you stand up in an EU court and say "Sir, my database has zero encryption and that is where we store all our users' shopping orders of sex toys, and their dating profiles, and a hacker has exposed all our data"... you can be sure that the lack of basic technical measures to secure the data will burn your business badly...

This thread is about the fact that an IP address and User Agent are not deleted from the database when the user requests that you delete their Personally Identifiable Data. It's black and white. There is no reason (legal or technical) to store that information in that table when a user has specifically asked for that to be removed.
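As a rough sketch of what "removed" could mean for a single consent record (Python, for illustration only): the row layout, the field names and the wording of the body are assumptions made for this example and are not taken from the Joomla schema or from the com_privacy code.

```python
from typing import Any, Dict

# Hypothetical consent row; column names and body wording are assumptions.
consent: Dict[str, Any] = {
    "user_id": 42,
    "subject": "Privacy policy consent",
    "body": (
        "User consented from IP address 203.0.113.7 "
        "using browser user agent Mozilla/5.0 (X11; Linux x86_64)"
    ),
}


def scrub_consent(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the row with the identifying values removed."""
    cleaned = dict(row)
    # Drop the link back to the user account.
    cleaned["user_id"] = None
    # Replace the free-text body, which carried the IP address and user agent.
    cleaned["body"] = "Consent details removed at the data subject's request."
    return cleaned


scrubbed = scrub_consent(consent)
```

Replacing the whole body, rather than pattern-matching the IP address out of it, avoids depending on the exact wording that was stored.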

mbabker commented 5 years ago

you can be sure that the lack of basic technical measures to securing the data will burn your business badly...

You just made the case for why no business can build their web presence off of a mass market content management platform. We can all go home now.

PhilETaylor commented 5 years ago

And that my friend, is why all this work on privacy features needs to be marketed as just that "some basic privacy features" and not a GDPR Compliant Mass Market CMS.

alikon commented 5 years ago

Who has ever claimed that 3.9 is GDPR compliant? Even the marketing material says "privacy tools".

PhilETaylor commented 5 years ago

Quite frankly its all summed up on Article 25 - https://gdpr-info.eu/art-25-gdpr/

Taking into account the state of the art, the cost of implementation and the nature, scope, context and purposes of processing as well as the risks of varying likelihood and severity for rights and freedoms of natural persons posed by the processing, the controller shall, both at the time of the determination of the means for processing and at the time of the processing itself, implement appropriate technical and organisational measures, such as pseudonymisation, which are designed to implement data-protection principles, such as data minimisation, in an effective manner and to integrate the necessary safeguards into the processing in order to meet the requirements of this Regulation and protect the rights of data subjects.

The controller shall implement appropriate technical and organisational measures for ensuring that, by default, only personal data which are necessary for each specific purpose of the processing are processed. That obligation applies to the amount of personal data collected, the extent of their processing, the period of their storage and their accessibility. In particular, such measures shall ensure that by default personal data are not made accessible without the individual’s intervention to an indefinite number of natural persons.

Highlights are mine.

With regard to deleting data, "Data protection by design and by default" means actually deleting that data when asked to.

@alikon I never said that someone did :-) #wasjustsaying

PhilETaylor commented 5 years ago

Anyway thread drift.

Brian believes that it's perfectly acceptable to keep a user's IP address and their User Agent in the consent log indefinitely. I disagree. In the absence of any other opinions on this topic (apart from the thread drift to a general GDPR discussion) I see no reason why this issue cannot be closed with status "Brian said this retention of data is acceptable".

It's clear that lip service is being given to the law with these new privacy features, but actual adherence to the GDPR is a wish too much for a mass market software like Joomla.

Therefore, with the closing of this issue now, Joomla will retain consent data, with Personally Identifiable Data after a data subject has requested removal...

mbabker commented 5 years ago

Pretty much that. It’s a toolkit (that like most core things will be ignored if I’m being bluntly honest, just look how little of core many vendors use) to help with actions that fall under guidelines of privacy laws like GDPR, but it’s not a 100% compliant solution with any law. Best we can do is continue iterating to improve the status quo with the understanding that there are some technical things we can’t do in core because they just are ineffective with our distribution model.

On Fri, Oct 19, 2018 at 8:08 AM Nicola Galgano notifications@github.com wrote:

Who has ever claimed that 3.9 is GDPR compliant? Even the marketing material says "privacy tools".

brianteeman commented 5 years ago

Reminder - this is not a GDPR plugin

PhilETaylor commented 5 years ago

And nor is a "Remove request" actually removing your personal data....

mbabker commented 5 years ago

I'm reopening this as it is a valid issue to keep on the books and continue looking for ways to improve upon.

We are not in a spot where core can do this hash table stuff, short of SHA1'ing a string with no salt or other data (i.e. JFactory::getConfig()->get('secret'), and I think most of us would know why using the secret for a one-way hash would be a bad idea). Which means at the moment we are left with two practical options:

Both options suck at face value, surely there is something better than "just ignore it" that can be done here.
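To illustrate the concern about an unsalted SHA-1, here is a small Python sketch. The candidate list and both keys are hypothetical, and the keyed variant (HMAC-SHA-256) is shown only for contrast, not as something core does.

```python
import hashlib
import hmac


def unsalted_sha1(email: str) -> str:
    """Plain SHA-1 of the address, with no salt and no key."""
    return hashlib.sha1(email.encode("utf-8")).hexdigest()


def keyed_digest(email: str, key: bytes) -> str:
    """HMAC-SHA-256 of the address under a secret key."""
    return hmac.new(key, email.encode("utf-8"), hashlib.sha256).hexdigest()


# Value an attacker finds in a leaked table.
leaked = unsalted_sha1("me@phil-taylor.com")

# Addresses harvested elsewhere (hypothetical list).
candidates = ["someone@example.com", "me@phil-taylor.com"]

# Anyone can recompute an unsalted hash, so a correct guess is confirmed.
recovered = [c for c in candidates if unsalted_sha1(c) == leaked]

# With a keyed digest the same guess fails unless the key leaked as well.
site_key = b"hypothetical-key-not-in-the-database"
leaked_keyed = keyed_digest("me@phil-taylor.com", site_key)
guessed = [
    c for c in candidates
    if keyed_digest(c, b"attacker-guess") == leaked_keyed
]
```

The keyed form only helps while the key stays out of the attacker's hands, which is the storage problem raised elsewhere in this thread.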

csthomas commented 5 years ago

If hashing the email address keeps the address itself secure, then encrypt the details with crypt(IP, UserAgent, date, ...), using JFactory::getConfig()->get('secret') as the key and the email address as the salt.

If you know the email address then you can encrypt details (IP, user agent, date, ...).
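A minimal sketch of the key-derivation half of this idea, in Python for illustration. PBKDF2 is an assumption here (the comment only names crypt(...)), the secret value and iteration count are hypothetical, and the encryption step itself is not shown since it would need an authenticated cipher from a separate library.

```python
import hashlib

# Hypothetical stand-in for the site secret.
SITE_SECRET = b"hypothetical-site-secret"


def derive_key(email: str, secret: bytes = SITE_SECRET) -> bytes:
    """Derive a 32-byte key from the site secret, salted with the email."""
    salt = email.strip().lower().encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", secret, salt, 100_000, dklen=32)


# The same address always yields the same key, so details encrypted under it
# can be decrypted again once the requester supplies their address.
key_for_user = derive_key("me@phil-taylor.com")
```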

mbabker commented 5 years ago

I still wouldn't use the secret in any hashing or encryption operation. If the secret changes for any reason (I may be misremembering, apologies if I'm mis-speaking, but Akeeba at one point was changing the secret when restoring a backup through Kickstart), it invalidates anything hashed or encrypted using that value.
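The invalidation problem can be shown in a few lines of Python. The secrets, the function name and the dictionary are hypothetical, and HMAC-SHA-256 stands in for whatever hash would actually be used.

```python
import hashlib
import hmac


def lookup_key(email: str, secret: bytes) -> str:
    """Digest of the email under the site secret, used as the row key."""
    return hmac.new(secret, email.encode("utf-8"), hashlib.sha256).hexdigest()


old_secret = b"secret-before-restore"
new_secret = b"secret-after-restore"

# Row written while the old secret was in place.
stored = {lookup_key("me@phil-taylor.com", old_secret): "removal honoured"}

# Same address, same table, but the secret has since changed.
found_before = stored.get(lookup_key("me@phil-taylor.com", old_secret))
found_after = stored.get(lookup_key("me@phil-taylor.com", new_secret))
```

Once the secret changes, `found_after` comes back empty: the row is still in the table but can no longer be located from the email address.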

Even without that issue, you're in a situation where Joomla has to be smart enough to generate encryption/decryption keys and store them somewhere, because the bulk of our users aren't going to know how to do that on their own. If we're going to do encryption, we should use our Crypt API, but we're stuck trying to use ciphers that are consistently available, and the only adapter we can rely on for that is the Sodium adapter thanks to the sodium_compat polyfill (mcrypt being deprecated/abandoned and we don't have an OpenSSL adapter in the CMS). But that leaves the security issue that the encryption keys are going to be in your web space right alongside the rest of your website. On a filesystem compromise that data encryption isn't doing you any good: your database details are in a readable configuration file, the encryption keys are going to be in a predictable location, and thanks to the open source nature of the project it wouldn't take very long to load those keys up and decrypt anything in the database using them.

So it's not that we can't do it from a technical perspective. From a practical perspective though, it's not something that makes a lot of sense without some out-of-the-box thinking, at which point I'd suggest it probably only benefits code savvy individuals.