Add an option to disable fingerprinting / config_id entirely

MoritzLost commented 2 years ago

Update - the specs

Fix in 5.0.0 - See the proposed solution in https://github.com/matomo-org/matomo/issues/18448#issuecomment-1512338767
Ideally we would backport but it might be OK in that case not to backport

Summary

This is a continuation of #16361 and the related discussion in the forums. Right now, the stance of Matomo is that in cookie-less mode, it doesn't use fingerprinting so it doesn't require consent. Under GDPR this line of thought is perfectly sound and reasonable. However, under the ePrivacy directive (which has been implemented in German law since 1 December 2021 through the TTDSG), this is once again a debatable point.

The key phrase in the ePrivacy directive is the 'access to information' already stored on the user's device. Of course, it's not entirely clear what this means in practice. But call it a fingerprint or not, Matomo does access some data from the user's device to create the config_id (in particular, the screen size and supported mime types). It's really up for debate if this constitutes 'access to information' – but as long as there's no legal precedent, some clients will want to cover all their bases. So it will once again be necessary to require consent before using Matomo, even without cookies or 'real' fingerprinting in place. So having an option to disable the device detection further would be great. Instead, the config_id could be based solely on User-Agent and anonymised IP address (which the server receives implicitly, so no data access is required).

Of course, this would further limit the usefulness of some reports. But this would be acceptable in order to be 100% sure that using Matomo without consent is completely safe, legally speaking.

tsteur commented 2 years ago

Thanks for this @MoritzLost

When this new feature is disabled, what would you expect to happen? What information would be accessed vs not accessed? Or would it only use the IP address (meaning when different people in the same company visit the site then all these people are grouped into the same "visit")? Or maybe even every single action would create a new visit and the IP would not even be looked at?

It would be great to hear your thoughts, as we may consider working on this.

The biggest problem would be to customise the tracking code based on this setting which could be quite hard.

tsteur commented 2 years ago

The key phrase in the ePrivacy directive is the 'access to information' already stored on the user's device.

Do you maybe have a link to the page where this is mentioned?

MoritzLost commented 2 years ago

@tsteur Thanks for the reply!

Do you maybe have a link to the page where this is mentioned?

This comment on issue #16361 has more information:

If Art. 5 para. 3 ePrivacy Directive applies consent is mandatory to proceed. This law does not refer to cookies. It refers to "the gaining of access to information already stored, in the terminal equipment of a subscriber or user". The most relevant question is whether Javascript Tracking means gaining access to information already stored in the enduser's device. Art. 5 para. 3 ePrivacy Directive describes one scenario when consent is not required. This is the case if access to the enduser's device is "strictly necessary in order for the provider of an information society service explicitly requested by the subscriber or user to provide the service". This exemption does not cover analytics because no user or visitor "explicitly requests" to analyze his website or app usage. The relevant publication regarding this matter is Opinion 09/2014 by the Article 29 Working Party on device fingerprinting. The Opinion states under 7.1: "first-party website analytics through device fingerprinting do not fall under the exemption defined in CRITERION A or B and consent of the user is required."

In the TTDSG, the German implementation of the ePrivacy directive, there's § 25 para. 1, which is pretty much a literal translation of Art. 5 para. 3 ePrivacy Directive.

When this new feature is disabled, what would you expect to happen? What information would be accessed vs not accessed?

It's a bit difficult to say what information can still be used without consent based on the ePrivacy Directive / TTDSG, and I'm a developer, not a lawyer. But I've read multiple blog posts and 'expert opinions' (for example this one in German). The consensus seems to be that everything that is implicitly available to the server as a by-product of the HTTP request is 'safe' to access. This would be, at most, the IP address and User-Agent. Though not even that is completely certain, since the request to Matomo was not 'explicitly requested' by the user, so it's not 'strictly necessary' (quoting the relevant phrases from the ePrivacy Directive here).

That said, I'm more concerned with the data Matomo collects using JS (screen size, mime types). Of course there's much ambiguity here, but that could be considered 'access to information already stored' on the user's device. So an option to disable that would ideally leave that part out entirely, so the server only gets the IP Address and User-Agent (and any other 'required' HTTP headers that might be useful) to create the config_id.

Or would it only use the IP address (meaning when different people in the same company visit the site then all these people are grouped into the same "visit")? Or maybe even every single action would create a new visit and the IP would not even be looked at?

Either of those could be reasonable. If the config_id is created only based on User-Agent and IP address, this would lead to a lot of collisions, in particular if the IP addresses are anonymized (which they need to be for GDPR reasons). Though it could still work well enough for small and medium sites.
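To make the collision concern concrete, here is a minimal sketch of a config_id built only from data that arrives with the HTTP request. This is not Matomo's actual algorithm: the function names, the /16 and /48 masking widths, the SHA-256 hash and the 16-character truncation are all assumptions chosen for illustration.

```python
import hashlib
import ipaddress


def anonymise_ip(ip: str, keep_bits_v4: int = 16, keep_bits_v6: int = 48) -> str:
    """Zero the low-order bits of an address (the masking widths are assumptions)."""
    addr = ipaddress.ip_address(ip)
    keep = keep_bits_v4 if addr.version == 4 else keep_bits_v6
    network = ipaddress.ip_network(f"{ip}/{keep}", strict=False)
    return str(network.network_address)


def config_id(ip: str, user_agent: str) -> str:
    """Hypothetical config_id: a short hash over request-borne data only."""
    material = f"{anonymise_ip(ip)}|{user_agent}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()[:16]


ua = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"

# Two different people behind the same /16 with the same browser collide,
# because both addresses are masked to 203.0.0.0 before hashing.
alice = config_id("203.0.113.17", ua)
bob = config_id("203.0.44.200", ua)

# A visitor from another network gets a different id.
carol = config_id("198.51.100.9", ua)
```

The more bits of the IP address are masked, the more visitors share one id, which is why the approach may hold up for small and medium sites but degrade on high-traffic ones.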

Maybe the distinction between visitor, visit and page view could be dropped entirely in this case? This would greatly reduce the usefulness of some reports (bounce rate, session length etc), but that would be a tradeoff to consider. Cloudflare Web Analytics does something similar. They only distinguish between visits and page views. A unique 'visit' is recorded if the web request came from a different website (determined by the HTTP Referer header). Of course this leaves a wide margin for error, but it's still accurate enough to get basic visitor counts and page view trends. Though I'm aware that this would be a massive rework.
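The Referer-based counting described above could be sketched roughly as follows. This only illustrates the idea as stated in this comment, not Cloudflare's implementation; the function name is hypothetical, and treating a missing Referer as a new visit is an assumption.

```python
from typing import Optional
from urllib.parse import urlsplit


def is_new_visit(page_url: str, referer: Optional[str]) -> bool:
    """Count a visit when the request did not come from the same site."""
    if not referer:
        # Assumption: direct entries, bookmarks and stripped Referer headers
        # are counted as visits.
        return True
    return urlsplit(referer).hostname != urlsplit(page_url).hostname


# (requested page, Referer header) pairs as a server might see them.
requests = [
    ("https://example.org/", "https://search.example.com/?q=matomo"),
    ("https://example.org/pricing", "https://example.org/"),
    ("https://example.org/docs", "https://example.org/pricing"),
    ("https://example.org/", None),
]

page_views = len(requests)
visits = sum(is_new_visit(url, ref) for url, ref in requests)
```

No identifier is derived from the visitor at all here, which is also why bounce rate or session length cannot be computed from this data.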

As another option, disabling some reports that just don't work if the config_id is not reliable would make sense. Or even just adding warnings to the interface where reports might be skewed by that fact? Based on my interactions with clients, having reduced accuracy of reports is acceptable if it means we can use Matomo without requiring consent. Those clients only need to be able to tell which reports are reliable in that mode and which are not.

Getting some opinions from other people here would be great, in general the situation is pretty murky right now. Maybe the best course of action right now is to wait for some legal precedent / test cases which will clear up what constitutes 'access to information' …

tsteur commented 2 years ago

@MoritzLost thanks for this. I will have another read and think later. Will see if we can maybe get more eyes on this through the forum or so.

Just wanted to mention already that we're working on a feature to disable some things when data is less accurate because we don't have a reliable config_id etc., see https://github.com/matomo-org/matomo/pull/16773

tsteur commented 2 years ago

Also just FYI as a "workaround" someone could technically fall back to log analytics (Log analytics on GitHub) if eg events or other features aren't being used. Of course that might not always be the case and maybe "Opt out" could be partially problematic.

tsteur commented 2 years ago

I guess generally this would mostly be about disabling this code so that Matomo can be run without using this data for the fingerprint, if someone wants that. The most unique thing in there is likely the resolution, as plugins are often quite similar. So disabling this would actually not make a huge difference. Generally, we could likely provide a tracker option to not send this data along with a tracking request (and ideally also to not even access it, which should be the case automatically when not using cookies or the cross-domain linking feature, which won't be useful without cookies anyway as far as I remember).

FYI I've contacted our data protection officer to potentially get some insights into this. It may take a while until we hear back.

MoritzLost commented 2 years ago

@tsteur Thanks! Yeah, would be great to get some more opinions on this.

Just wanted to mention already that we're working on a feature to disable some things when data is less accurate because we don't have a reliable config_id etc., see #16773

That sounds promising, maybe this would pave the way to disable the feature detection completely without messing up the reports too much. Having reduced accuracy is very acceptable if the interface clearly communicates what it entails, like hiding reports that don't work in this case.

Also just FYI as a "workaround" someone could technically fall back to log analytics (Log analytics on GitHub) if eg events or other features aren't being used. Of course that might not always be the case and maybe "Opt out" could be partially problematic.

Yeah, log analytics is always the final fallback option, it doesn't require consent or opt-out at all (as long as IP addresses are anonymized). Though they're just not as reliable as JS-based trackers.

I guess generally this would mostly be about disabling this code so that Matomo can be run without using this data for the fingerprint, if someone wants that. The most unique thing in there is likely the resolution, as plugins are often quite similar. So disabling this would actually not make a huge difference. Generally, we could likely provide a tracker option to not send this data along with a tracking request (and ideally also to not even access it, which should be the case automatically when not using cookies or the cross-domain linking feature, which won't be useful without cookies anyway as far as I remember).

Yeah, the browser feature detection is definitely the main point. An option that just skips calling this function would be great, this should clear up any doubts regarding the 'access to information' clause. Or maybe only the screen size could be used? Arguably, the screen size does not constitute 'information already stored'. Though that's still up for debate, it really comes down to legal precedent I guess …

FYI I've contacted our data protection officer to potentially get some insights into this. It may take a while until we hear back.

Thanks, would be great to get an expert opinion on this specific use-case!

tsteur commented 2 years ago

Just FYI @MoritzLost, I read https://www.heise.de/hintergrund/Was-sich-mit-den-neuen-Cookie-Regelungen-aendert-6278440.html today, and it sounded more like consent is needed when you have previously stored data yourself and then read that data. I think what's meant is the case where you don't store a cookie but store an identifier in localStorage or sessionStorage or similar and want to read it later. I don't think the resolution or plugin features are what's meant. I'm still waiting to get more feedback though, likely mid next week.

MoritzLost commented 2 years ago

@tsteur I'm not sure I find that article convincing – focussing the whole discussion on cookies (or related technologies like localStorage) is a very myopic reading of the terms of the ePrivacy directive. The article basically states that the directive (and the TTDSG by extension) only talk about accessing information that you put there yourself – so cookies etc. But that isn't supported by the wording in the ePrivacy directive / TTDSG at all, which only mention access to information already stored [on the user's device], without mentioning what kinds of information this applies to. So it comes down to whether device features (screensize, plugins etc) can be considered 'information' in this sense.

Anyway, I would really like to believe that article, since it would make my life much easier … but I'm not sure that this interpretation is supported by the actual text of the law. Anyway, I'm really out of my depth here.

mattab commented 2 years ago

Hello, I haven't read this whole thread in detail, but here are my findings so far from looking at the latest ePrivacy draft, especially Article 8: https://github.com/matomo-org/matomo/issues/15425#issuecomment-993160031 - if you have any feedback or questions, I'm keen to hear them. I'm not sure yet how Article 5 and Article 8 cohabit or interact.

Daten-David commented 2 years ago

Thanks to @MoritzLost and @tsteur for the valuable input. I agree that this issue is completely arguable.

One question is whether the term "information" in Article 5 ePrivacy Directive and § 25 TTDSG covers all kinds of data or needs to be read in a limited meaning like "information that has been created by human action". The consequence of this legal argument could be whether screen size and similar technical information are covered by the law or not.

The second question is whether the term "stored" means stored by anybody or by the person/organisation which wants to process the information. For example the applications and fonts installed on a device are information stored by human action on a device (and by some services they are used to create a fingerprint). But unlike cookies or data in the local storage of a browser storing such information has not been initiated by the external party running the analytics tool.

For Germany, the Datenschutzkonferenz (DSK; the common body of the regional and federal supervisory authorities) has announced that it will publish an update to its guideline on web analytics. It should have been published already. By now it seems more likely to be published in February. But most likely the DSK will address the questions raised here. In general the DSK supports the approach by Matomo but their lawyers need to find common sense in interpreting the law.

tsteur commented 2 years ago

@MoritzLost here some update:

Even though there has been a lot of talk about sec. 25 of the new German Telemedia and Telecommunications Privacy Act (TTDSG), it is really only a (very late) implementation of the 2009 EU ePrivacy Directive, in this case its art. 5(3), into German law. It's nothing new basically. Meaning when we look at TTDSG we pretty much have to look at ePrivacy.

The consent requirement in the ePrivacy Directive was written at a time when cookies were the main technology to track users online, but today the 2009 law is interpreted as applying to all tracking technologies and not just to cookies.

However, there is an exception from the consent requirement in art. 5(3)(2) ePrivacy Directive (and in sec. 25(2)(2) TTDSG) for so-called "strictly necessary" tracking technologies, which therefore do not require consent. The French data protection authority CNIL has taken the position that this exception also applies to "lean" analytics providers (such as Matomo). This means that the consent requirement of art. 5(3)(1) ePrivacy Directive (and sec. 25(1) TTDSG) does not apply to Matomo because its tracking technology is considered "strictly necessary". Unfortunately, the German data protection authorities have remained silent on this very question, but one could rely on the CNIL interpretation until further notice.

In the same spirit you could also rely on CNIL's decision and technically even use cookies without consent as long as you follow their conditions and don't track personal data (as then GDPR applies).

This is our interpretation here. Don't take it as legal advice.

Nonetheless, for people who want even stronger privacy, I think it would be great to add a tracker method to disable the usage of browser features for config_id/fingerprint. It's quick to do and for low traffic sites it shouldn't have a huge impact.

MoritzLost commented 2 years ago

@tsteur I'd love to live in France 😏

However, there is an exception from the consent requirement in art. 5(3)(2) ePrivacy Directive (and in sec. 25(2)(2) TTDSG) for so-called "strictly necessary" tracking technologies, which therefore do not require consent.

Not to beat a dead horse, but the strictly necessary part is limited to a 'service explicitly requested by the subscriber or user to provide the service' (art (5)(3) ePrivacy). I'm not sure how to go from there to the CNIL statement – if a visitor visits a website, Matomo isn't 'strictly necessary' to provide the webpage. And the request to Matomo itself hasn't been 'explicitly requested'. Maybe it can be argued that basic analytics/visitor statistics are strictly necessary for ongoing support and development of a website?

Anyway, having an option like this would be great to be able to cater to extremely careful clients.

tsteur commented 2 years ago

@MoritzLost check out 50-52 https://www.cnil.fr/sites/default/files/atoms/files/lignes_directrices_de_la_cnil_sur_les_cookies_et_autres_traceurs.pdf

Translated, it says that the CNIL considers the use of attendance/performance statistics as required. That's if you e.g. don't also use the data for other purposes etc. Hence they exempt you from having to ask for consent when you use Matomo in a certain way. Since all member states implement the same European directive into national law, you can argue the same applies to Germany or other countries until they take a different stance. This is after chatting with some expert lawyers etc. Someone might see this differently though.

If you want to not look at any device data, then the above tracker method may help 👍

Daten-David commented 2 years ago

@tsteur and @MoritzLost

I don't know whether it helps but I am pretty convinced both of you are right. CNIL (in accordance with many lawyers) adopted a flexible approach for interpretation of Art 5 para 3 ePrivacy Directive (same as German § 25 TTDSG). Probably just as many lawyers don't go with this interpretation.

Until the ECJ (European Court of Justice) has decided on this issue, nobody will know for sure legally. As far as I am aware there is no case pending on this matter at the ECJ yet. So it might take some years till we know for sure. Probably the ePrivacy Regulation as a replacement for the ePrivacy Directive will be enacted at about the same time.

What will happen first will be the publication by the German authorities (and most likely by other national data protection authorities) of their guidelines. If the German guideline agrees with the French one we are all one major step ahead. If they don't agree...

If Matomo wants to step out of legal uncertainty it could go forward and limit access to all device data which is not sent into server-side analytics anyway. I am not aware of how much this move is going to limit the quality of the analytics results. Hence I don't know how painful such a move might be.

tsteur commented 2 years ago

@Daten-David 👍 As part of this issue we'll add a new method that gives people the option to not detect these browser features and not send them to the server. For smaller sites this shouldn't typically have too much of an impact (unless you get a lot of traffic from different people from the same company, for example). For higher traffic sites, different visitors may more often be grouped into the same visit, but I would expect it not to make that much of a difference.

Daten-David commented 2 years ago

@tsteur The DSK (meeting of the German supervisory authorities) published its guideline on § 25 TTDSG yesterday.

In German: https://www.datenschutzkonferenz-online.de/media/oh/20211220_oh_telemedien.pdf

An English version has been announced.

On page 8 the DSK points out that fingerprinting is considered access to information stored on the device and hence § 25 TTDSG is applicable to fingerprinting. The DSK doesn't provide a lot of arguments for this view but simply refers back to the old paper on device fingerprinting by the Article 29 Working Party which came to the same conclusion.

Unlike the French view taken by the CNIL, the DSK doesn't accept web (or app) analytics as "strictly necessary" to provide a "service requested by the enduser". Hence consent is mandatory.

Looks like there is no way to keep fingerprinting via javascript active without collecting consent first.

And collecting consent doesn't become easier if you follow the guideline. I won't go into details but the guideline sounds very much like no consent management tool known to me today is capable of collecting valid consent.

The whole issue stays hot. And it looks like a big U-turn back to the 90s of pure server-side web analytics.

tsteur commented 2 years ago

In the meantime, so-called browser fingerprinting is also often used. This refers to the process of forming, on the server side, a (hash) value or image that is as unique and long-lived as possible, as the result of a mathematical calculation on browser information such as screen resolutions, operating system versions or installed fonts.

I put one section through a translation service so other people can understand it as well. FYI the above talks about a long-lived (hash) value. In Matomo, this hash is not long-lived and changes at most every 24 hours. If a visitor visits the site at 8am, then the hash already changes again 16 hours later. You could maybe argue that because the hash is not long-lived it's not considered a fingerprint and it may be fine. Everyone will see this differently though. I know it talks about this more generically further down.
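As a rough illustration of a hash that is not long-lived, the sketch below mixes the calendar date into the hashed material, so the same visitor gets a new id once the day rolls over. This is not Matomo's code: the function name, the secret, the field order and the use of UTC day boundaries are assumptions. The midnight rollover is only inferred from the "8am, then 16 hours later" example.

```python
import hashlib
from datetime import datetime, timezone


def daily_config_id(
    anonymised_ip: str,
    user_agent: str,
    when: datetime,
    site_secret: str = "example-secret",  # hypothetical per-site secret
) -> str:
    """Hypothetical short-lived id: the calendar day is part of the hash input."""
    day = when.astimezone(timezone.utc).date().isoformat()
    material = "|".join([site_secret, day, anonymised_ip, user_agent])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]


ua = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
ip = "203.0.0.0"

morning = datetime(2021, 12, 21, 8, 0, tzinfo=timezone.utc)
evening = datetime(2021, 12, 21, 23, 59, tzinfo=timezone.utc)
next_day = datetime(2021, 12, 22, 0, 0, tzinfo=timezone.utc)  # 16 hours after 8am

id_morning = daily_config_id(ip, ua, morning)
id_evening = daily_config_id(ip, ua, evening)
id_next_day = daily_config_id(ip, ua, next_day)
```

Within one day the id is stable, so actions can be grouped into a visit. Across days the ids cannot be linked without knowing the inputs.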

I'll move this issue into the current sprint and not next sprint so we can offer such a method to not access this data sooner.

Daten-David commented 2 years ago

Yesterday I missed an additional announcement by the DSK in its press release regarding the new guideline. The DSK announced that it will initiate a public consultation on its guideline. Details will follow. See the last paragraph of the press release (in German): https://www.datenschutz-berlin.de/fileadmin/user_upload/pdf/publikationen/DSK/2021/2021-DSK-PM-OH_Telemedien.pdf It might be helpful if Matomo could contribute arguments to the consultation which pick up the positive view of the CNIL. There might still be a chance that following the consultation the German DSK might swing to the more liberal CNIL view.

tsteur commented 2 years ago

Great, thanks a lot for pointing this out @Daten-David. We'd be keen to follow up there. In case we miss it, it would be great if you could mention it here when you hear more about the details.

GreenReaper commented 2 years ago

As a user, this is all very confusing. I have a main server in France from a French company that proxies Matomo requests to an analytics server in the UK which hosts the Matomo software. Some of our users are from Germany. So do I have to worry about German privacy law interpretations on fingerprinting applying, even though I am in the UK and the server is in France? 🤷🏻‍♂️

justinvelluppillai commented 2 years ago

@peterhashair it'd be good as part of this issue to still add the test that browser features aren't sent to the server after the new method is called and a tracking request fired.

Also the documentation updates (at least two docs, I think maybe you've already added one).

Thanks

tsteur commented 2 years ago

And it would be great to allow users to call a method enableBrowserFeatureDetection.

fyi @GreenReaper @Daten-David @MoritzLost an FAQ has been created for this new feature that will be included in Matomo 4.7

https://matomo.org/faq/how-do-i-disable-browser-feature-detection-completely/

@peterhashair fyi I tweaked the FAQ to make a few things more clear.

MatomoForumNotifications commented 2 years ago

This issue has been mentioned on Matomo forums. There might be relevant details there:

https://forum.matomo.org/t/compliance-adoption-to-the-requirements-of-the-25-ttdsg-german-law-option-to-avoid-the-use-of-screen-resolution-params-for-a-website-operation-without-consent-banner/46061/2

MatomoForumNotifications commented 2 years ago

This issue has been mentioned on Matomo forums. There might be relevant details there:

https://forum.matomo.org/t/question-from-data-protection-officer/47475/8

jmbiltresse commented 1 year ago

I'd like to continue the thread started by @MoritzLost. I definitely see his point because it's been a source of discussion and debate within our organization. While Matomo claims that one can run it without cookie consent, it is definitely not true in all EU member states because of article 5(3) of the ePrivacy directive that states the following: "Member States shall ensure that the storing of information, or the gaining of access to information already stored, in the terminal equipment of a subscriber or user is only allowed on condition that the subscriber or user concerned has given his or her consent, having been provided with clear and comprehensive information, in accordance with Directive 95/46/EC....". To generate its config_id, Matomo does access information from the user_agent string (IP, browser family, version, etc.) and some data privacy authorities (UK, Belgium, Austria, etc.) consider this a breach of article 5(3) of the ePrivacy directive because it's gathered without the user's consent. It'd be great for Matomo to introduce a way to generate its config_id without collecting any of the device attributes (like using a random or unique number). I know it'd screw up some metrics like the number of visitors and others, but it's a trade-off. I can see this topic being a threat to Matomo. The two reasons we introduced Matomo in the organization were that it was privacy-centric and could be run without cookie consent. Given that the latter is not true in all member states, our IT team is now exploring alternatives, incl. CloudFlare analytics and other platforms.

MoritzLost commented 1 year ago

@jmbiltresse Have you looked at the new disableBrowserFeatureDetection option? This disables the fingerprinting completely, so there shouldn't be a problem. We use this alongside disableCookies, this way Matomo neither sets cookies nor accesses existing data / device information, so we can run it without consent because no PII is collected.

jmbiltresse commented 1 year ago

@MoritzLost I do not believe the disableBrowserFeatureDetection option disables the fingerprinting completely. My understanding is that it will stop collecting the browser resolution and browser plugins to make the config_id but will still need information from the user agent string like the IP, browser version and family to compute the config_id. Some data privacy authorities consider that accessing information from the user_agent string for the sake of analytics is a breach of Article 5(3). I know the disableBrowserFeatureDetection option satisfies the data privacy authority in Germany but likely not the Belgian, UK and Austrian ones. Let me know if I've misunderstood you. Thanks!

jmbiltresse commented 1 year ago

Adding @tsteur for visibility and possible feedback. Thanks!

tsteur commented 1 year ago

disableBrowserFeatureDetection does indeed disable the browser feature detection. It won't detect any plugins or resolution and won't use them in the fingerprint. Matomo then still uses the user agent that is sent along with the request to build the config_id, but because this data is sent with the request and not read on the user's device, this should not cause any issues. Happy to discuss further.

jmbiltresse commented 1 year ago

@tsteur that is what we thought as well, but the lawyers we hired to consult on the topic had a different reading of Article 5(3) of the ePrivacy Directive and its application in the UK, Belgium and Austria:

"Member States shall ensure that the storing of information, or the gaining of access to information already stored, in the terminal equipment of a subscriber or user is only allowed on condition that the subscriber or user concerned has given his or her consent, having been provided with clear and comprehensive information, in accordance with Directive 95/46/EC, inter alia, about the purposes of the processing."

Their read / interpretation is two-fold:

(1) The user-agent pulls information from the user operating system. By collecting information from the user-agent, you are technically accessing information already stored on the end user device. (2) Collecting information from the user-agent, to generate a config_id for analytics purpose, is diverting the user-agent from its use, something a user should be informed and consent to.

Of course, it's their own interpretation, but our organization relies heavily on their assessment and is starting to question the use of Matomo here. I am sure the same applies to other analytics platforms like Plausible and Fathom. Thought I'd raise this with you in case you'd like to brainstorm ways to generate a config_id not based on any device attributes.

Thanks!

Jean-Michel

tsteur commented 1 year ago

@jmbiltresse first of all I will reopen this issue again.

On " storing of information, or the gaining of access to information already stored, in the terminal equipment".

I'm referring here to "3.1. Darf ich Werkzeuge zur Reichweitenanalyse ohne Einwilligung der Nutzenden verwenden?" in https://www.baden-wuerttemberg.datenschutz.de/faq-zu-cookies-und-tracking-2/#12_was_bedeutet_speichern_oder_auslesen_von_informationen_auf_dem_endgeraet where information that is sent along the request may be used without consent.

Of course this only applies to 1 of the 16 states within Germany. The CNIL in France has a similar or even more relaxed interpretation. Unless UK, Belgium or Austria have other guidelines, it should be safe to apply same standards there as they are all implementing the same ePrivacy directive into national law. Of course every country may interpret it differently. So far I'm not aware of any guideline where a country says "user agent" and other similar information that is transferred to you anyway where you don't actually access any information on the terminal directly does require consent. In this case we're not sending/using any extra information that otherwise would not be sent.

Nonetheless, there could be a privacy feature (or a plugin) to not even look at the user agent itself. The question then becomes what happens to the IP address. Can you look at the anonymised IP or do they also see that as requiring consent? It would effectively then mean that you would need to put every action into a separate, new visit.

Do you know where they stand on the anonymised IP address in the fingerprint being used for building the config_id hash?
And I assume if they want you to ignore the user agent and not touch it, then it would also mean to not store information about which browser or device or operating system someone is using?

jmbiltresse commented 1 year ago

@tsteur: the problem is how countries interpret and apply the ePrivacy directive. It appears, based on our lawyers' analysis, that Belgium, UK and Austria (our contracted lawyers did not look at other countries) are taking a strict approach to the ePrivacy directive considering that accessing information from the user agent, including anonymized IP address, to generate a config_id, would fall under article 5(3): "...gaining of access to information already stored, in the terminal equipment".

I know, last year, to comply with the directive in Germany, you implemented a new disableBrowserFeatureDetection method that excludes any plugins or resolution from the computed config_id. I am wondering if there'd be a way to go one step further by computing a config_id that would not be based on the device attributes, i.e.: IP, browser family, version, operating system, etc.

I am thinking of something like a unique ID, a server-side session ID we could pass to Matomo, or something along those lines. I totally get that this would screw up some of the visitors' metrics, but the other metrics should be ok to use (behavior, goals, media, heatmaps, etc.).

Thanks for your consideration.

Jean-Michel

MoritzLost commented 1 year ago

I am thinking like a unique ID, a server side Session ID we could pass to Matomo or something along this way. I totally get that this would screw up some of the visitors' metrics, but the other metrics should be ok to use (behavior, goals, media, heatmaps, etc.).

@jmbiltresse That just sounds like a cookie with extra steps ;) If you create a server-side ID and send it along to Matomo to send back for every request, that's just another tracking ID that would require consent under GDPR alone. I think if you really can't look at anything, not even the data the browser sends along for every request by itself, it just means you can't use tracking without consent. Every action would be a new visit (because you have no way to tell if two visits came from the same user) and you don't really have anything to go by to filter out bots. That's completely useless.

But under the reading of your lawyers, viewing any website would require consent, because the browser sends this data along with the request as well. So we would have to find a way to ask for consent from the user before the initial request to the server – not the Matomo server, but the server for the website …

jmbiltresse commented 1 year ago

@MoritzLost: Interesting thoughts. For the sake of clarity, let's distinguish between GDPR and ePrivacy. Please feel free to challenge me. Matomo is GDPR-compliant because the config_id is hashed (using anonymized IP address), only valid for 24 hours, unique to one domain, i.e. you cannot trace back a user. Matomo is not ePrivacy-compliant in some countries, again per our lawyers' assessment, because to compute the config_id, it accesses information on the end user device without any consent. If we were to pass a session ID to Matomo, and Matomo would hash it, we'd not have a way to trace back the user and this would be compliant to GDPR in my view. It would also be ePrivacy-compliant since the config_id would be computed without accessing any information on the end user device. Thoughts? Am I missing something here? Thanks, really appreciate the discussion on this topic.
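To make the properties listed above concrete, here is a minimal sketch of a hashed, day-scoped, per-site identifier. It is not Matomo's actual config_id algorithm: the function names, the choice of SHA-256, the two-octet IP mask and the `salt` argument are all assumptions made for illustration.

```python
import hashlib
import ipaddress
from datetime import date


def anonymize_ipv4(ip: str, masked_bytes: int = 2) -> str:
    """Zero the last `masked_bytes` octets of an IPv4 address."""
    packed = ipaddress.IPv4Address(ip).packed
    kept = packed[: 4 - masked_bytes] + bytes(masked_bytes)
    return str(ipaddress.IPv4Address(kept))


def sketch_config_id(site_id: int, ip: str, browser: str, os_name: str,
                     day: date, salt: str) -> str:
    """Hypothetical identifier: hash of site, day, masked IP, browser and OS.

    Including `day` means the value changes every 24 hours; including
    `site_id` means it differs between sites. Both are assumptions that
    mirror the description in the comment above, not Matomo's real code.
    """
    parts = [str(site_id), day.isoformat(), anonymize_ipv4(ip), browser, os_name]
    raw = salt + "|" + "|".join(parts)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]
```

In this sketch, two visitors behind the same masked IP range with the same browser and operating system get the same value on the same day, which is exactly why such an ID groups visits rather than identifying a person.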

MoritzLost commented 1 year ago

If we were to pass a session ID to Matomo, and Matomo would hash it, we'd not have a way to trace back the user and this would be compliant to GDPR in my view.

Under GDPR you don't need to be able to identify a specific natural person – having a unique ID that allows you to uniquely identify a given user is enough to require consent. If you create a unique ID, send it to the user and have the Matomo script send it back with every request, that's essentially a tracking ID. So there isn't much difference between this and any tracking script by an ad provider. I don't think it matters how often you cycle that ID. It's also not much different than creating a client-side cookie and sending it along with every request – it's a random ID that allows you to link visits to a specific visitor or session and therefore allows you to build a profile about them. So it's a pseudonymous identifier that requires consent under GDPR. Hashing is irrelevant here because the ID is only used to associate visits from the same person with each other, which you can do regardless of whether you apply any hashing or not.

In any case, I'm not a lawyer, so it's probably better to ask your lawyers about your proposal. But to be honest, I think that under their interpretation, you won't be able to use client-side tracking without consent. I think you'll either have to go back to requiring consent or just use less accurate server-side tracking.

jmbiltresse commented 1 year ago

@MoritzLost: but how would this be different from the current hashed config_id that is computed using an anonymized IP address, browser family, version and operating system? It's also an attempt to uniquely identify a user so that Matomo can associate actions/activities to that user.

MoritzLost commented 1 year ago

@jmbiltresse Yeah, but in this case, you're only hashing the data that is sent along with every request anyway, not storing and retrieving an identifier on the user's device. But again, I'm not a lawyer … I think from a technical perspective, we make the mistake of assuming the laws make sense and are internally consistent.

jmbiltresse commented 1 year ago

@tsteur @MoritzLost: to make this flexible and adjustable, it would be great to have the possibility to override the default config_id with our own set of attributes. Each company could then decide, in accordance with their privacy or external lawyers, which of those attributes are GDPR and ePrivacy cleared / compliant. Again, I understand this would likely screw up some of the visitors' metrics. Thanks for the consideration, and I really appreciate the valuable exchanges / discussions here! Jean-Michel.

tsteur commented 1 year ago

Each company could then decide, in accordance with their privacy or external lawyers, which of those attributes are GDPR and ePrivacy cleared / compliant

Indeed. I think there's little chance of settling the different interpretations and reaching a common agreement while there are no fully clear guidelines and people interpret them differently. We should still offer the choice, so that Matomo works for people who come to different conclusions.

I want to make sure though I understand things fully.

The idea here is to not only not use the user agent in the config_id but basically not generate any config_id at all (or completely randomise it based on no information)?

What about still using the user agent for the "device / operating system / browser" information? Would we have to completely discard the user agent in that example?

With the IP already being able to be fully anonymised and not included in the config_id, would we basically need a setting to completely ignore the user agent?

jmbiltresse commented 1 year ago

@tsteur You are correct. We would need a setting to completely ignore the user agent to compute the config_id.

This would make Matomo comply with the most restrictive interpretation of the ePrivacy directive. What I had in mind, and I am sure there are plenty of other solutions, was to generate a config_id based on the current date, time (hour) and block of 15 minutes starting from the hour. If a user were to access a website on March 31, 2023, at 10:00, the config_id would look like 03.31.2023.10.1 (mm.dd.yyyy.hh.block of 15 minutes starting from the hour). One could even format it like an IP address (033.120.231.01) and append a generic browser name, version and operating system to keep compatibility with the current config_id algorithm.

Of course that would mean that all users accessing the website on the same day, hour and block of 15 minutes would be considered the same user, but it'd already be more accurate than assigning a random number to every page request.
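The time-block idea above could be sketched roughly as follows. This is only an illustration of the proposal, not anything Matomo implements; the function name and the `block_minutes` parameter are made up for the example.

```python
from datetime import datetime


def time_bucket_config_id(ts: datetime, block_minutes: int = 15) -> str:
    """Build an ID from date, hour and the 1-based block within the hour.

    Format follows the proposal: mm.dd.yyyy.hh.block
    """
    block = ts.minute // block_minutes + 1
    return f"{ts.month:02d}.{ts.day:02d}.{ts.year:04d}.{ts.hour:02d}.{block}"


# Example from the comment: March 31, 2023 at 10:00
print(time_bucket_config_id(datetime(2023, 3, 31, 10, 0)))  # 03.31.2023.10.1
```

Every request arriving between 10:00 and 10:14 gets the same value, and a request at 10:15 moves to block 2, so no information from the device or the request headers is needed to compute it.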

In that spirit, I think we'd need to also discard the use of the user agent for the "device / operating system / browser" information.

Thanks very much

Jean-Michel

tsteur commented 1 year ago

Thanks for this, @jmbiltresse. I'll bring this up internally.

jmbiltresse commented 1 year ago

@tsteur Thanks very much!

MatomoForumNotifications commented 1 year ago

This issue has been mentioned on Matomo forums. There might be relevant details there:

https://forum.matomo.org/t/disablecookies-and-disablefingerprint/48868/5

MatomoForumNotifications commented 1 year ago

This issue has been mentioned on Matomo forums. There might be relevant details there:

https://forum.matomo.org/t/cookieless-config-id-and-eprivacy/50398/2

tsteur commented 1 year ago

Here we would want to add a new tracker method, similar to the disableBrowserFeatureDetection tracker method. When the method is called, it would send along a tracking parameter that tells Matomo to leave the user agent out of the "config_id" and not to detect the browser or device from it.
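As a rough illustration of the server-side behaviour this describes, the sketch below shows a hit handler that honours such a flag. The parameter name `ignore_ua`, the function names and the hashing details are all hypothetical; the real parameter and implementation are still to be defined and documented.

```python
import hashlib
from typing import Mapping, Optional

# Hypothetical parameter name, not an existing Matomo Tracking API parameter.
IGNORE_UA_PARAM = "ignore_ua"


def detect_browser(user_agent: str) -> str:
    """Very rough stand-in for real browser/device detection."""
    for name in ("Firefox", "Chrome", "Safari"):
        if name in user_agent:
            return name
    return "unknown"


def process_hit(params: Mapping[str, str], user_agent: str,
                anonymized_ip: str) -> dict:
    """Compute a config_id and browser info, optionally ignoring the user agent."""
    ignore_ua = params.get(IGNORE_UA_PARAM) == "1"

    # The user agent only enters the fingerprint when the flag is not set.
    fingerprint_parts = [anonymized_ip]
    if not ignore_ua:
        fingerprint_parts.append(user_agent)
    raw = "|".join(fingerprint_parts)
    config_id = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]

    # With the flag set, no browser or device detection happens at all.
    browser: Optional[str] = None if ignore_ua else detect_browser(user_agent)
    return {"config_id": config_id, "browser": browser}
```

With the flag set, two requests from the same anonymised IP but different browsers end up with the same config_id and no stored browser information, which is the trade-off discussed earlier in this thread.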

We'll need to document the new tracking method and the new Tracking HTTP API parameter.

We'd also add a new FAQ similar to https://matomo.org/faq/how-to/how-do-i-disable-browser-feature-detection-completely/

We would link to this new FAQ in https://matomo.org/faq/general/configure-privacy-settings-in-matomo/

To be clarified: whether a few other pages need updating too.
