matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0

Masking of IPv6 #18301

Open dennisbaaten opened 2 years ago

dennisbaaten commented 2 years ago

The FAQ describes the way masking is done for IPv6 addresses:

IP Anonymisation privacy feature will anonymise IPv6 addresses. For example if Matomo is configured to anonymise “1 byte” from an IPv4, then the IPv6 address 2001:db8:0:8d3:0:8a2e:70:7344 would be anonymised as 2001:db8:0:8d3:0:0:0:0. When configured to anonymise 2 bytes, then the IPv6 becomes 2001:db8:0:0:0:0:0:0. And when configured to anonymise 3 bytes, then the IPv6 would be 2001:d00:0:0:0:0:0:0.

First of all, I find the example IPv6 address a bit confusing, since the full /128 address already contains two "0" quartets, which makes it hard to tell which quartets were zeroed by the masking.

When looking at the masking options, the FAQ seems to suggest that the IPv6 masking options are:

At the same time the source code

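        $masks = array(
            'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff', // keeps all 128 bits (no masking)
            'ffff:ffff:ffff:ffff::',                   // keeps 64 bits, masks 64
            'ffff:ffff:ffff:0000::',                   // keeps 48 bits, masks 80
            'ffff:ff00:0000:0000::',                   // keeps 24 bits, masks 104
            '0000::'                                   // keeps 0 bits, masks all 128
        );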

seems to imply that the masking options are:
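The number of bits each mask removes can be worked out directly from the quoted array. The following is a minimal Python sketch, not Matomo's PHP code; it assumes that the array index corresponds to the configured number of bytes (which the FAQ examples suggest), and the helper names `masked_bits` and `apply_mask` are made up for illustration.

```python
import ipaddress

# Masks copied from the snippet quoted above. Assumption: the array index
# is the configured "bytes to anonymise" setting (0 to 4).
MASKS = [
    "ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff",
    "ffff:ffff:ffff:ffff::",
    "ffff:ffff:ffff:0000::",
    "ffff:ff00:0000:0000::",
    "0000::",
]


def masked_bits(mask: str) -> int:
    """Number of address bits that the mask zeroes out."""
    kept = bin(int(ipaddress.IPv6Address(mask))).count("1")
    return 128 - kept


def apply_mask(address: str, mask: str) -> str:
    """Bitwise AND of address and mask, returned in compressed notation."""
    value = int(ipaddress.IPv6Address(address)) & int(ipaddress.IPv6Address(mask))
    return str(ipaddress.IPv6Address(value))


EXAMPLE = "2001:db8:0:8d3:0:8a2e:70:7344"  # address used in the FAQ

for level, mask in enumerate(MASKS):
    print(level, masked_bits(mask), apply_mask(EXAMPLE, mask))
```

Under these assumptions the five entries mask 0, 64, 80, 104 and 128 bits respectively, so the array contains no 96-bit option.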

Last but not least, I notice that most ISPs give a /48 to their customers, which means that a person can be uniquely identified by the first three quartets. So all IPv6 masking options of 80 bits or less are not sufficient for anonymization, potentially causing problems with GDPR compliance. Because masking 104 bits (or more) would hinder location analytics too much, it seems a good idea to include a masking option of, let's say, 96 bits.
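The /48 argument can be illustrated with a small Python sketch. It takes the /48 delegation size from the comment above as given, uses two made-up customer addresses from the 2001:db8::/32 documentation range, and treats 96-bit masking as the hypothetical option being proposed, not something Matomo offers. The function names are illustrative only.

```python
import ipaddress

CUSTOMER_PREFIX_LEN = 48  # delegation size cited in the comment above


def kept_prefix(address: str, masked_bits: int) -> ipaddress.IPv6Network:
    """Network that remains after zeroing the lowest `masked_bits` bits."""
    return ipaddress.ip_network(f"{address}/{128 - masked_bits}", strict=False)


def still_distinguishes_customers(masked_bits: int) -> bool:
    """True if the surviving prefix is at least as long as a customer /48."""
    return 128 - masked_bits >= CUSTOMER_PREFIX_LEN


# Two hypothetical customers of the same ISP, each holding their own /48.
customer_a = "2001:db8:aaaa:1::10"
customer_b = "2001:db8:bbbb:2::20"

for bits in (64, 80, 96, 104):
    a = kept_prefix(customer_a, bits)
    b = kept_prefix(customer_b, bits)
    print(bits, a, b, "distinct" if a != b else "identical")
```

With 64 or 80 masked bits the two customers still map to different prefixes; with 96 or 104 masked bits they collapse into the same one.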

tsteur commented 2 years ago

Thanks for creating this issue, @dennisbaaten. I checked, for example, the Matomo server IP 2a00:b6e0:0001:0200:0177:0000:0000:0001 and confirmed that it is anonymised to 2a00:b6e0:1:200::. Geolocation etc. still works well. If I see this right, it wouldn't identify an individual, though I'm not sure how all the ISPs handle it. I also looked into some personal IPs, and the result probably wouldn't identify an individual there either. Those are only the examples I looked at, though.
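The reported result can be reproduced with a few lines of Python. This sketch assumes that the 64-bit mask (the second entry of the array quoted earlier) was the one in effect, which is an inference from the output and not something stated in the comment.

```python
import ipaddress

full = ipaddress.IPv6Address("2a00:b6e0:0001:0200:0177:0000:0000:0001")
mask = ipaddress.IPv6Address("ffff:ffff:ffff:ffff::")  # keeps the first 64 bits

# Bitwise AND zeroes the lower 64 bits (the interface identifier).
masked = ipaddress.IPv6Address(int(full) & int(mask))

print(masked)           # compressed notation
print(masked.exploded)  # same address with every group written out
```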

We likely can't change the logic for all users, as it could change a few metrics/reports.

Including another option may be possible, but I'm not sure how that would best be done and explained. It would need something like an option 2.5, but then people would expect it to apply to IPv4 as well. That in turn suggests that IPv4 and IPv6 should maybe be configured separately, but that would make the UI quite complicated.

On the other hand, this could perhaps be developed in a custom plugin with a few lines of code.

daniel-lerch commented 2 years ago

From a user's perspective, I would not consider two separate settings confusing. On the contrary, before reading this issue, I had no idea whether Matomo masks IPv6 addresses at all. In my opinion, two separate settings for IPv4 and IPv6 would improve clarity and give users more control over which data they collect.

sgiehl commented 2 years ago

@tsteur we should maybe at least consider showing in the UI how IPv6 addresses are masked, so it's clear that the setting affects both IPv4 and IPv6. It should be easy to simply add some examples like the ones we have for IPv4...

tsteur commented 2 years ago

@sgiehl that sounds good. We can tweak the inline help for example and make sure to mention that this anonymises IPv4 and IPv6.

dennisbaaten commented 2 years ago

Any progress on this issue?

justinvelluppillai commented 2 years ago

This issue is not yet prioritised to be worked on. I will add it to the milestone for our product team to consider.

enual commented 2 weeks ago

Adding a comment as another Matomo user who is looking for this feature.

randy-innocraft commented 1 week ago

Hi @dennisbaaten, Thank you for bringing this to our attention and for your valuable input. Your suggestion seems like a valuable enhancement to our product. We will forward this to our Product team for review and future consideration. If you have any additional details or questions, please feel free to share them here.