matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0

How do we ensure Latitude and Longitude columns are at best to the city level #15969

Open mattab opened 4 years ago

mattab commented 4 years ago

With regard to privacy: since the database schema has latitude and longitude columns, how could we (or how do we) ensure that the values stored in them are precise to the city level at most?

It is a privacy concern that lat/long could be more precise than what might be expected. In upcoming recommendations it will be important to limit geolocation to the city level at best. AFAIK we use lat/long in order to plot the user on the real-time map. Independently of whether the user is geolocated using an anonymised IP or not, it'd be great to ensure the lat/long are never too precise.

Is this already the case in Matomo? If not, could we limit lat/long precision to the city (and how)?

mattab commented 4 years ago

Also it'd be important to document this "feature" in the user guide at: https://matomo.org/docs/geo-locate/

diosmosis commented 4 years ago

Possible solutions:

Findus23 commented 4 years ago

See also https://github.com/matomo-org/matomo/issues/12735 for an even rougher rounding.

tsteur commented 4 years ago

I just checked, and both DB-IP and MaxMind seem to report the last three digits as 000, so they basically round already. This could change in the future, though.

I'm also thinking the rounding can still be a problem for rural areas where only a few people live. You could then potentially still identify individuals or households.

I'm not sure we can find a general solution to this besides optionally not tracking it at all (which only breaks the real-time map). If I see this correctly, only the real-time map uses this info. Maybe the real-time map could be changed to work like the regular visitor map and not use lat/long?
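To put the rural-area concern in numbers, here is a rough sketch of how large the grid cell is that a rounded coordinate falls into. It assumes a spherical Earth with about 111,320 m per degree of latitude (an approximation), and the function name is made up for illustration. At three decimal places the cell is only on the order of 100 m across, which is far finer than a city.

```python
import math

# Approximate metres per degree of latitude (spherical-Earth assumption).
METERS_PER_DEGREE = 111_320.0


def rounding_cell_size_m(latitude_deg, decimals):
    """Return the approximate (north-south, east-west) size in metres of the
    grid cell that results from rounding coordinates to `decimals` places."""
    step_deg = 10.0 ** -decimals
    north_south = step_deg * METERS_PER_DEGREE
    # Lines of longitude converge towards the poles, hence the cosine factor.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


if __name__ == "__main__":
    for decimals in (3, 2, 1):
        ns, ew = rounding_cell_size_m(52.0, decimals)
        print(f"{decimals} decimals at 52 degrees north: "
              f"about {ns:,.0f} m x {ew:,.0f} m")
```

Under these assumptions, rounding alone would need to drop to roughly one decimal place before the cell approaches the size of a city.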

diosmosis commented 4 years ago

I'm not sure, but it looks like the visitor map converts a city to a lat/long pair, so this might be doable pretty easily... still checking, though.

diosmosis commented 4 years ago

@tsteur nvm, that uses the tracked longitude/latitude. Probably easiest is to somehow map locations to longitude/latitude, otherwise I think we'd have to change the realtime map significantly. It's probably fairly simple to write a script to iterate over every location in the geoip database and set a lat/long in a file.
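A minimal sketch of what such a script could look like, assuming the geolocation database has already been exported to an iterable of records. The record keys (`country`, `region`, `city`, `latitude`, `longitude`), the function names, and the JSON output format are all hypothetical; reading the actual GeoIP file is left out because it depends on the provider.

```python
import json
from collections import defaultdict


def build_city_centroids(records):
    """Map each "country|region|city" key to one averaged lat/long pair."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # lat sum, long sum, count
    for rec in records:
        city = rec.get("city")
        lat, lon = rec.get("latitude"), rec.get("longitude")
        if not city or lat is None or lon is None:
            continue  # skip entries without a usable city-level location
        key = "|".join((rec.get("country", ""), rec.get("region", ""), city))
        entry = sums[key]
        entry[0] += float(lat)
        entry[1] += float(lon)
        entry[2] += 1
    # A plain mean is used here; this is a simplification that is only
    # reasonable because all points of one key belong to the same city.
    return {
        key: {"latitude": round(lat_sum / n, 3),
              "longitude": round(lon_sum / n, 3)}
        for key, (lat_sum, lon_sum, n) in sums.items()
    }


def write_lookup(centroids, path):
    """Write the location -> lat/long lookup to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(centroids, handle, indent=2, sort_keys=True)


if __name__ == "__main__":
    sample = [
        {"country": "DE", "region": "BE", "city": "Berlin",
         "latitude": 52.51, "longitude": 13.40},
        {"country": "DE", "region": "BE", "city": "Berlin",
         "latitude": 52.53, "longitude": 13.41},
    ]
    write_lookup(build_city_centroids(sample), "city_latlong.json")
```

The real-time map could then look up the visitor's city in this file instead of reading the tracked coordinates, so every visitor from the same city is plotted at the same point.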

mattab commented 4 years ago

I'd say that, for their own reasons, it's always in the geolocation DB providers' interest not to provide more accurate lat/long.

For example, MaxMind's GeoIP2 City page (https://www.maxmind.com/en/geoip2-city) says:

Longitude (Latitude and Longitude are often near the center of population. These values are not precise and should not be used to identify a particular address or household.)

As a possible fix, maybe we could always set the last 3 digits to zero ourselves, if that's what MaxMind does, in case they change it in the future?
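A small sketch of that idea, assuming coordinates are stored with six decimal places so that "the last 3 digits" are the fourth to sixth decimals. The function name and the choice of rounding rather than truncating are assumptions for illustration, not existing Matomo behaviour.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def limit_precision(value, decimals=3, stored_decimals=6):
    """Round a coordinate to `decimals` places and pad it with zeros
    to `stored_decimals` places, returning the result as a string."""
    step = Decimal(10) ** -decimals  # e.g. Decimal('0.001') for 3 decimals
    # Go through str() so floats are not expanded to their binary value.
    coarse = Decimal(str(value)).quantize(step, rounding=ROUND_HALF_EVEN)
    return format(coarse, f".{stored_decimals}f")


if __name__ == "__main__":
    for raw in ("52.520008", "13.404954", "-0.127758"):
        print(raw, "->", limit_precision(raw))
```

Applying this when the visit is recorded would keep the stored values coarse even if a provider starts returning more digits later.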

tsteur commented 4 years ago

I reckon in this case we maybe don't need to do anything for now, and if someone wants to use a more accurate provider then they can do so.

The problem with rounding etc. would still remain for locations where only a few people live, but I suppose they might also be assigned to a bigger nearby city (would need to be checked).