matomo-org / matomo

Identify visits from Tor exit nodes #3284

Open robocoder opened 12 years ago

robocoder commented 12 years ago

From forum http://forum.piwik.org/read.php?2,91715:

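// Returns true if the visitor's IP is listed as a Tor exit node that can
// reach this server's IP and port (IPv4 only).
function is_Tor_exitnode() {
  $query = ip_reverse($_SERVER['REMOTE_ADDR']) . "." . $_SERVER['SERVER_PORT'] . "."
    . ip_reverse($_SERVER['SERVER_ADDR']) . ".ip-port.exitlist.torproject.org";
  // The exit list answers 127.0.0.2 for a match; gethostbyname() returns the
  // unmodified hostname when the lookup fails.
  return gethostbyname($query) === "127.0.0.2";
}

// Reverses the octets of an IPv4 address, e.g. 1.2.3.4 becomes 4.3.2.1.
function ip_reverse($ip) {
  return implode(".", array_reverse(explode(".", $ip)));
}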

Also, if a visit is suspected to be from a Tor node, perhaps Piwik could skip geolocation and provider lookup.

mattab commented 12 years ago

I think it would be a good candidate for a plugin, because adding the Reverse DNS lookup will be too costly to do by default. It's a fun idea to do.

Maybe the plugin could set a Custom Variable to identify the user as Tor? eg. name=Tor User value=IP ?

grote commented 7 years ago

There is a list of Tor exit nodes, so you don't need a reverse DNS lookup. Piwik would just need to check whether the visitor IP matches one of the exit node IPs, then display a little onion logo and skip whatever other checks don't make sense when the user is browsing via Tor.
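
A minimal sketch of that check, assuming the exit node IPs are already available as plain text with one IP per line. The input format and the function names `loadExitNodes` and `isTorExitNode` are hypothetical and not part of Piwik:

```php
<?php
// Hypothetical helper: parse an exit list with one IP address per line.
// Blank lines and lines starting with "#" are ignored.
function loadExitNodes(string $listText): array
{
    $nodes = [];
    foreach (preg_split('/\R/', $listText) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        // Keep only syntactically valid IPv4/IPv6 addresses.
        if (filter_var($line, FILTER_VALIDATE_IP) !== false) {
            $nodes[$line] = true; // IP as array key, so lookups can use isset()
        }
    }
    return $nodes;
}

// Hypothetical helper: true if the visitor IP is in the parsed list.
function isTorExitNode(string $visitorIp, array $exitNodes): bool
{
    return isset($exitNodes[$visitorIp]);
}

// Example data using documentation-only addresses (RFC 5737).
$exitNodes = loadExitNodes("# sample list\n192.0.2.10\n198.51.100.7\n");
var_dump(isTorExitNode('192.0.2.10', $exitNodes));  // bool(true)
var_dump(isTorExitNode('203.0.113.5', $exitNodes)); // bool(false)
```

This compares address strings exactly, so a real plugin would probably want to normalize the visitor IP (for example, IPv6 notation) before the lookup.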

fvdm commented 7 years ago

That is pretty much how I do it atm:

This could be made into a plugin, but I imagine it could affect performance in a bad way on larger projects processing many hits per second. Reading the whole list each time is not ideal.

For just an isTor() function, of course, only the IPs need to be saved, but I use the other data too.

jvoisin commented 7 years ago

There is no need to pull the list 4 times a day; the Tor network doesn't move that fast. Once a day, or every 3 days, is enough. Also, please randomize the download time so Piwik instances won't DDoS the service by downloading the file at the same time.

There is no need to read the file each time either. There are currently something like 1000 exit nodes, so you can just keep them in a hash table in memory.
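
The randomized download time could be sketched roughly like this. The constants and the function names `nextRefreshTime` and `isRefreshDue` are hypothetical, and the six-hour jitter window is an arbitrary choice for illustration:

```php
<?php
// Hypothetical scheduling helper: refresh roughly once a day, shifted by a
// random offset so separate installations do not all download at once.
const REFRESH_INTERVAL = 86400; // one day, in seconds
const MAX_JITTER = 21600;       // up to six extra hours (arbitrary choice)

// Returns the Unix timestamp at which the next download may happen.
function nextRefreshTime(int $lastRefresh): int
{
    return $lastRefresh + REFRESH_INTERVAL + random_int(0, MAX_JITTER);
}

// True once the current time has reached the scheduled refresh time.
function isRefreshDue(int $now, int $nextRefresh): bool
{
    return $now >= $nextRefresh;
}

$lastRefresh = 1700000000; // example Unix timestamp
$next = nextRefreshTime($lastRefresh);
var_dump(isRefreshDue($lastRefresh + 3600, $next));      // bool(false)
var_dump(isRefreshDue($lastRefresh + 2 * 86400, $next)); // bool(true)
```

The computed `$next` value would need to be stored alongside the list, so the random offset is drawn once per refresh rather than on every request.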

mattab commented 7 years ago

> This could be made into a plugin, but I imagine it could affect performance in a bad way on larger projects processing many hits per second. Reading the whole list each time is not ideal.

Would be great to make this into a plugin @fvdm :+1:

If you store the list in a PHP array in a file, it would be very fast. But even better, you can use our Cache mechanism. It's very fast too. See the docs https://github.com/piwik/component-cache and examples in many places in the code, eg. https://github.com/piwik/piwik/blob/3.x-dev/plugins/SitesManager/SiteUrls.php#L133-L146
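
As a rough illustration of the "PHP array in a file" option (not the Cache mechanism), the list could be written with `var_export` and read back with `include`. The function names `saveExitNodes` and `loadExitNodesFile` and the file location are hypothetical:

```php
<?php
// Hypothetical helper: write the exit node list as a PHP file that returns
// an array keyed by IP address.
function saveExitNodes(string $path, array $ips): void
{
    $map = array_fill_keys($ips, true); // IPs as keys, so lookups can use isset()
    $code = '<?php return ' . var_export($map, true) . ';' . "\n";
    // Write to a temporary file first, then rename it into place, so a
    // reader never includes a half-written file.
    $tmp = $path . '.tmp';
    file_put_contents($tmp, $code, LOCK_EX);
    rename($tmp, $path);
}

// Hypothetical helper: load the array back, or an empty array if missing.
function loadExitNodesFile(string $path): array
{
    if (!is_file($path)) {
        return [];
    }
    return include $path;
}

$path = sys_get_temp_dir() . '/tor_exit_nodes_example.php';
saveExitNodes($path, ['192.0.2.10', '198.51.100.7']);
$nodes = loadExitNodesFile($path);
var_dump(isset($nodes['192.0.2.10']));  // bool(true)
var_dump(isset($nodes['203.0.113.5'])); // bool(false)
unlink($path);
```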

SimonVillage commented 7 years ago

Does someone already have a plugin for this?