matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0

Include GeoIP in core after improvements #1823

Closed mattab closed 11 years ago

mattab commented 14 years ago

See doc: Geo-locate visitors' countries, cities and regions.

GeoIP plugin #45 is one of the most popular plugins. For a web analytics tool, getting user countries as accurately as possible is critical, and Piwik should help users in this direction.

When the plugin is released in trunk, we should update the FAQ, website pages and wiki pages that mention GeoIP, and close GeoIP ticket #45. For Goals compatibility of the GeoIP plugin, see #1434.

Please let us know in the comments your feedback. If you would like to participate... well you know what to do!

gka commented 12 years ago

Attachment: Patch of GeoIP.php that allows it to store region ids. GeoIP.php

oparoz commented 12 years ago

Attachment: Database updater via cron geoip.updater.sh

gka commented 14 years ago

Just some input to clarify the terms "country" and "region": referring to the list of administrative levels used in OpenStreetMap, a country would correspond to admin level 2, while regions would correspond to admin level 4.

gka commented 14 years ago

Do we record lat/long for each visitor, or do we assume that other systems will know where to plot a given City?

I think it is NOT necessary to record lat/long for each visitor. It is sufficient to record the city id. The GeoIP db would resolve each visitor's location within the same city to the same lat/long anyway. In fact, for each city there is only one lat/long stored in the GeoLite City DB (more precisely, in the cityByCountry table).

As the number of available cities (= pairs of lat/long) differs between the different GeoIP databases, it makes no sense to put this information into other systems like the world map.
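A minimal sketch of the "store only the city id" idea, assuming a lookup table keyed by a GeoIP-style location id. The table, its ids and the coordinates below are illustrative placeholders, not data from a real GeoLite release:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class City:
    name: str
    country: str
    latitude: float
    longitude: float


# Illustrative stand-in for a city table keyed by location id (hypothetical ids).
CITY_TABLE = {
    1001: City("Berlin", "DE", 52.52, 13.40),
    1002: City("Berlin", "US", 44.47, -71.18),  # Berlin, New Hampshire
}


def coordinates_for(location_id: int) -> tuple[float, float]:
    """Resolve a stored location id to (lat, long) when a report needs it."""
    city = CITY_TABLE[location_id]
    return city.latitude, city.longitude


# Each visit row keeps only the location id, so visitors from the same city
# collapse to the same map point.
visits = [
    {"idvisit": 1, "location_id": 1001},
    {"idvisit": 2, "location_id": 1001},
]
map_points = {coordinates_for(v["location_id"]) for v in visits}
```

Because every visitor in a city resolves to the same coordinates anyway, storing the id loses nothing the lookup would have provided.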

anonymous-matomo-user commented 14 years ago

If I might add to this: the MaxMind db offers a city lookup, but it does not work the way people think it does. Blocks of IP numbers are sold to service providers, who resell them to end users. However, the IP issuing authority assigns the city of the ISP's address to all of those IP numbers. At least that is how it works in the UK; things may vary in other countries. ISPs don't reallocate the city when they sell dedicated IP numbers to end users. The result is that a city lookup generally gives only the city of the ISP, not where the visitor is visiting from. The ISP can be anywhere in the country, hundreds of miles from where the visitor is based. In other words, city lookup is useless except for giving the location of ISPs. This also means that lat/long is useless too, since it seems to be based on the city lookup. When IPv6 is rolled out, city lookup may become useful if, and only if, ISPs allocate a city to users when they purchase a fixed IP. But many ISPs still use dynamically allocated IPs, so it wouldn't work in that case either. In short, the concept of providing the city and/or lat/long of visitors is fundamentally flawed.

oparoz commented 13 years ago

+1 for this, especially the Apache module detection routine. I get a few fatal errors in my logs because the plugin insists on loading the local files instead of getting the data from Apache.

anonymous-matomo-user commented 13 years ago

I have the commercial db of Maxmind. You can use it if you want for developing the new plugin. Let me know how I can contact you.

mattab commented 13 years ago

See a forum bug report about the PHP script that updates past visits: http://forum.piwik.org/read.php?2,71587,page=2#msg-71784

and fix: http://forum.piwik.org/read.php?2,70989,page=1#msg-92434

robocoder commented 13 years ago

I'll take this on, in conjunction with the ipv6 ticket.

mattab commented 13 years ago
anonymous-matomo-user commented 13 years ago

Great idea.

robocoder commented 13 years ago

In the existing GeoIP plugin, there's a misc/.htaccess file. We don't want this in the new plugin. Access to geoipUpdateRows.php (or equivalent) should be guarded via token_auth.
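A rough sketch of the kind of guard meant here, written in Python rather than PHP: the script refuses to run unless the supplied `token_auth` matches. How the expected token is obtained (`expected_token` below) is an assumption and is left to the caller:

```python
import hmac


def is_authorized(params: dict, expected_token: str) -> bool:
    """Return True only when the request's token_auth matches the expected one."""
    supplied = params.get("token_auth", "")
    if not supplied or not expected_token:
        # A missing token on either side never authorizes the request.
        return False
    # compare_digest is designed to resist timing attacks, unlike a plain ==.
    return hmac.compare_digest(supplied.encode(), expected_token.encode())
```

Unlike an .htaccess rule, a check like this travels with the script and works on any web server.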

anonymous-matomo-user commented 13 years ago

Should I delete the .htaccess file in there?

robocoder commented 13 years ago

Yes, you can remove the .htaccess file. After you've run it once, you shouldn't have to run it again.

anonymous-matomo-user commented 13 years ago

$ php geoipUpdateRows.php

Fatal error: Call to undefined function _parse_ini_file() in /home/kiplingw/avaneya.com/piwik/core/Config.php on line 373

???

I also removed the .htaccess file.

robocoder commented 13 years ago

This should be fixed in the updated .zip that I attached to #45. Here's the patch so you don't have to redownload the .zip:

Index: geoipUpdateRows.php
===================================================================
--- geoipUpdateRows.php (revision 51)
+++ geoipUpdateRows.php (working copy)
@@ -20,8 +20,8 @@
        . PATH_SEPARATOR . PIWIK_INCLUDE_PATH . '/libs'
        . PATH_SEPARATOR . PIWIK_INCLUDE_PATH . '/plugins');

+require_once PIWIK_INCLUDE_PATH . '/libs/upgradephp/upgrade.php';
 require_once PIWIK_INCLUDE_PATH . '/core/testMinimumPhpVersion.php';
-
 require_once PIWIK_INCLUDE_PATH . '/core/Loader.php';

 $GLOBALS['PIWIK_TRACKER_DEBUG'] = false;
anonymous-matomo-user commented 13 years ago

Thanks. Applied. How can I test it?

anonymous-matomo-user commented 13 years ago

I ran

$ php geoipUpdateRows.php

It finished execution (no output), and I noticed the UserCountry_ thing is still there in the stats. Should I just ignore that for now and assume new stats will not have that?

robocoder commented 13 years ago

yes

anonymous-matomo-user commented 13 years ago

Thank you =)

mattab commented 13 years ago

To answer questions in the ticket:

robocoder commented 13 years ago

Replying to matt:

  • this means, that we don't store lat/long in log_visit

I'm thinking of keeping lat/long because:

robocoder commented 13 years ago

re: comment:13

I would like to propose:

mattab commented 13 years ago

I would like to propose:

  • rolling the provider plugin into the geolocation plugin
  • if the geolocation plugin can get the organization field, it populates location_provider
  • otherwise, fallback to the gethostbyaddr() method

great idea!

The only thing: please make sure the few "Provider" special cases still work. In particular, VisitorGenerator and proxy-piwik.php disable the Provider lookup because it is too slow.
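The proposed fallback order can be sketched as follows. This is Python, not Piwik code; the `organization` key and the `provider_lookup_enabled` switch are hypothetical names, the latter standing in for the way VisitorGenerator and proxy-piwik.php skip the slow lookup:

```python
import socket


def resolve_provider(location: dict, ip: str) -> str:
    """Prefer the geolocation organization field, else fall back to reverse DNS."""
    organization = location.get("organization")
    if organization:
        return organization
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:  # socket.herror and socket.gaierror both subclass OSError
        return "Unknown"
    # The raw hostname is returned; any further cleanup is out of scope here.
    return hostname


def provider_for_visit(location: dict, ip: str, provider_lookup_enabled: bool = True) -> str:
    """Skip the slow gethostbyaddr() path when bulk tools disable it."""
    if not provider_lookup_enabled:
        return location.get("organization") or "Unknown"
    return resolve_provider(location, ip)
```

Keeping the disable switch separate from the resolver means the special cases only need to pass a flag rather than unload a whole plugin.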

robocoder commented 13 years ago

Ok. There are a couple of third party plugins (e.g., KSVisitorImport and TrackerSecondaryDb) that also disable the Provider plugin.

mattab commented 13 years ago

as a note, these plugins will be obsolete once we implement #134

anonymous-matomo-user commented 13 years ago

+1 vote for adding regions onto this as well. They are available in GeoLiteCity, so might as well use them. It would be great to include this into a regional map as well that the country map can drill down into.

anonymous-matomo-user commented 13 years ago

Copy of my comment on #5465 (sorry, I apparently used the wrong ticket; I knew there was one specifically for integrating GeoIP into core):

This new plugin sounds promising. But I hope you will also keep the old browser language/country detection, maybe labeled as such. I personally consider the language display as important as the IP location display.

Consider this scenario: I'm traveling around the world and keep a travel blog. People visiting that blog are often people I have met on the trip, who are often still traveling. When I look at my Piwik logs, the IP location (which I currently check manually) is certainly interesting, but what tells me more about a visitor is actually their browser language. If you check the IP address I am writing this from, you will see that it is Malaysian. How much do I have to do with Malaysia? Nothing. My browser language is German (Germany), which tells you more. And the combination of the two, IP location and browser country (i.e. the current detection), provides one more detail: the visitor is most likely a traveler or an expat. I can imagine websites that would be interested in that marketing information.

You would not believe how many travelers roam the world these days. I would say most of them use the often free WiFi (at their accommodation, and in bars and restaurants all over Southeast Asia) with their own devices: laptops, phones, tablets, etc. It seems to be the new way of traveling, with people staring at screens half of the time, mostly on Facebook.

P.S.: Since some countries have several languages (Belgium, etc.) and some languages are shared by several countries (UK, US, etc.), maybe both the browser country and its language could be shown (if provided by the browser), in addition to the IP location provided by this plugin.
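A minimal sketch of the combination described above, assuming the browser country is read from the first entry of the `Accept-Language` header (a simplification; the existing detection may work differently). Function names are hypothetical:

```python
from typing import Optional


def browser_locale(accept_language: str) -> tuple[str, Optional[str]]:
    """Return (language, country) from the first Accept-Language entry."""
    first = accept_language.split(",")[0].split(";")[0].strip()
    parts = first.replace("_", "-").split("-")
    language = parts[0].lower()
    # Only a two-letter second subtag is treated as a country (e.g. "de-DE");
    # script subtags such as "zh-Hant-TW" are not handled in this sketch.
    country = parts[1].upper() if len(parts) > 1 and len(parts[1]) == 2 else None
    return language, country


def classify_visitor(ip_country: str, accept_language: str) -> dict:
    """Combine the IP-based country with the browser's locale."""
    language, browser_country = browser_locale(accept_language)
    return {
        "ip_country": ip_country,
        "language": language,
        "browser_country": browser_country,
        # None when the browser sends no country, so no guess is made.
        "likely_traveler_or_expat": (
            None if browser_country is None else browser_country != ip_country
        ),
    }
```

For the Malaysia example, `classify_visitor("MY", "de-DE,de;q=0.9,en;q=0.8")` reports language `de`, browser country `DE`, and flags a likely traveler or expat.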

robocoder commented 13 years ago

jawsmith: #638

anonymous-matomo-user commented 13 years ago

+1 on jawsmith's proposal of combining location with the visitor's preferred language. As a Belgian developer, I can tell you that this kind of information can be of crucial interest in a country like Belgium, but in many others too. For example, usage of the Spanish language in some regions of the US can be an important factor, I think.

I imagine an ideal "Visitor countries" GeoIP plugin offering the current "Countries" split: clicking a country name would open a "Regions" list, and clicking a region name would open the (currently available) "Cities" list. An additional button could then be added at the bottom, between "Display simple table" and "Display a table with more metrics", that would "Display a table with languages". That table could have one additional column for each language detected.
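The drill-down and the language table can both be built from the same per-visit rows. A sketch, assuming each visit already carries `country`, `region`, `city` and `language` keys (hypothetical field names):

```python
from collections import Counter, defaultdict


def build_reports(visits):
    """Nest visit counts as country -> region -> city, plus languages per country."""
    locations = defaultdict(lambda: defaultdict(Counter))
    languages = defaultdict(Counter)
    for visit in visits:
        locations[visit["country"]][visit["region"]][visit["city"]] += 1
        languages[visit["country"]][visit["language"]] += 1
    return locations, languages


def country_totals(locations):
    """Top level of the drill-down: total visits per country."""
    return {
        country: sum(sum(cities.values()) for cities in regions.values())
        for country, regions in locations.items()
    }


visits = [
    {"country": "BE", "region": "Flanders", "city": "Ghent", "language": "nl"},
    {"country": "BE", "region": "Flanders", "city": "Antwerp", "language": "nl"},
    {"country": "BE", "region": "Wallonia", "city": "Liège", "language": "fr"},
    {"country": "US", "region": "Texas", "city": "El Paso", "language": "es"},
]
locations, languages = build_reports(visits)
```

Clicking "BE" would show `locations["BE"]` (regions), clicking "Flanders" would show its cities, and `languages["BE"]` supplies the extra language columns.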

anonymous-matomo-user commented 13 years ago

Replying to vipsoft:

jawsmith: #638

Thank you very much for the info on the browser language detection plugin! That just leaves the browser country detection, in case the GeoIP plugin replaces it in core. (E.g.: Is it a British or an American accessing my website from the Philippines?)

gka commented 13 years ago

Do we record regions as well as countries? Do we record cities?

As the new world map widget will be able to display data for regions and cities, it would be amazing if Piwik would be able to record the data for regions and cities :)

Do we record lat/long for each visitor, or do we assume that other systems (eg. the world map) will know where to plot a given City (and maintain their own database)?

Nope, the world map doesn't store locations for every city. Instead, it will be able to plot any given lat/long onto the map.

mattab commented 13 years ago

Here is a proposal for the API functions and returned data for the GeoIP integration in core:

Note:

mattab commented 13 years ago

Replying to vipsoft:

Replying to matt:

  • this means, that we don't store lat/long in log_visit

I'm thinking of keeping lat/long because:

  • future support for HTML5 navigator.geolocation

HTML5 Geolocation is probably never going to be used by Piwik, since it requires the user to opt in to share the location with the domain name.

  • in #1652, greg says the map plots by lat/long; including it in log_visit avoids a city-to-latlong lookup

I am reluctant to include redundant information in the log_visit table.

At minimum, we should record in log_visit

Thoughts?

gka commented 13 years ago

The question remains whether we need to store lat/long, depending on how fast/easy it is to query lat/long for a given City using GeoIP (maybe this is not possible?)

It depends on what kind of database you're using. If you're using the CSV database imported into MySQL tables, then you can run a query like

SELECT latitude,longitude FROM location WHERE city = 'Berlin'

in < 1ms. However, you will get ambiguous results when just looking for city names. Instead, a better idea would be to store the unique GeoIP location-id.

I don't know if any of the GeoIP APIs that work with the binary database (.dat) support reverse queries. All I saw was the IP-to-location direction.
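To illustrate the ambiguity, here is a self-contained sketch that uses SQLite in place of MySQL. The `location` table name comes from the query above; the `loc_id` column name and the rows are assumptions for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE location (loc_id INTEGER PRIMARY KEY, country TEXT, city TEXT,"
    " latitude REAL, longitude REAL)"
)
conn.executemany(
    "INSERT INTO location VALUES (?, ?, ?, ?, ?)",
    [
        (1, "DE", "Berlin", 52.52, 13.40),
        (2, "US", "Berlin", 44.47, -71.18),  # a second, unrelated Berlin
    ],
)

# Looking up by name is ambiguous: both Berlins come back.
by_name = conn.execute(
    "SELECT latitude, longitude FROM location WHERE city = ?", ("Berlin",)
).fetchall()

# Looking up by the stored location id yields exactly one row.
by_id = conn.execute(
    "SELECT latitude, longitude FROM location WHERE loc_id = ?", (1,)
).fetchone()
```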

gka commented 13 years ago

Note: I propose to remove "Continent" and process this from the aggregated Country datatable in the Archiving function. It would be trivial/fast to process the Top continents.

+1, since 'classic' Continents are also quite useless for many scenarios. Often, people are more interested in political/economic regions, e.g. MENA
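The roll-up proposed above needs no extra per-visit storage: it is a single pass over the archived Country table. A sketch with hypothetical, deliberately partial mapping dictionaries; the second call shows the same function producing a custom grouping such as MENA:

```python
from collections import Counter

# Deliberately partial, hypothetical mappings; a real one would cover all countries.
CONTINENTS = {"DE": "Europe", "BE": "Europe", "US": "North America", "MY": "Asia", "EG": "Africa"}
MENA = {"EG": "MENA", "AE": "MENA", "MA": "MENA"}


def roll_up(country_visits: dict, mapping: dict, default: str = "Unknown") -> Counter:
    """Aggregate an already-archived per-country table into coarser groups."""
    totals = Counter()
    for country, visits in country_visits.items():
        totals[mapping.get(country, default)] += visits
    return totals


country_visits = {"DE": 10, "BE": 5, "US": 7, "EG": 2, "ZZ": 1}
by_continent = roll_up(country_visits, CONTINENTS)
by_political_region = roll_up(country_visits, MENA, default="Other")
```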

anonymous-matomo-user commented 12 years ago

Replying to tlitody:

However, the IP issuing authority assigns the city of the ISP's address to all the IP numbers. At least that is how it works in the UK. Things may vary in different countries, and ISPs don't reallocate the city when they sell dedicated IP numbers to end users.

In Germany you don't have this problem, since AOL no longer exists there. You can locate the city. In rural areas, the difference between the real location and the indicated area can be 55 km, in my experience.

mattab commented 12 years ago

(In [5775]) Refs #2902

Refs #1823

mattab commented 12 years ago

When Anonymize IP is enabled with only 1 byte removed, could we default the last byte to 1 so that we get at least an approximate User location? See also: #3023

gka commented 12 years ago

In fact, in most cases that's the same level of accuracy as using all 4 bytes.

mattab commented 12 years ago

Sounds good, we will most likely do this then. This will significantly limit user frustration, since there have been many complaints that the "Provider" report does not work at all when IPs are anonymized (it would be even worse if GeoIP was broken!)

robocoder commented 12 years ago

Reasonable assumption as long as the IP belongs to a class C address (or larger). It also depends on the quality of the geolocation data provider.
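A sketch of both points, assuming that anonymizing one byte zeroes the last octet of an IPv4 address. The helper names are hypothetical; the /24 check corresponds to the "class C" caveat:

```python
import ipaddress


def geolocation_ip(anonymized_ip: str) -> str:
    """Replace a zeroed last octet with 1 before the geolocation lookup."""
    packed = bytearray(ipaddress.IPv4Address(anonymized_ip).packed)
    if packed[3] == 0:
        packed[3] = 1
    return str(ipaddress.IPv4Address(bytes(packed)))


def same_slash24(a: str, b: str) -> bool:
    """True when both addresses fall in the same /24 (class C sized) block."""
    network = ipaddress.ip_network(f"{a}/24", strict=False)
    return ipaddress.IPv4Address(b) in network
```

The substitution is only as good as the assumption that the whole /24 shares one location in the provider's data; for blocks smaller than a /24 it can point at the wrong subscriber.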

mattab commented 12 years ago

Thank you guys for your feedback

oparoz commented 12 years ago

For some reason, the plugin behaves differently when called via the log import script and we get a fatal error.

PHP Fatal error: Cannot redeclare geoip_country_code_by_name() in /plugins/GeoIP/libs/geoip.inc on line 347

Checking if those functions have already been declared doesn't help as it seems the whole geoip.inc file shouldn't be called.

Environment: PHP 5.4, Piwik 1.8.2, mod_geoIP in Apache, geoIP PECL extension in PHP

robocoder commented 12 years ago

Interfasys: your php-cli has the geoip extension enabled which has the same api as the php library used by the GeoIP plugin (#45).

This conflict will be addressed by the new Geolocation plugin.

mattab commented 12 years ago

many users are discussing patches to the GeoIP files in: http://forum.piwik.org/read.php?2,71788

For each person posting in the forum, there are probably 10 users having the same issue.

It shows the very high interest of the community in having an integrated GeoIP plugin in core :)

gka commented 12 years ago

Btw, here's my patch of the GeoIP plugin (just GeoIP.php in this case). It enables the plugin to store region information, which is essential for the map widget I develop.

robocoder commented 12 years ago

(In [6545]) refs #1823 - commit geolocation adapters and plugin stubs

mattab commented 12 years ago

Thanks Anthon for the initial commit!!

There is quite a bit of work left on this task:

If anyone is keen to help, please let me know ASAP!! :)

anonymous-matomo-user commented 12 years ago

Thanks for all the hard work. Integration would confirm Piwik as a superior alternative to Google Analytics. I've posted this in the forum but will post here as well. While replacing the provider details with the organization details adds a ton of value to the reports, occasionally the listed organization will be the same as the ISP. This detracts value from the organization report and thus it would be nice to be able to filter out a list of ISPs using a single segment/parameter. The single segment/parameter would also allow for continual updating of the list.
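A sketch of the requested filter, assuming the ISP list arrives as one comma-separated parameter (this parameter format is hypothetical, not an existing Piwik segment):

```python
def parse_exclusions(param: str) -> set[str]:
    """Turn one comma-separated parameter into a case-insensitive exclusion set."""
    return {name.strip().casefold() for name in param.split(",") if name.strip()}


def filter_organizations(rows: list[dict], exclusions: set[str]) -> list[dict]:
    """Drop report rows whose organization is on the ISP exclusion list."""
    return [row for row in rows if row["organization"].casefold() not in exclusions]


rows = [
    {"organization": "Example University", "visits": 12},
    {"organization": "Example Telecom", "visits": 40},
    {"organization": "Example Corp", "visits": 7},
]
filtered = filter_organizations(rows, parse_exclusions("example telecom, Other ISP"))
```

Keeping the list in a single parameter means it can be updated without touching code, which matches the "continual updating" requirement.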

I can't help on the coding side, but if there is any other way to help, please don't hesitate.