maxmind / GeoIP2-php

PHP API for GeoIP2 webservice client and database reader
https://maxmind.github.io/GeoIP2-php/
Apache License 2.0

Slow lookups #61

Closed: SimonDeconde closed this issue 8 years ago

SimonDeconde commented 8 years ago

Hi there

I've been comparing the lookup speed of this library with that of the legacy geoip-api-php (https://github.com/maxmind/geoip-api-php). I've read that the new format (.mmdb) is supposed to offer faster lookups than the old format (.dat); however, my tests show the opposite, with much slower lookups.

Consider the below code:

use GeoIp2\Database\Reader;

function ip_lookup_geoip($ip) {
  $geoip_old_reader = geoip_open(realpath('.').'/sites/all/libraries/GeoLiteCity.dat', GEOIP_STANDARD);
  $record = geoip_record_by_addr($geoip_old_reader, $ip);
  geoip_close($geoip_old_reader);
  return $record;
}

function ip_lookup_geoip2($ip) {
  $geoip_reader = new Reader(realpath('.').'/sites/all/libraries/GeoLite2-City.mmdb');

  try {
    $record = $geoip_reader->city($ip);
    return $record;
  } catch (Exception $e) {
    print("An error happened while looking up IP $ip: ".$e->getMessage());
  }

  return FALSE;
}

$ip = '173.194.115.23';

$start = microtime(TRUE);
$record = ip_lookup_geoip($ip);
$elapsed = microtime(TRUE) - $start;
print("[geoip] Lookup time: $elapsed\r\n");
print("[geoip] $ip: {$record->country_name} > {$record->city}\r\n");

$start = microtime(TRUE);
$record = ip_lookup_geoip2($ip);
$elapsed = microtime(TRUE) - $start;
print("[geoip2] Lookup time: $elapsed\r\n");
print("[geoip2] $ip: {$record->country->name} > {$record->city->name}\r\n");

Execution trace:

[geoip] Lookup time: 0.0038161277770996
[geoip] 173.194.115.23: United States > Mountain View
[geoip2] Lookup time: 0.14523100852966
[geoip2] 173.194.115.23: United States > Mountain View

Am I missing something here?

oschwald commented 8 years ago

I wouldn't say you are missing anything, but there are a couple of ways to improve performance:

SimonDeconde commented 8 years ago

Hi Oschwald

Thanks for your reply.

My example code was written for simplicity, but the performance difference remains even without opening and closing the database for every lookup. In both 1) a more thorough benchmark (2000 IP lookups in a loop) and 2) a real application where I import visitor geolocation data (millions of lookups), lookups are still about 3x slower than with the legacy library (the database is opened and closed only once).
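A benchmark of the kind described above (open the database once, then time many lookups) might look like the following minimal sketch. The helper name `benchmark_lookups` is hypothetical and not part of either library; it accepts any callable, so the same harness can wrap the legacy and the new reader. The database path in the usage comment is a placeholder.

```php
<?php
// Hypothetical helper (not part of GeoIP2-php or geoip-api-php):
// calls $lookup $loops times, cycling through $ips, and returns the
// average wall-clock time per call in seconds.
function benchmark_lookups(callable $lookup, array $ips, int $loops = 2000): float
{
    if ($loops < 1 || count($ips) === 0) {
        throw new InvalidArgumentException('Need at least one loop and one IP.');
    }

    $ips = array_values($ips); // ensure 0-based integer keys
    $count = count($ips);

    $start = microtime(true);
    for ($i = 0; $i < $loops; $i++) {
        $lookup($ips[$i % $count]);
    }

    return (microtime(true) - $start) / $loops;
}

// Usage sketch, assuming the reader is constructed once, outside the timed loop:
//
// $reader = new GeoIp2\Database\Reader('/path/to/GeoLite2-City.mmdb');
// $avg = benchmark_lookups(
//     function ($ip) use ($reader) { return $reader->city($ip); },
//     ['173.194.115.23']
// );
// printf("[geoip2] average lookup time: %.6f s\n", $avg);
```

Because the reader is created before the timer starts, the average reflects lookup cost only, not the cost of opening the file.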

Based on your experience, is it normal that this library is slower than the legacy one? If so, is there a specific technical reason?

I've indeed heard about the C extension, but I wanted to compare the two libraries/formats based on pure-PHP solutions. Would you say that this new library + C extension is faster than the old library + C extension?

Thanks for your help!

oschwald commented 8 years ago

Yes, it is expected that the new format will be slower. The new database contains quite a bit more information, which gets read out of the database during the request, and the search tree is deeper because the new format supports IPv6 in the same database as IPv4. Although we did not provide a C extension ourselves for the old format, I would suspect that the PECL one for the old format is also faster than the official extension for the new format.
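To illustrate the tree-depth point, here is a rough sketch, not the reader's actual code. In a binary search tree keyed on address bits, the address width is an upper bound on the number of levels: 32 for an IPv4-only tree, 128 for a tree that also holds IPv6. The function names are hypothetical. The left-padding assumes IPv4 addresses occupy the lowest 32 bits of the 128-bit space, and a real reader may shortcut the leading zero bits.

```php
<?php
// Illustrative only: the address width in bits bounds the depth of a
// binary tree that branches on one address bit per level.
function address_bits(string $ip): int
{
    if (filter_var($ip, FILTER_VALIDATE_IP) === false) {
        throw new InvalidArgumentException("Invalid IP address: $ip");
    }

    // inet_pton() returns 4 bytes for IPv4 and 16 bytes for IPv6.
    return strlen(inet_pton($ip)) * 8;
}

// Hypothetical helper: left-pad a packed address with zero bytes to the
// 16-byte (128-bit) key a combined IPv4/IPv6 tree would be keyed on.
function to_128_bit_key(string $ip): string
{
    if (filter_var($ip, FILTER_VALIDATE_IP) === false) {
        throw new InvalidArgumentException("Invalid IP address: $ip");
    }

    return str_pad(inet_pton($ip), 16, "\0", STR_PAD_LEFT);
}

$ip = '173.194.115.23';
printf("IPv4-only tree: up to %d levels\n", address_bits($ip));
printf("Combined tree:  up to %d levels\n", strlen(to_128_bit_key($ip)) * 8);
```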

SimonDeconde commented 8 years ago

Thanks for the details, Gregory!