z------------- / nim-mmdb

MaxMind DB file reader in pure Nim.
MIT License

Optimize performance #1

Open ajusa opened 2 years ago

ajusa commented 2 years ago

Hello,

Thank you so much for making this library! It's the only one I could find that does IP lookups. One thing I did notice is that performance could probably be better. Here's my minimal example:

Version using your library:

import mmdb
let countryDb = initMMDB("./GeoLite2-Country.mmdb")
let asnDb = initMMDB("./GeoLite2-ASN.mmdb")

for ip in stdin.lines:
  try:
    let country = countryDb.lookup(ip)
    let asn = asnDb.lookup(ip)
    let output = "AS" & $asn["autonomous_system_number"] & "\t" & $country["country"]["iso_code"]
  except:
    discard

Version using nimpy to call the Python library (geoip2) that MaxMind provides:

import nimpy
let database = pyImport("geoip2.database")
let countryDb = database.Reader("GeoLite2-Country.mmdb")
let asnDb = database.Reader("GeoLite2-ASN.mmdb")

for ip in stdin.lines:
  try:
    # geoip2's Reader.country() returns a model whose ISO code lives under .country
    let isoCode = countryDb.country(ip).country.iso_code.to(string)
    let asn = asnDb.asn(ip).autonomous_system_number.to(int)
    let output = "AS" & $asn & "\t" & isoCode
  except:
    discard

I found that the two run at approximately the same speed (7 seconds each on my machine). Ideally, the Nim version should be considerably faster. I did take a stab at profiling this library but wasn't able to figure out where the time is going.

ajusa commented 2 years ago

Using a string stream that is allocated entirely in memory does help here (I see about a 30% improvement), which makes it faster than the Python version.
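A minimal sketch of the in-memory idea, using only Nim's `std/streams` and not the mmdb library itself (whether `initMMDB` can be handed a `Stream` is not shown in this thread). A MaxMind DB lookup walks a binary search tree stored in the file, so it seeks to many scattered offsets; loading the file once with `readFile` and wrapping it in a `StringStream` serves those seeks from memory instead of going back to the file each time. `sumAt` and `sample.bin` are hypothetical names used only for illustration.

```nim
import std/streams

# Hypothetical helper: read one byte at each offset, seeking before every read,
# roughly the access pattern of a tree walk over an .mmdb file.
proc sumAt(s: Stream, offsets: openArray[int]): int =
  for off in offsets:
    s.setPosition(off)
    result += int(s.readUint8())

when isMainModule:
  # Build a small sample file (byte i holds the value i) so the sketch is self-contained.
  let path = "sample.bin"
  var payload = newString(256)
  for i in 0 ..< payload.len:
    payload[i] = char(i)
  writeFile(path, payload)

  # Option 1: every seek/read goes through a FileStream backed by the file.
  let fs = newFileStream(path, fmRead)
  doAssert fs != nil
  # Option 2: load the whole file once, then serve all reads from memory.
  let ms = newStringStream(readFile(path))

  let offsets = [0, 200, 17, 255, 3]
  let a = sumAt(fs, offsets)
  let b = sumAt(ms, offsets)
  fs.close()
  ms.close()
  doAssert a == b
  echo a, " ", b  # prints "475 475"
```

Both streams return the same bytes; the difference is only in where each read is served from, which is why swapping the backing stream can speed up lookups without changing the parsing code.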

z------------- commented 2 years ago

Hi, glad that somebody is finding this library useful. I haven't done any profiling, but I would expect both implementations to spend a lot of time doing I/O, which lines up with your observation about the in-memory string stream. One of these days I'll profile it and see if there's anything I can do.