maxmind / libmaxminddb

C library for the MaxMind DB file format
https://maxmind.github.io/libmaxminddb/
Apache License 2.0

mmdblookup usability #135

Closed fstirlitz closed 5 years ago

fstirlitz commented 7 years ago

I've got several small issues with the mmdblookup command-line utility.

First of all, its output format is type-annotated pseudo-JSON, which on one hand is too verbose to display in a console (for manual use), and on the other cannot be parsed with existing tools. Ideally, I'd like it to be actual parseable JSON, with an option to display something human-readable, like:

IP address         : 108.168.255.243
Continent          : North America [NA]
Registered country : United States [US]
Country            : United States [US]
State              : Texas [TX]
City               : Dallas (75270)
Coordinates        : 96.8°W 32.8°N
Time zone          : America/Chicago

IP address         : 2607:f0d0:3:8::4
Continent          : North America [NA]
Registered country : United States [US]
Country            : United States [US]
Coordinates        : 97.8°W 37.8°N

The utility also follows the Simon Says school of command-line parsing: it requires me each time to type both --ip and --file options. I'd much rather simply type mmdblookup 108.168.255.243; the program should take any 'unprefixed' command-line arguments as IP addresses, and by default use a database from a well-known location in the file system (somewhere in /var/lib). Even better, DNS resolution could be implemented in the utility, so that you could simply run mmdblookup maxmind.com without invoking a separate program to resolve the hostname.

Lastly, error messages from the utility contain gratuitous whitespace. The utility also spews the entirety of the built-in help text when it doesn't like the used command-line syntax; I think a 'Run "mmdblookup --help" for usage' message would suffice here. It's what most other command-line programs do.

I can work around all of these deficiencies by wrapping mmdblookup in a shell script, but... well, why should I have to?
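
A minimal Python sketch of the human-readable layout requested above. It assumes the record has already been decoded into a dict that uses the GeoLite2-City field names (continent, registered_country, country, subdivisions, city, postal, location). `format_record` and `format_coordinates` are hypothetical helper names and are not part of mmdblookup or libmaxminddb.

```python
def format_coordinates(latitude, longitude):
    """Render coordinates in the 'longitude latitude' style of the example above."""
    east_west = "E" if longitude >= 0 else "W"
    north_south = "N" if latitude >= 0 else "S"
    return f"{abs(longitude):.1f}°{east_west} {abs(latitude):.1f}°{north_south}"


def _english_name(node):
    """Return node['names']['en'] if present, else None."""
    return (node or {}).get("names", {}).get("en")


def format_record(ip, record):
    """Build aligned 'Label : value' lines; fields missing from the record are skipped."""
    rows = [("IP address", ip)]
    for label, key, code_key in (
        ("Continent", "continent", "code"),
        ("Registered country", "registered_country", "iso_code"),
        ("Country", "country", "iso_code"),
    ):
        node = record.get(key)
        if _english_name(node):
            rows.append((label, f"{_english_name(node)} [{node.get(code_key)}]"))
    subdivisions = record.get("subdivisions") or []
    if subdivisions and _english_name(subdivisions[0]):
        first = subdivisions[0]
        rows.append(("State", f"{_english_name(first)} [{first.get('iso_code')}]"))
    city = _english_name(record.get("city"))
    if city:
        postal = (record.get("postal") or {}).get("code")
        rows.append(("City", f"{city} ({postal})" if postal else city))
    location = record.get("location") or {}
    if "latitude" in location and "longitude" in location:
        rows.append(("Coordinates",
                     format_coordinates(location["latitude"], location["longitude"])))
    if location.get("time_zone"):
        rows.append(("Time zone", location["time_zone"]))
    # Labels are padded to 19 columns, matching the layout in the example.
    return "\n".join(f"{label:<19}: {value}" for label, value in rows)
```

Skipping absent fields is what lets the same function produce both the long and the short form shown in the two examples.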

oschwald commented 7 years ago

That sounds like a useful tool, but a different tool than mmdblookup.

The use case for mmdblookup is for developers integrating libmaxminddb or other reader libraries into their applications. This is why it shows the structure of the data record and the types. The JSON-like formatting is intended to show what is a map in the database and what is an array. The type annotations exist so that developers know what type to expect.

Also, libmaxminddb and mmdblookup do not know anything about the GeoIP2/GeoLite2 City databases, which seem to be the database you are interested in. These are lower-level tools intended to work with any database conforming to the MMDB specification.
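
The structure described here (maps, arrays, and a type annotation after each scalar) is regular enough to convert into real JSON. The following is a minimal Python sketch, not an official tool. It assumes that every brace, bracket, key, and annotated value sits on its own line, as in the output samples elsewhere in this thread, and that keys contain no embedded quotes. `parse_mmdblookup` is a hypothetical name, and any type the sketch does not recognise is kept as raw text.

```python
import json
import re

# One annotated scalar per line, e.g.  "US" <utf8_string>  or  6252001 <uint32>
SCALAR = re.compile(r'^(?P<value>.*) <(?P<type>[a-z0-9_]+)>$')


def _convert(value, type_name):
    """Map an annotated scalar to a Python value (integers assumed to be decimal)."""
    if type_name == "utf8_string":
        return value[1:-1]  # drop the surrounding quotes
    if type_name in ("uint16", "uint32", "uint64", "int32"):
        return int(value)
    if type_name in ("double", "float"):
        return float(value)
    if type_name == "boolean":
        return value == "true"
    return value  # unrecognised types are left as raw text


def _parse_value(lines, pos):
    """Parse the value starting at lines[pos]; return (value, next position)."""
    line = lines[pos]
    if line == "{":
        result, pos = {}, pos + 1
        while lines[pos] != "}":
            key = lines[pos].rstrip(":").strip('"')  # '"names":' -> 'names'
            result[key], pos = _parse_value(lines, pos + 1)
        return result, pos + 1
    if line == "[":
        result, pos = [], pos + 1
        while lines[pos] != "]":
            item, pos = _parse_value(lines, pos)
            result.append(item)
        return result, pos + 1
    match = SCALAR.match(line)
    if match is None:
        raise ValueError(f"unexpected line in mmdblookup output: {line!r}")
    return _convert(match["value"], match["type"]), pos + 1


def parse_mmdblookup(text):
    """Convert mmdblookup's annotated output into plain Python data."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    value, _ = _parse_value(lines, 0)
    return value


SAMPLE = '''
{
  "geoname_id":
    6252001 <uint32>
  "iso_code":
    "US" <utf8_string>
  "names":
    {
      "en":
        "United States" <utf8_string>
    }
}
'''

if __name__ == "__main__":
    print(json.dumps(parse_mmdblookup(SAMPLE), ensure_ascii=False))
```

Parsing line by line, rather than patching the text with regular expressions until a JSON parser accepts it, keeps the type annotations available for the conversion step.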

fstirlitz commented 7 years ago

Hmm. So it's just a debugging tool. At least this makes some of these things understandable, but it's hardly satisfying. And the documentation doesn't quite advertise it as such.

It shouldn't be too hard to write something a little more specialised as a sample app, would it?

szepeviktor commented 6 years ago

This is how one gets the AS number:

mmdblookup --file /var/lib/GeoIP/GeoLite2-ASN.mmdb --ip "$IP" autonomous_system_number | sed -n -e '0,/.*\s\([0-9]\+\)\s.*/s//\1/p'

JayBrown commented 6 years ago

I'm not even sure the default usage works. When I run

mmdblookup --file /usr/local/var/GeoIP/GeoLite2-City.mmdb --ip 209.250.235.170 names en

I get the error: Got an error looking up the entry data - The lookup path does not match the data (key that doesn't exist, array index bigger than the array, expected array or map where none exists)

In the unfiltered output the keys "names" are there, "en" as well. Anyone with more experience?

At any rate, I absolutely second @fstirlitz's idea for HR output and parseable JSON.

szepeviktor commented 6 years ago

Giving a static path such as names en may end up referencing a non-existent path. The Python interface is very nice: https://github.com/szepeviktor/debian-server-tools/blob/master/webserver/web-sessions-geoiplookup.py

See the docs!
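
To illustrate why a path such as names en fails: the lookup path is resolved from the root of the record, and in a City record names only exists nested under keys such as city or country. A small Python sketch of that resolution over an already-decoded record follows; `resolve_path` is a hypothetical helper, not a libmaxminddb function, and the record is made-up sample data.

```python
def resolve_path(record, path):
    """Walk `path` (a list of keys and array indexes) from the record root."""
    node = record
    for step in path:
        try:
            if isinstance(node, list):
                node = node[int(step)]   # array index, e.g. "0"
            elif isinstance(node, dict):
                node = node[step]        # map key
            else:
                raise KeyError(step)     # scalar reached, but path continues
        except (KeyError, IndexError, ValueError):
            raise LookupError(
                f"lookup path does not match the data at {step!r}"
            ) from None
    return node


# Made-up sample record using the City layout discussed in this thread.
record = {
    "city": {"names": {"en": "Dallas"}},
    "subdivisions": [{"iso_code": "TX"}],
}

if __name__ == "__main__":
    print(resolve_path(record, ["city", "names", "en"]))
    print(resolve_path(record, ["subdivisions", "0", "iso_code"]))
```

With this record, the path city names en resolves, while names en raises `LookupError` because the root map has no names key.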

JayBrown commented 6 years ago

Thank you, though for that I'd also have to install geoip2.database (python), but I just want to make the CLI output work. Shell would probably work. Maybe something like this:

ipaddr="209.250.235.170"
mmdb_raw=$(mmdblookup --file /usr/local/var/GeoIP/GeoLite2-City.mmdb --ip "$ipaddr")
city=$(echo "$mmdb_raw" | grep -A9 "\"city\":" | sed '$!d' | awk -F\" '{print $2}')
country_raw=$(echo "$mmdb_raw" | grep -A13 "\"country\":")
country_name=$(echo "$country_raw" | grep -A1 "\"en\":" | sed '$!d' | awk -F\" '{print $2}')
country_code=$(echo "$country_raw" | grep -A1 "\"iso_code\":" | sed '$!d' | awk -F\" '{print $2}')
latitude=$(echo "$mmdb_raw" | grep -A1 "\"latitude\":" | sed '$!d' | awk -F" <" '{print $1}' | xargs)
longitude=$(echo "$mmdb_raw" | grep -A1 "\"longitude\":" | sed '$!d' | awk -F" <" '{print $1}' | xargs)
echo "City: $city"
echo "Country: $country_name"
echo "ISO code: $country_code"
echo "Coordinates: $latitude $longitude"

Don't know if it'll work with every record. (?)

szepeviktor commented 6 years ago

Don't know if it'll work with every record. (?)

You can loop through all 2^32 IP addresses and see.

(maybe python is more manageable)

macvelli commented 6 years ago

I am having the same issue as @JayBrown where I specify:

mmdblookup --file FILE --ip IP country iso_code

In this case, I get the Got an error looking up the entry data message attempting to extract the Country ISO code from the MMDB record.

If I execute the command without specifying 'iso_code' then I get the following output:

{
  "geoname_id": 
    6252001 <uint32>
  "iso_code": 
    "US" <utf8_string>
  "names": 
    {
      "de": 
        "USA" <utf8_string>
      "en": 
        "United States" <utf8_string>
      "es": 
        "Estados Unidos" <utf8_string>
      "fr": 
        "États-Unis" <utf8_string>
      "ja": 
        "アメリカ合衆国" <utf8_string>
      "pt-BR": 
        "Estados Unidos" <utf8_string>
      "ru": 
        "США" <utf8_string>
      "zh-CN": 
        "美国" <utf8_string>
    }
}

So I am able to narrow down the results to just the "country" JSON record, but I cannot specify anything else contained within that record.

Thoughts?

oschwald commented 6 years ago

@macvelli, the following works fine for me:

mmdblookup -f /usr/local/share/GeoIP/GeoLite2-City.mmdb -i 128.101.101.101 country iso_code

Could you provide the full command you are using, the version of libmaxminddb, and the database version and date?

macvelli commented 6 years ago

Thanks for the feedback @oschwald, I appreciate it. Your example worked for me as well.

I am executing mmdblookup in a Bash script, and it turns out the problem has something to do with Bash. The following fails for me when more than one [DATA PATH] argument is specified on the command line:

mmdblookup --file /usr/share/GeoLite2/GeoLite2-City.mmdb --ip $ipAddr "$dataPath"

If I remove the double-quotes around $dataPath, it suddenly works.
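
The likely cause is argument splitting: with the quotes, Bash passes the whole string (for example "country iso_code") as one argument, so mmdblookup looks for a single key with that name. Without the quotes, the shell splits it into two path elements. The same distinction can be sketched in Python, where the argument list is built explicitly; `build_argv` and `run_lookup` are hypothetical helper names.

```python
import subprocess


def build_argv(db_path, ip, data_path=""):
    """Return the mmdblookup argument list, one element per lookup-path step."""
    # "country iso_code" must become two separate arguments, not one.
    return ["mmdblookup", "--file", db_path, "--ip", ip, *data_path.split()]


def run_lookup(db_path, ip, data_path=""):
    """Run mmdblookup (must be on PATH) without a shell and return its stdout."""
    completed = subprocess.run(
        build_argv(db_path, ip, data_path),
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout
```

In Bash itself, an array (`path=(country iso_code)` expanded as `"${path[@]}"`) gives the same one-argument-per-step behaviour while keeping the quoting.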

knoxcard commented 5 years ago

This is how I hacked it in NodeJS, hehe

module.exports = function(ip, cb) {
    const execSync = require('child_process').execSync
    var IP_REGEXP = /^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$|^(([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])$|^\s*((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?\s*$/;
    if(!IP_REGEXP.test(ip)) {
        console.log('BAD IP!')
        return cb(false)
    }
    var out = execSync('/usr/bin/mmdblookup --file ' + require('path').join('node_modules/geolite2/dbs/', 'GeoLite2-City.mmdb') + ' --ip ' + ip,  [])
    out = out.toString()
    out = out.replace(/(<([^>]+)>)/ig, ',')   // remove all tags (later - only remove tags that are outside of double quotes!)
    out = out.replace(/\,(?!\s*?[\{\[\"\'\w])/g, '') // remove all trailing commas
    // console.log(out)
    const geo_keys = require('dirty-json').parse(out)
    var city = {}
    city.name = geo_keys.city ? geo_keys.city.names.en : undefined
    city.state = geo_keys.subdivisions ? geo_keys.subdivisions[0].iso_code : undefined
    city.postal = geo_keys.postal ? geo_keys.postal.code : undefined
    city.country = geo_keys.country ? geo_keys.country.iso_code : undefined
    city.latitude = geo_keys.location ? geo_keys.location.latitude : undefined
    city.longitude = geo_keys.location ? geo_keys.location.longitude : undefined    
    // console.log(JSON.stringify(city, null, 4))
    cb(city)
}
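
A note on the validation step: the second alternative of the long regular expression above matches hostnames as well as IP addresses, and the `(%.+)?` suffix in the IPv6 branch appears to allow arbitrary characters after a `%`, which matters when the value is concatenated into a shell command. A standard-library parser is a simpler check. The sketch below is in Python rather than Node.js, and `is_ip_address` is a hypothetical helper name.

```python
import ipaddress


def is_ip_address(text):
    """Return True only if `text` parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for candidate in ("209.250.235.170", "2607:f0d0:3:8::4", "maxmind.com"):
        print(candidate, is_ip_address(candidate))
```

Passing the validated address as a separate argument in a list, instead of building a command string, avoids the shell entirely.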

szepeviktor commented 5 years ago

Users without nodejs may use sed to restore valid JSON and jq to parse it:

$ mmdblookup --file /var/lib/GeoIP/GeoLite2-ASN.mmdb --ip 1.1.1.1 \
    | sed -e ':a;N;$!ba;s/\n/ /g' -e 's/ <[a-z0-9_]\+>/,/g' -e 's/,\s\+}/}/g' \
    | jq '"AS\(.autonomous_system_number) \(.autonomous_system_organization)"'
"AS13335 Cloudflare, Inc."

knoxcard commented 5 years ago

@szepeviktor - I am trying to get your example to work. I'd rather do it the right way; that dirty-json library is a quick fix for Node.js.

What am I doing wrong?

mmdblookup --file /etc/nginx/domains/indospace.io/services/node_modules/geolite2/dbs/GeoLite2-City.mmdb --ip 199.7.157.90 sed -e ':a;N;$!ba;s/\n/ /g' -e 's/ <[a-z0-9_]\+>/,/g' -e 's/,\s\+}/}/g' | jq '"AS\(.autonomous_system_number) \(.autonomous_system_organization)"' "AS13335 Cloudflare, Inc."

szepeviktor commented 5 years ago

:-)

"AS13335 Cloudflare, Inc."

is the output!

knoxcard commented 5 years ago

Right, I did figure that.. when I run this:

mmdblookup --file /etc/nginx/domains/indospace.io/services/node_modules/geolite2/dbs/GeoLite2-City.mmdb --ip 199.7.157.90 sed -e ':a;N;$!ba;s/\n/ /g' -e 's/ <[a-z0-9_]\+>/,/g' -e 's/,\s\+}/}/g' | jq '"AS\(.autonomous_system_number) \(.autonomous_system_organization)"'

I get....

mmdblookup: invalid option -- 'e'
mmdblookup: invalid option -- 'e'
mmdblookup: invalid option -- 'e'
parse error: Invalid numeric literal at line 2, column 13

szepeviktor commented 5 years ago

Yes, the pipe sign is missing between 199.7.157.90 sed

Try 199.7.157.90 | sed

knoxcard commented 5 years ago

mmdblookup --file /etc/nginx/domains/indospace.io/services/node_modules/geolite2/dbs/GeoLite2-City.mmdb --ip 199.7.157.90 | sed -e ':a;N;$!ba;s/\n/ /g' -e 's/ <[a-z0-9_]\+>/,/g' -e 's/,\s\+}/}/g' | jq '"AS\(.autonomous_system_number) \(.autonomous_system_organization)

Now I get the following error...

parse error: Expected separator between values at line 1, column 493

szepeviktor commented 5 years ago

Now the ending quotes are missing: jq '"AS\(.autonomous_system_number) \(.autonomous_system_organization)"'

szepeviktor commented 5 years ago

...and these sed expressions are for GeoLite2-ASN.mmdb, not for city database.

knoxcard commented 5 years ago

Here is another Node.js snippet I wrote for everyone's reference; it is a lot slower because it has to run multiple commands.


    const execSync = require('child_process').execSync
    var city = {}
    try {
        var name = execSync('/usr/local/bin/mmdblookup --file ' + require('path').join('node_modules/geolite2/dbs/', 'GeoLite2-City.mmdb') + ' --ip ' + ip + ' city names en',  [])
        city.name = (name.toString().match(/".*?"/g)[0]).replace(/\"/g, '').trim()
        // console.log('city.name: ' + city.name)
        var state = execSync('/usr/local/bin/mmdblookup --file ' + require('path').join('node_modules/geolite2/dbs/', 'GeoLite2-City.mmdb') + ' --ip ' + ip + ' subdivisions 0 iso_code',  [])
        city.state = (state.toString().match(/".*?"/g)[0]).replace(/\"/g, '').trim()
        // console.log('city.state: ' + city.state)
        var postal = execSync('/usr/local/bin/mmdblookup --file ' + require('path').join('node_modules/geolite2/dbs/', 'GeoLite2-City.mmdb') + ' --ip ' + ip + ' postal code',  [])
        city.postal = parseInt((postal.toString().match(/".*?"/g)[0]).replace(/\"/g, '').trim())
        // console.log('city.postal: ' + city.postal)
        var country = execSync('/usr/local/bin/mmdblookup --file ' + require('path').join('node_modules/geolite2/dbs/', 'GeoLite2-City.mmdb') + ' --ip ' + ip + ' country iso_code',  [])
        city.country = (country.toString().match(/".*?"/g)[0]).replace(/\"/g, '').trim()
        // console.log('city.country: ' + city.country)
        var latitude = execSync('/usr/local/bin/mmdblookup --file ' + require('path').join('node_modules/geolite2/dbs/', 'GeoLite2-City.mmdb') + ' --ip ' + ip + ' location latitude',  [])
        city.latitude = parseFloat((latitude.toString()).replace(/(<([^>]+)>)/ig, '').replace("\n", '').trim())
        // console.log('city.latitude: ' + city.latitude)
        var longitude = execSync('/usr/local/bin/mmdblookup --file ' + require('path').join('node_modules/geolite2/dbs/', 'GeoLite2-City.mmdb') + ' --ip ' + ip + ' location longitude',  [])
        city.longitude = parseFloat((longitude.toString()).replace(/(<([^>]+)>)/ig, '').replace("\n", '').trim())
        // console.log('city.longitude: ' + city.longitude)
        cb(city)
    }
    catch(e) {
        cb(city)
    }

lemmy04 commented 5 years ago

What I'd like to have is something that replicates the functionality of the old geoiplookup and geoiplookup6 commands: pass an IP address as the only argument, get information from all available databases in human-readable format.

szepeviktor commented 5 years ago

e.g. https://github.com/szepeviktor/debian-server-tools/blob/master/tools/geoiplookup-as and there is a country lookup script beside it

JayBrown commented 5 years ago

For macOS it would be: mmdblookup --file /usr/local/var/GeoIP/GeoLite2-City.mmdb --ip <IPAddress>

Use GNU sed to parse.

hungerburg commented 5 years ago

Another shell wrapper (above sed foo did not work for me, some commas were missing from output)

#!/bin/sh

if [ -z "$1" ] ; then echo 'No IP-address specified'; exit 1; fi

printf '%s ' "$1"; mmdblookup -f /var/lib/GeoIP/GeoLite2-City.mmdb -i "$1" \
| sed -ne 's/<[^>]*>$// ; s/ *$// ; s/[^:{[]$/&,/ ; N ; s/,\(\n\s*[\x5d}]\)/\1/ ; s/,\n\s*$// ; P ; D' \
| jq -r '[.continent.code, .country.iso_code, .city.names.en, .subdivisions[0].names.en] | @tsv'

PS: sed foo nicked from @fstirlitz ? https://wiki.archlinux.org/index.php/User:Fstirlitz/quick_and_dirty_geoip_lookup_script

mivk commented 5 years ago

Until we have a usable output format from mmdblookup, I use this perl pipe alternative to just get the country name:

mmdblookup --file ... --ip ... country names en | perl -nle '/^\s*"(.*?)"\s+</ && print $1'

And I actually wrapped it into this little ip2country script:

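#! /usr/bin/perl
use strict;
use warnings;

my $file = '/var/lib/GeoIP/GeoLite2-Country.mmdb';
my @path = qw(country names en);

my $ip = shift or die "Usage: $0 IP\n";

# List-form open runs mmdblookup directly, so the IP never passes through a shell.
open(my $mmdb, '-|', 'mmdblookup', '--file', $file, '--ip', $ip, @path)
    or die "Cannot run mmdblookup: $!\n";
while (<$mmdb>) {
    # Print the quoted string that precedes the <type> annotation.
    /^\s*"(.*?)"\s+</ && print "$1\n";
}
close($mmdb);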

9c9a commented 5 years ago

I'd like it to be parseable JSON, wasted a lot of time with it 👎

pbiering commented 5 years ago

Advertisement begin: "ipv6calc" (http://www.deepspace6.net/projects/ipv6calc.html) has, since version 2.1.0, implemented IP-to-country-code lookup as a shortcut, as a replacement for geoiplookup. Examples:

ipv6calc -q --addr2cc 85.214.153.25
DE

ipv6calc -q --addr2cc 2a01:238:4281:8600:812a:5915:8d24:58f3
DE

It can also use the DB-IP.com MMDB instead (or as a fallback). Advertisement end.

pirate commented 5 years ago

As it stands, this tool is very unfriendly to the UNIX philosophy of having commands always output a format that's parseable by later commands. When the JSON output isn't valid JSON, every user who has to use this tool ends up wasting hours hacking around it. Please at least add a flag like --json to hide the type annotations and make the output parseable without resorting to sed/awk.

OP said it eloquently enough already, and that was almost 8 months ago!

I can work around all of these deficiencies by wrapping mmdblookup in a shell script, but... well, why should I have to?

Another note, aside from the type annotations: why does the query output come wrapped in quotes and two unneeded newlines? It adds another layer of parsing difficulty:

$ mmdblookup --file /usr/local/var/GeoIP/GeoLite2-City.mmdb --ip 75.37.71.246 city names en

  "Oakland" <utf8_string>

Should just be:

$ mmdblookup --file /usr/local/var/GeoIP/GeoLite2-City.mmdb --ip 75.37.71.246 city names en
Oakland

If scripts later on down the line need quotes, they can add it themselves.

This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

(Note text streams, not custom "JSON-like" streams)
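
Until such a flag exists, the single-value output shown above can be reduced to a bare value with a few lines of code. This is a Python sketch under the assumption that the value and its `<type>` annotation share one line, as in the Oakland example; `raw_value` is a hypothetical helper name, not an mmdblookup feature.

```python
import re

# Matches e.g.   "Oakland" <utf8_string>   or   13335 <uint32>
SCALAR_LINE = re.compile(r'^\s*(?P<value>.*\S)\s+<(?P<type>[a-z0-9_]+)>\s*$')


def raw_value(output):
    """Return the first annotated value in `output`, without quotes or type tag."""
    for line in output.splitlines():
        match = SCALAR_LINE.match(line)
        if match is None:
            continue  # skip the blank lines around the value
        value = match["value"]
        is_quoted = len(value) >= 2 and value[0] == value[-1] == '"'
        if match["type"] == "utf8_string" and is_quoted:
            value = value[1:-1]
        return value
    raise ValueError("no annotated value found in mmdblookup output")


if __name__ == "__main__":
    print(raw_value('\n  "Oakland" <utf8_string>\n\n'))
```

The function returns text only; converting numeric values is left to the caller, in the spirit of emitting a plain text stream.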

oschwald commented 5 years ago

I am going to close this issue. It has become a collection of unrelated feature requests. Some of them may be appropriate enhancements to libmaxminddb, but most of them would be better served by a tool designed for end users. If we provide such a tool, it is likely it would not be based on libmaxminddb or part of this distribution.

pirate commented 5 years ago

Opened a new ticket for two specific feature requests with a smaller scope than this issue: a --json flag, and a --raw-value flag. https://github.com/maxmind/libmaxminddb/issues/212

LouAlbano commented 4 years ago

Hopefully this isn't off topic, but Wireshark comes with mmdbresolve, which has output more like geoiplookup.

PatrickCronin commented 4 years ago

We have recently released mmdbinspect which is a new tool intended to address several of the issues people raised relating to the general usability of mmdblookup. It supports querying multiple databases and multiple IPs with a single incantation, and it outputs pure JSON.

While the intent of mmdblookup was and is to support developers who are integrating libmaxminddb or other reader libraries into their applications, mmdbinspect is intended to be a well-behaved command line program supporting general purpose querying of MMDB databases.

It's in beta. We would welcome you to try it and provide any feedback you may have on its issues page.

szepeviktor commented 4 years ago

The link may need a correction: https://github.com/maxmind/mmdbinspect