traviscross / mtr

Official repository for mtr, a network diagnostic tool
http://www.bitwizard.nl/mtr/
GNU General Public License v2.0
2.7k stars 341 forks

MaxMind DB for Location #492

Open JDarzan opened 1 year ago

JDarzan commented 1 year ago

Hello everyone,

I've been considering the possibility of enhancing the results with more detailed insights. It might be beneficial to integrate MaxMind's MMDB to get accurate information on IP locations and ASN details, for instance via a parameter such as --mmdb GeoIP2-City.mmdb.

Such an integration could offer a more detailed and enriched view of the hops, especially for those wishing to see the location in real-time. Is there any ongoing discussion or plan regarding this?

Thank you in advance for considering this suggestion, and I look forward to community feedback.


rewolff commented 1 year ago

I am open to patches that implement this.

I think your approach of specifying the database to use is the right approach.

On the other hand, having a default that works plus a flag to enable it might be better. Then I can use the MTR_OPTIONS environment variable to set my default GeoIP provider and enable it with a simple flag, instead of having to type the provider every time I want to use it.
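
For illustration, that workflow might look like the following. Both the --mmdb option and a short --geoip toggle are hypothetical here; neither exists in mtr today:

```
# hypothetical flags, for illustration only
export MTR_OPTIONS="--mmdb $HOME/GeoIP/GeoIP2-City.mmdb"
mtr --geoip example.com    # location columns enabled with one short flag
```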

yvs2014 commented 1 year ago

Just btw, does GeoIP2 have an online API? (If so, are there restrictions like a maximum number of queries, etc.?) Compared to, for example, IP-API, does it provide more data?

p.s. some info from there, for example:

```
mtr-0.85 -fa -y5,2,3,5,6,12,13 yahoo.com (in pause)
Keys: hints quit                     rhome: Sat Oct  7 18:33:14 2023
                                                    Packets                Pings
     CC RC City     Zip   AS      Name               Host                  Loss Snt Last Avg Best Wrst StDev
 15. US NY Lockport 14095 AS26101 Oath Holdings Inc. media-router-fp73.prod.media.vip.  0% 283  115 115  114  118   0.3
```

JDarzan commented 1 year ago

@rewolff Thank you for your response. I am running some tests and should have a patch ready soon; I will notify you when it is.

Indeed, your approach of configuring the provider through the MTR_OPTIONS environment variable gives more control over the queries.

There is one remaining issue: each provider structures its data differently, so I need to work out a better approach for handling both cases, looking up locally in the MMDB or externally through an API.

JDarzan commented 1 year ago

@yvs2014 Yep... there are limitations even for paid accounts: ipinfo.io, MaxMind (GeoIP2), IP2Location, ipstack, ip-api, and others.

For example, with my account on ipinfo.io: 500k lookups, then $60 per additional 50k lookups.

The result from the Business plan API has more detail:

```
{
  "ip": "40.107.218.61",
  "city": "San Antonio",
  "region": "Texas",
  "country": "US",
  "loc": "29.4241,-98.4936",
  "postal": "78205",
  "timezone": "America/Chicago",
  "asn": {
    "asn": "AS8075",
    "name": "Microsoft Corporation",
    "domain": "microsoft.com",
    "route": "40.104.0.0/14",
    "type": "business"
  },
  "company": {
    "name": "Microsoft Corporation",
    "domain": "microsoft.com",
    "type": "business"
  },
  "privacy": {
    "vpn": false,
    "proxy": false,
    "tor": false,
    "relay": false,
    "hosting": false,
    "service": ""
  },
  "abuse": {
    "address": "US, WA, Redmond, One Microsoft Way, 98052",
    "country": "US",
    "email": "abuse@microsoft.com",
    "name": "Microsoft Abuse Contact",
    "network": "40.74.0.0-40.125.127.255",
    "phone": "+1-425-882-8080"
  },
  "domains": {
    "total": 0,
    "domains": []
  }
}
```

JDarzan commented 1 year ago

@yvs2014

A good approach would be to maintain a local SQLite database: before querying ipinfo (for example), first check the IP in your SQLite database, and store each ipinfo result there, keeping it updated.

That way, if an IP repeats, you won't need to make a new request.
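
A minimal sketch of that check-cache-first flow in C. The fixed-size in-memory table here just stands in for whatever backend ends up being chosen (SQLite or otherwise), and the `remote_lookup` function pointer is a hypothetical placeholder for the actual ipinfo HTTP query:

```c
#include <stdio.h>
#include <string.h>

#define CACHE_MAX 1024

struct cache_entry { char ip[64]; char data[256]; };
static struct cache_entry cache[CACHE_MAX];
static int cache_len = 0;

/* Return cached data for ip, or NULL if we have not seen it yet. */
const char *cache_get(const char *ip)
{
    for (int i = 0; i < cache_len; i++)
        if (strcmp(cache[i].ip, ip) == 0)
            return cache[i].data;
    return NULL;
}

/* Remember the result of a remote lookup. */
void cache_put(const char *ip, const char *data)
{
    if (cache_len >= CACHE_MAX)
        return;  /* table full: callers just query remotely every time */
    snprintf(cache[cache_len].ip, sizeof cache[cache_len].ip, "%s", ip);
    snprintf(cache[cache_len].data, sizeof cache[cache_len].data, "%s", data);
    cache_len++;
}

/* Cache-first lookup: only call the (metered) remote API on a miss.
   remote_lookup is a placeholder for the real HTTP request. */
const char *lookup(const char *ip, const char *(*remote_lookup)(const char *))
{
    const char *hit = cache_get(ip);
    if (hit)
        return hit;
    const char *fresh = remote_lookup(ip);
    if (fresh)
        cache_put(ip, fresh);
    return cache_get(ip);
}
```

The same shape works unchanged if `cache_get`/`cache_put` are reimplemented on top of SQLite later.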

yvs2014 commented 1 year ago

@JDarzan − there aren't that many queries in an ordinary trace. And when a DNS-based API is used, the replies are usually cached by the local DNS server; in many cases that's enough.

rewolff commented 1 year ago

sqlite is "low overhead": just a library putting things in a file, with a way to access them as if it were a database. One problem with sqlite is that you need to specify an expected maximum number of entries in the database beforehand. It will handle more, no problem, but it will become slow. Maybe in our case that's not a problem, but for E2FSCK that WAS a problem (I fixed the problem before the original fsck would have finished; even fixed, it still took around 24h...).

For "normal" people, say, 100 hosts in the database might be enough. Not being too wasteful, you'd initialize it to 1000. But then someone doing a wide scan will run into the not-O(1)-but-O(N^2) problems of exceeding the initial "max items" estimate...

I'm not impressed by the caching of DNS servers. I get the impression that this often doesn't work somehow. ("Not working" includes reporting "nope, no such host" while still waiting for a reply from the other end; then, when the reply does come in, retrying the whole request forwarding and again reporting "nope" before the answer arrives. Stuff like that.)

For mtr, I think adding another dependency is the most important concern. I don't like it.

Having a fallback if sqlite is not available sounds like an option to me:

```
/* Pseudocode made concrete: a flat text file, one "ip data" pair per line.
   Assumes ipaddr is already formatted as a string; thedatabasefile is the
   path to the cache file. */
void add_stuff_to_database(const char *ipaddr, const char *data)
{
    FILE *fp = fopen(thedatabasefile, "a");   /* "a" = append mode */
    if (fp == NULL)
        return;
    fprintf(fp, "%s %s\n", ipaddr, data);
    fclose(fp);
}

char *get_data_from_database(const char *ipaddr, char *data)
{
    char buf[1024], ipaddr2[256], data2[768];
    FILE *fp = fopen(thedatabasefile, "r");   /* read mode, not append */
    if (fp == NULL)
        return NULL;
    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (sscanf(buf, "%255s %767s", ipaddr2, data2) == 2
            && strcmp(ipaddr, ipaddr2) == 0) {   /* string compare, not == */
            strcpy(data, data2);
            fclose(fp);
            return data;
        }
    }
    fclose(fp);
    return NULL;
}
```

Something like this as a fallback if sqlite is not available shouldn't take more than about twice the number of lines here to get it to work.

For the "limited number of queries" case, preparing for the "huge scan" kind of application would be good. Suppose someone is going to do 100k scans with 10 hops on average. A normal user doing 10 scans of 10 hops is not a problem, but those 1M lookups, more than 90% of them duplicates, would cost serious money otherwise.

yvs2014 commented 1 year ago

> sqlite is "low overhead". Just a library putting things in a file ...

Anyway, it needs some external database with persistent storage. On the other hand, keeping a couple dozen entries in memory (fetched from a free online source) is usually enough for an ordinary trace (that's my use case).

> A normal user doing 10 scans of 10 hops is not a problem. But the 1M lookups with more than 90% doubles would cost serious money otherwise.

i.e. it's not free and not for common use.

Jamie-Landeg-Jones commented 5 months ago

I use the maxmind api in other projects, and have been meaning to modify mtr to use it too.

However, you appear to be discussing a real-time DNS lookup method. I'm using the downloadable version of the database, which is already stored in its own DB format.

The API to read that is quite simple, and exists in many languages already: https://maxmind.github.io/MaxMind-DB/
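
For reference, a lookup with that C library boils down to roughly the following. This is an untested sketch based on libmaxminddb's documented API; it needs the library, a downloaded .mmdb file, and more error handling than shown:

```
#include <maxminddb.h>
#include <stdio.h>

/* Untested sketch: print the English city name for one IP from a local
   GeoIP2-City.mmdb. MMDB_* calls follow the libmaxminddb documentation. */
int print_city(const char *db_path, const char *ip)
{
    MMDB_s mmdb;
    if (MMDB_open(db_path, MMDB_MODE_MMAP, &mmdb) != MMDB_SUCCESS)
        return -1;

    int gai_error = 0, mmdb_error = 0;
    MMDB_lookup_result_s res =
        MMDB_lookup_string(&mmdb, ip, &gai_error, &mmdb_error);

    if (gai_error == 0 && mmdb_error == MMDB_SUCCESS && res.found_entry) {
        MMDB_entry_data_s ed;
        if (MMDB_get_value(&res.entry, &ed, "city", "names", "en", NULL)
                == MMDB_SUCCESS
            && ed.has_data && ed.type == MMDB_DATA_TYPE_UTF8_STRING)
            printf("%.*s\n", (int)ed.data_size, ed.utf8_string);
    }
    MMDB_close(&mmdb);
    return 0;
}
```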

Is anyone considering this method of providing location data within mtr, and/or is there any interest in such a patch if I can finally work through my "TODO" list?

p.s. I use the free version. It's refreshed monthly, so it is not quite as current as the paid-for version, but I've never noticed any issues with this.