Closed baygeldin closed 5 years ago
Each IP in both the CSV and MMDB file will map to exactly one geoname_id, and each of those geoname_id's will be found in the Locations files.
The subdivisions are values mapped to a particular location, not directly mapped to a given IP address. We do not include the geoname_id's of subdivisions in the CSV files by design, only the ISO code and name. If no IP's are mapped directly to a given subdivision, it is expected that the geoname_id for that subdivision will not appear in the Locations CSV files.
The geoname_id's are taken directly from https://www.geonames.org/, so you may be able to look there if there is data you need that isn't otherwise found in the CSV files.
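As a rough illustration of the relationship described above, the following sketch loads the geoname_id column of a Locations file into a set and checks whether a given ID is present. The helper name `load_geoname_ids` and the sample rows are hypothetical. The only thing taken from the real files is the `geoname_id` column; the other columns are shown for illustration only.

```python
import csv
import io


def load_geoname_ids(csv_file):
    """Return the set of geoname_id values found in a Locations CSV file object."""
    reader = csv.DictReader(csv_file)
    # Skip rows with an empty geoname_id cell instead of failing on int("").
    return {int(row["geoname_id"]) for row in reader if row.get("geoname_id")}


# Made-up stand-in for a file such as GeoLite2-City-Locations-en.csv.
# With a real file, pass open(path, newline="", encoding="utf-8") instead.
sample = io.StringIO(
    "geoname_id,locale_code,subdivision_1_iso_code,subdivision_1_name,city_name\n"
    "1001,en,AA,Example Region,Example City\n"
    "1002,en,AA,Example Region,Other City\n"
)

known_ids = load_geoname_ids(sample)
print(1001 in known_ids)  # a location that IPs map to directly
print(9999 in known_ids)  # e.g. a subdivision that only appears as an attribute
```

Here the subdivision "Example Region" appears only as a name and ISO code on the city rows, so its own geoname_id would not be in the set unless some IP range mapped to it directly.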
@klp2 Thank you for the clarification! So, if I understand correctly, the same logic applies to countries (e.g. if a country's IP ranges are fully covered by its cities, the country will not be included in the CSV), right?
The subdivisions are values mapped to a particular location
How are the values mapped? Unfortunately, the hierarchy of geo-objects is quite difficult to reproduce from the files that GeoNames provides (even the hierarchy.txt is not complete). Is there any chance that this mapping can be extracted from the MaxMind DB? This would at least show which locations are missing from the CSV files.
All of the countries should show up in the Locations files.
I don't think there is a particularly easy way to extract the mapping out of the MMDB files.
Problem
I've downloaded the latest update of MaxMind's GeoLite2 City database (in both the MaxMind DB binary and CSV formats). When I tried to look up "88.184.98.0", here's what I got:
However, there's no corresponding geoname_id for the returned subdivisions in the CSV files (e.g. cat GeoLite2-City-Locations-en.csv | fgrep 11071621 returns nothing). This situation is very common for subdivisions (e.g. Novosibirsk Oblast, Scotland, etc.). Is this a bug or expected behaviour? What is the relation between the CSV files and the MMDB format?
Why is it important
For services that employ some kind of targeting or filtering of traffic based on location, the geoname_id's are important. For example, if we have an ad serving network and want to allow users to restrict a particular ad to a set of geolocations, it makes sense to describe such a set using the respective geoname_id's from the CSV files and compare it against the geoname_id's returned by the MMDB format when deciding whether or not to serve the ad (depending on which location the request came from). However, if a geoname_id is absent from the CSV files, we can't restrict the ad to the respective location even though the MMDB format returns it when resolving an IP address.
Workaround
A workaround is to manually add the missing objects to the CSV files using the IDs returned by the MMDB format (although there are a lot of them to add manually). In this case, though, a very important question is whether these geoname_id's are reliable, or are they likely to change in the future?
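A minimal sketch of how the missing objects for this workaround could be collected automatically rather than by hand. It assumes each lookup result is a dict with an optional `subdivisions` list whose entries carry `geoname_id`, `iso_code` and `names`. The function names and sample records are hypothetical, and the records could come from any MMDB reader.

```python
def subdivision_entries(record):
    """Yield (geoname_id, iso_code, English name) for each subdivision in a record."""
    for sub in record.get("subdivisions", []):
        if "geoname_id" in sub:
            yield sub["geoname_id"], sub.get("iso_code"), sub.get("names", {}).get("en")


def missing_subdivisions(records, known_ids):
    """Map each subdivision geoname_id not in known_ids to its (iso_code, name)."""
    missing = {}
    for record in records:
        for gid, iso_code, name in subdivision_entries(record):
            if gid not in known_ids:
                missing[gid] = (iso_code, name)
    return missing


# Made-up lookup results; real ones would be returned by an MMDB reader.
records = [
    {
        "city": {"geoname_id": 1001},
        "subdivisions": [
            {"geoname_id": 5001, "iso_code": "AA", "names": {"en": "Example Region"}}
        ],
    },
    {
        "city": {"geoname_id": 1002},
        "subdivisions": [
            {"geoname_id": 1002, "iso_code": "BB", "names": {"en": "City Region"}}
        ],
    },
    {"country": {"geoname_id": 3000}},  # record without subdivisions
]

# IDs already present in the Locations CSV (hypothetical values).
known_ids = {1001, 1002, 3000}

missing = missing_subdivisions(records, known_ids)
print(missing)
```

The resulting mapping lists the rows that would have to be appended to a local copy of the Locations data. Whether those IDs stay stable over time is the open question raised above, and this sketch does not address it.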