Open HansMeiser234 opened 5 years ago
Hi, the GeoLite legacy database format was not made for UTF-8, so clients may or may not handle it correctly. The fact that the ü is displayed as two characters makes me think that it is UTF-8 encoded in the dat file and that the client is garbling the output because it does not decode it as UTF-8.
What version of python are you using? 2 or 3?
For example, pygeoip is not able to handle UTF-8 data. With Python 3 (or Python 2 with a working UTF-8 locale) you can use a trick to recover the UTF-8 string correctly (I tried fixing this in pygeoip's internals instead, but it is not easy):
import pygeoip
m = pygeoip.GeoIP('x.dat')
city = m.record_by_addr('91.38.193.110')['city']
city = city.encode(pygeoip.ENCODING).decode('utf-8') # iso-8859-1
print(city)
result:
Füssen
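To see why the encode/decode trick works without needing pygeoip or a .dat file, here is a minimal sketch that only simulates the round trip on a plain string. It assumes the name is stored as UTF-8 bytes and that the client reads those bytes as ISO-8859-1; the variable names are made up for illustration.

```python
# Simulate a client that reads UTF-8 bytes as ISO-8859-1 (assumed behaviour).
name = "F\u00fcssen"                       # Füssen, as found in the CSV

stored = name.encode("utf-8")              # bytes as written with -e utf-8
misread = stored.decode("iso-8859-1")      # client decodes with the wrong charset

# Same trick as above: go back to the raw bytes, then decode them as UTF-8.
repaired = misread.encode("iso-8859-1").decode("utf-8")

print(repr(misread))   # the ü has turned into two characters
print(repaired)
```

The misread string is one character longer than the original because the two UTF-8 bytes of ü are each shown as a separate ISO-8859-1 character.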
Hello,
thanks for your answer.
Regarding "it is the client screwing the output, because it does not decode it as utf-8": I think so too. /usr/bin/geoiplookup is part of the bundled geoip-bin package and may expect data in ISO-8859-1.
Regarding "What version of python are you using? 2 or 3?": this is Python 2. Python 3 (3.6.7) has no ipaddr module (https://pypi.org/project/ipaddr/); the alternative is called ipaddress, which provides different functions than ipaddr, so I think the current geolite2legacy.py cannot be used with Python 3.
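For reference, the ipaddr/ipaddress difference can usually be bridged with a small import fallback. The sketch below is only an illustration: how geolite2legacy.py actually uses ipaddr is not shown in this thread, and parse_network is a hypothetical helper name. Note that ipaddr.IPNetwork keeps host bits while ipaddress.ip_network with strict=False masks them, so the two branches are not fully equivalent.

```python
try:
    # Python 3 standard library
    import ipaddress

    def parse_network(text):
        # strict=False accepts input with host bits set and masks them
        return ipaddress.ip_network(text, strict=False)
except ImportError:
    # Python 2 with the third-party ipaddr package installed
    import ipaddr

    def parse_network(text):
        return ipaddr.IPNetwork(text)

net = parse_network("91.38.193.0/24")
print(net)
```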
If I explicitly use -e iso-8859-1 for the encoding, I get a lot of warnings like this:
Warning cannot encode u'Hachi\u014dji' using iso-8859-1
I took a closer look at the data in the CSV file. I see a lot of city and region names (for example Hachiōji in Japan) that contain characters which, I think, cannot be converted to ISO-8859-1. This may be the reason why you use UTF-8 as the default encoding. Is this a change in the new DB format? Did they exclude such cities in former versions of the legacy DB? Unfortunately I don't know specific IPs to test the former output of geoiplookup.
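The warning is easy to reproduce in isolation. The following sketch checks whether a name fits into ISO-8859-1; latin1_encodable is a made-up helper name and the sample list is just the names mentioned in this thread, not data read from the CSV.

```python
def latin1_encodable(name):
    """Return True if every character of name exists in ISO-8859-1."""
    try:
        name.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False

# ü (U+00FC) is part of ISO-8859-1, ō (U+014D) is not.
for city in ["F\u00fcssen", "Hachi\u014dji", "Kawasaki"]:
    print(city, latin1_encodable(city))
```

So German umlauts survive -e iso-8859-1, while names with characters outside the first 256 code points cannot be stored in that encoding at all.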
Thanks, Hans
Sorry, inline commenting failed in the text above; my own text is also marked as a comment. Hans
Hello,
While testing I discovered another issue: a lot of city data for Japanese cities is missing. For example, testing with 117.55.223.153, I get this result with the old DB:
GeoIP Country Edition: JP, Japan
GeoIP City Edition, Rev 1: JP, 19, Kanagawa, Kawasaki, 210-0835, 35.520599, 139.717194, 0, 0
GeoIP ASNum Edition: AS10021 KVH Co.,Ltd
The converted version shows:
GeoIP Country Edition: JP, Japan
GeoIP City Edition, Rev 1: JP, 00, N/A, N/A, N/A, 35.689999, 139.690002, 0, 0
GeoIP ASNum Edition: AS10021 KVH Co.,Ltd
In the zipped CSV files GeoLite2-City-Locations-de.csv and GeoLite2-City-Locations-en.csv I can find these cities. Is it possible that something was lost in the conversion? I run it this way:
geolite2legacy.py -i GeoLite2-City-CSV.zip -f geoname2fips.csv -o GeoLiteCity.dat
Thanks, Hans
Gianluigi, still alive? I thought you were interested in these things?
Are you sure? With that IP I get:
{
"area_code": 0,
"city": "Toshima",
"continent": "AS",
"country_code": "JP",
"country_code3": "JPN",
"country_name": "Japan",
"dma_code": 0,
"latitude": 35.72630000000001,
"longitude": 139.6859,
"metro_code": null,
"postal_code": "171-0052",
"region_code": "00",
"time_zone": "Asia/Tokyo"
}
Did you change something? I downloaded the latest geolite2legacy.py and the DB data again. Now I get:
GeoIP City Edition, Rev 1: JP, 00, N/A, Toshima, 171-0052, 35.726299, 139.685898, 0, 0
python3 using GeoLite2-City-CSV_20190610.zip
Check my patch to convert names to plain ASCII with unidecode; I have been using it for a while.
https://github.com/sherpya/geolite2legacy/pull/21
$ ./geolite2legacy.py -i GeoLite2-City-CSV.zip -o GeoIPCity.dat
$ geoiplookup -f GeoIPCity.dat 178.17.166.99
GeoIP City Edition, Rev 1: MD, 00, N/A, FÃ ÂleÃÂti, MD-5901, 47.573601, 27.709200, 0, 0
$ ./geolite2legacy.py -i GeoLite2-City-CSV.zip -e latin-1 -o GeoIPCity_latin1.dat
$ geoiplookup -f GeoIPCity_latin1.dat 178.17.166.99
GeoIP City Edition, Rev 1: MD, 00, N/A, F?le?ti, MD-5901, 47.573601, 27.709200, 0, 0
$ ./geolite2legacy.py -i GeoLite2-City-CSV.zip -e latin-1 -u -o GeoIPCity_latin1_unidecode.dat
$ geoiplookup -f GeoIPCity_latin1_unidecode.dat 178.17.166.99
GeoIP City Edition, Rev 1: MD, 00, N/A, Falesti, MD-5901, 47.573601, 27.709200, 0, 0
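For anyone who cannot install unidecode, a rough standard-library approximation of the transliteration step is sketched below. This is not what the patch does: it swaps in unicodedata NFKD decomposition and drops the combining marks. The function name to_ascii is made up for illustration, and it assumes the city in the lookups above is Fălești.

```python
import unicodedata

def to_ascii(name):
    """Approximate ASCII transliteration using only the standard library."""
    # Split accented letters into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks, keeping the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII becomes "?" instead of raising an error.
    return stripped.encode("ascii", "replace").decode("ascii")

for city in ["F\u0103le\u0219ti", "Hachi\u014dji", "F\u00fcssen"]:
    print(to_ascii(city))
```

Letters that have no decomposition (for example ß or ø) still come out as "?", which is where unidecode does a better job with its transliteration tables.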
Hello,
For the sake of completeness, I have to say that I could not test any further. Two years ago I changed companies and now work on completely different topics. I forwarded this to my old colleagues back then, and I think they still use it. Just in case you wonder about the missing comments ;)
Hans
Hello,
Thanks for your converter. I want to use it, but currently I have an issue with encoding. I am testing this IP: 91.38.193.110. The city is called Füssen (German umlaut). When using the converted DB on the console, geoiplookup shows the city as FÃ¼ssen. In my UTF-8 PuTTY session I would expect to see a correct umlaut with this encoding. What is wrong here?
https://geolite.maxmind.com/download/geoip/database/GeoLite2-City-CSV.zip
geolite2legacy.py -i GeoLite2-City-CSV.zip -f geoname2fips.csv -e utf-8 -o GeoLiteCity.dat
In the CSV file itself the umlaut seems to be correct: I can see a gorgeous ü when grepping in GeoLite2-City-Locations-de.csv.
What do you think?
Thanks, Hans