palinkas-jo-reggelt / GeoLite2SQL

Import MaxMinds GeoLite2 data into MySQL
4 stars 6 forks source link

IPv6 #4

Closed RvdHout closed 2 years ago

RvdHout commented 2 years ago

Hi palinka,

Would it be possible to also include IPv6 addresses in the import? (for now i am only using the GeoLite2-Country database)

RvdHout commented 2 years ago

Maybe similar as hmailserver by itself stores IPv6 addresses, eg spliced up into 2 bigint values, see table hm_securityranges eg:

rangelowerip1 bigint(20) NOT NULL, IPv4 or IPv6 rangelowerip2 bigint(20) DEFAULT NULL, only filled with IPv6 rangeupperip1 bigint(20) NOT NULL, IPv4 or IPv6 rangeupperip2 bigint(20) DEFAULT NULL, only filled with IPv6

palinkas-jo-reggelt commented 2 years ago

I really don't know squat about IPv6. I assume it can be stored as an integer AND the maxmind converter can make that conversion (IPv6 > int) the same as it does with IPv4. If those assumptions are true, then why could IPv4 and IPv6 not reside in the same table?

Converting and importing should be as simple as duplicating the code with minor adjustments.

What I don't understand is how to call the integer values. With IPv4, the MySQL query contains INET_ATON('1.2.3.4') - is it the same for IPv6 (INET_ATON('2001:256::whatever')? Is there overlap among IPv4 and IPv6 integer values?

RvdHout commented 2 years ago

INET_ATON IPv4 INET6_ATON IPv6

IPv6 doesn't fit in a bigint(20) field, so either you splice it up as described above or you store both IPv4 and IPv6 in varbinary(16)

Given an IPv6 or IPv4 network address as a string, returns a binary string that represents the numeric value of the address in network byte order (big endian). Because numeric-format IPv6 addresses require more bytes than the largest integer type, the representation returned by this function has the VARBINARY data type: VARBINARY(16) for IPv6 addresses and VARBINARY(4) for IPv4 addresses. If the argument is not a valid address, INET6_ATON() returns NULL.

palinkas-jo-reggelt commented 2 years ago

So then you really need 2 tables. I don't see any good reason to combine both into a single table.

The query should be determined at the script level: if ipv6 then query = "SELECT blah blah FROM IPv6-Table etc etc" and the same for ipv4. Correct?

RvdHout commented 2 years ago

it can fit in the same table i think, network_start_integer, network_last_integer should be VARBINARY(16) though

Sub GeoIPLookup(ByVal sIPAddress, ByRef m_CountryCode, ByRef m_CountryName)
    Dim oRecord, oConn : Set oConn = CreateObject("ADODB.Connection")
    oConn.Open "Driver={MySQL ODBC 5.3 Unicode Driver}; Server=localhost; Database=geoip; User=geoip; Password=supersecretpassword;"

    If oConn.State <> 1 Then
        EventLog.Write( "Sub GeoIPLookup - ERROR: Could not connect to database" )
        m_CountryCode = "XX"
        m_CountryName = "ERROR"
        Exit Sub
    End If

    m_CountryCode = "NX"
    m_CountryName = "NOT FOUND"

    Dim strRegEx, strSQL
    strRegEx = "^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$"
    If Lookup(strRegEx, sIPAddress) Then
        strSQL = "SELECT country_code, country_name FROM (SELECT * FROM geocountry WHERE INET_ATON('" & sIPAddress & "') <= network_last_integer LIMIT 1) AS a INNER JOIN geolocations AS b on a.geoname_id = b.geoname_id WHERE network_start_integer <= INET_ATON('" & sIPAddress & "') LIMIT 1;"
    Else
        strSQL = "SELECT country_code, country_name FROM (SELECT * FROM geocountry WHERE INET6_ATON('" & sIPAddress & "') <= network_last_integer LIMIT 1) AS a INNER JOIN geolocations AS b on a.geoname_id = b.geoname_id WHERE network_start_integer <= INET6_ATON('" & sIPAddress & "') LIMIT 1;"
    End If
    Set oRecord = oConn.Execute(strSQL)
    Do Until oRecord.EOF
        m_CountryCode = oRecord("country_code")
        m_CountryName = oRecord("country_name")
        oRecord.MoveNext
    Loop
    oConn.Close
    Set oRecord = Nothing
    Set oConn = Nothing
End Sub
palinkas-jo-reggelt commented 2 years ago
  Dim strRegEx, strSQL
  strRegEx = "^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$"
  If Lookup(strRegEx, sIPAddress) Then
      strSQL = "SELECT country_code, country_name FROM (SELECT * FROM geocountry WHERE INET_ATON('" & sIPAddress & "') <= network_last_integer LIMIT 1) AS a INNER JOIN geolocations AS b on a.geoname_id = b.geoname_id WHERE network_start_integer <= INET_ATON('" & sIPAddress & "') LIMIT 1;"
  Else
      strSQL = "SELECT country_code, country_name FROM (SELECT * FROM geocountry WHERE INET6_ATON('" & sIPAddress & "') <= ipv6_network_last_integer LIMIT 1) AS a INNER JOIN geolocations AS b on a.geoname_id = b.geoname_id WHERE ipv6_network_start_integer <= INET6_ATON('" & sIPAddress & "') LIMIT 1;"
  End If

IPv6 would have its own network start/end columns because they can't be stored in the bigint. The table structure would be as follows:

DROP TABLE IF EXISTS geocountry;
CREATE TABLE geocountry (
    network_start_integer BIGINT,
    network_last_integer BIGINT,
    ipv6_network_start_integer varbinary(16),
    ipv6_network_last_integer varbinary(16),
    geoname_id BIGINT,
    registered_country_geoname_id BIGINT,
    represented_country_geoname_id BIGINT,
    is_anonymous_proxy TINYINT,
    is_satellite_provider TINYINT,
    KEY geoname_id (geoname_id),
    KEY network_start_integer (network_start_integer),
    PRIMARY KEY network_last_integer (network_last_integer)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
RvdHout commented 2 years ago

You cannot simply change the bigint into varbinary(16)?

both IPv4 and IPv6 will fit into varbinary(16)

palinkas-jo-reggelt commented 2 years ago

Are they compatible? I literally have no idea.

RvdHout commented 2 years ago

I don't know who told you to store it as a bigint in the first place, varbinary is preferred to store ip adressess

palinkas-jo-reggelt commented 2 years ago

Probably someone on Stackoverflow. :)

RvdHout commented 2 years ago

Are they compatible? That is a good one, i think so....and i think it should work without creating separate columns

palinkas-jo-reggelt commented 2 years ago

OK Then it should be as simple as changing the create statement and duplicating the conversion code. Looks pretty simple.

RvdHout commented 2 years ago

i you need someone to test you know where to find me :)

palinkas-jo-reggelt commented 2 years ago

Yes. I have your IP and a table to look up your address. ;)

RvdHout commented 2 years ago

i ran a quick test in hm_ids using VARBINARY(16) for ipaddressfield, both IPv4 and IPv6 are stored correctly

palinkas-jo-reggelt commented 2 years ago

I ran into some problems. Maybe you can help.

Problem # 1:

a) IPv4 and IPv6 have to be created separately. Running one after the other with the same file name does not append, it overwrites. b) In order to use MySQLImport.exe, the FILENAME of the CSV becomes the table name. Therefore, IPv4 and IPv6 integer-range (converted) CSVs must be combined. I do that using powershell Export-CSV, however.... c) Export-CSV ALWAYS encloses fields with quotation marks which we don't want. d) When importing the combined IP data CSV using MySQLImport.exe, I'm running into an issue in powershell where I cannot properly define --fields-enclosed-by because of "standard" quote within quote issues. But I'm stuck and I can't figure it out.

However, running the same command from CMD window (with all variables converted back to strings), it works fine.

CMD:

C:\xampp\mysql\bin\mysqlimport.exe --local --verbose --ignore-lines=1 --fields-enclosed-by=\" --lines-terminated-by=\r\n --fields-terminated-by=, --host=localhost --port=3306 --user=geoip --password=supersecretpassword geoip C:\scripts\GeoLite2SQL\GeoLite2-Country-CSV\GeoCountry2.csv

This works except for one issue (below).

Problem # 2:

VARBINARY(16) apparently is too small for some of the IPv6 data because many records are skipped. I tried increasing the length to VARBINARY(40) and all the records get imported. However, I don't know if that works for anything (I don't know anything about binary data - you'll have to tell me if its good or bad).

Problem # 3:

Even after getting all of the data into the db using VARBINARY(40), running a test query results in 0 records returned.

SELECT country_code, country_name 
FROM (
    SELECT * 
    FROM geocity 
    WHERE INET_ATON('212.186.81.105') <= network_last_integer
    LIMIT 1
) AS a 
INNER JOIN geocitylocations AS b on a.geoname_id = b.geoname_id
WHERE network_start_integer <= INET_ATON('212.186.81.105')
LIMIT 1;

Obviously there are multiple failures here. I need help especially with the VARBINARY issue.

RvdHout commented 2 years ago

https://dev.maxmind.com/geoip/importing-databases/mysql?lang=en

That strange, maybe it should be imported as hex? On page above they also use varbinary(16)

palinkas-jo-reggelt commented 2 years ago

2 things:

1) my test query had the wrong table! Duh!!!

2) I changed VARBINARY to NUMERIC(40,0). This works for corrected IPv4 query but not for IPv6 query.

palinkas-jo-reggelt commented 2 years ago

https://dev.maxmind.com/geoip/importing-databases/mysql?lang=en

That strange, maybe it should be imported as hex? On page above they also use varbinary(16)

network_start = unhex(@network_start), network_end = unhex(@network_end),

Going to try that.

RvdHout commented 2 years ago

I got it to import both as well, a bit hacky but it works but i get same weird results as you....i really don't know where to go from here

It looked so simple to add IPv6 at first sight....

palinkas-jo-reggelt commented 2 years ago

I think all we have to do is follow the instructions. So I'll move from mysqlimport.exe to load data infile query. Shouldn't take long. but I still have that nasty enclosed by quotation mark issue.

RvdHout commented 2 years ago

I now used -include-hex-range instead of -include-integer-range in the conversion, guess what, all of the sudden i am from Canada :) (Ipv4)

palinkas-jo-reggelt commented 2 years ago

Better than being from nowhere, I suppose.

palinkas-jo-reggelt commented 2 years ago

Still having trouble getting powershell to import. Works fine in phpmyadmin.

4/16/2022 2:45:04 PM : [ERROR] DATABASE ERROR : Unable to run query :
                LOAD DATA INFILE 'C:\scripts\GeoLite2SQL\GeoLite2-Country-CSV\GeoCountryIPv4.csv'
                INTO TABLE geocountry2
                FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 ROWS
                (@network_start_integer, @network_last_integer, @geoname_id, @registered_country_geoname_id, @represented_country_geoname_id, @is_anonymous_proxy, @is_satellite_provider)
                SET
                        network_start_integer = unhex(@network_start_integer),
                        network_last_integer = unhex(@network_last_integer),
                        geoname_id = nullif(@geoname_id, ''),
                        registered_country_geoname_id = nullif(@registered_country_geoname_id, ''),
                        represented_country_geoname_id = nullif(@represented_country_geoname_id, ''),
                        is_anonymous_proxy = nullif(@is_anonymous_proxy, ''),
                        is_satellite_provider = nullif(@is_satellite_provider, '')
         Exception calling "Fill" with "2" argument(s): "Fatal error encountered during command execution."
RvdHout commented 2 years ago

Something weird, below returns nothing as you said

SELECT country_code, country_name 
FROM (
    SELECT * 
    FROM geocountry2
    WHERE INET_ATON('212.186.81.105') <= network_last_integer
    LIMIT 1
) AS a 
INNER JOIN geocitylocations AS b on a.geoname_id = b.geoname_id
WHERE network_start_integer <= INET_ATON('212.186.81.105')
LIMIT 1;

but this works??? (even for IPv4)

SELECT country_code, country_name 
FROM (
    SELECT * 
    FROM geocountry2
    WHERE INET6_ATON('212.186.81.105') <= network_last_integer
    LIMIT 1
) AS a 
INNER JOIN geocitylocations AS b on a.geoname_id = b.geoname_id
WHERE network_start_integer <= INET6_ATON('212.186.81.105')
LIMIT 1;
RvdHout commented 2 years ago

with VARBINARY(16) we are supposed to store and retrieve all the ipaddresses with INET6_ATON and INET6_NTOA only

Not expected, but tested with hmids as well and there it also functions correctly with INET6_ATON and INET6_NTOA only

5380   "2022-04-16 23:29:19.840"   "SELECT country_code, country_name FROM (SELECT * FROM geocountry2 WHERE INET6_ATON('2a02:a45f:1079:1:b496:59bc:2cd8:5b25') <= network_end LIMIT 1) AS a INNER JOIN geolocations AS b on a.geoname_id = b.geoname_id WHERE network_start <= INET6_ATON('2a02:a45f:1079:1:b496:59bc:2cd8:5b25') LIMIT 1;"
5380   "2022-04-16 23:29:19.840"   "2a02:a45f:1079:1:b496:59bc:2cd8:5b25 - NL - Netherlands"
5372   "2022-04-16 23:29:43.777"   "SELECT country_code, country_name FROM (SELECT * FROM geocountry2 WHERE INET6_ATON('62.131.46.221') <= network_end LIMIT 1) AS a INNER JOIN geolocations AS b on a.geoname_id = b.geoname_id WHERE network_start <= INET6_ATON('62.131.46.221') LIMIT 1;"
5372   "2022-04-16 23:29:43.777"   "62.131.46.221 - NL - Netherlands"
Sub GeoIPLookup(ByVal sIPAddress, ByRef m_CountryCode, ByRef m_CountryName)
    Dim oRecord, oConn : Set oConn = CreateObject("ADODB.Connection")
    oConn.Open "Driver={MySQL ODBC 5.3 Unicode Driver}; Server=localhost; Database=geoip; User=geoip; Password=supersecretpassword;"

    If oConn.State <> 1 Then
        EventLog.Write( "Sub GeoIPLookup - ERROR: Could not connect to database" )
        m_CountryCode = "XX"
        m_CountryName = "ERROR"
        Exit Sub
    End If

    m_CountryCode = "NX"
    m_CountryName = "NOT FOUND"

    Dim strSQL
    strSQL = "SELECT country_code, country_name FROM (SELECT * FROM geocountry2 WHERE INET6_ATON('" & sIPAddress & "') <= network_end LIMIT 1) AS a INNER JOIN geolocations AS b on a.geoname_id = b.geoname_id WHERE network_start <= INET6_ATON('" & sIPAddress & "') LIMIT 1;"
    Set oRecord = oConn.Execute(strSQL)
    Do Until oRecord.EOF
        m_CountryCode = oRecord("country_code")
        m_CountryName = oRecord("country_name")
        oRecord.MoveNext
    Loop
    oConn.Close
    Set oRecord = Nothing
    Set oConn = Nothing
End Sub
palinkas-jo-reggelt commented 2 years ago

I can't get this to work. The issue is I can't figure out what the problem is with the LOAD DATA INFILE query

    $ImportIPv4Query = "
        LOAD DATA INFILE '$CountryBlocksConvertedIPv4'
        INTO TABLE geocountry2
        FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 ROWS
        (@network_start_integer, @network_last_integer, @geoname_id, @registered_country_geoname_id, @represented_country_geoname_id, @is_anonymous_proxy, @is_satellite_provider)
        SET 
            network_start_integer = UNHEX(@network_start_integer),
            network_last_integer = UNHEX(@network_last_integer),
            geoname_id = nullif(@geoname_id, ''),
            registered_country_geoname_id = nullif(@registered_country_geoname_id, ''),
            represented_country_geoname_id = nullif(@represented_country_geoname_id, ''),
            is_anonymous_proxy = nullif(@is_anonymous_proxy, ''),
            is_satellite_provider = nullif(@is_satellite_provider, '')
    "
    MySQLQuery $ImportIPv4Query

This results in unknown errors (debug very poor - just says fatal error). I added Allow User Variables=True; to the connection string per some internet advice and that sort of works, but it gives me errors for access denied, user geoip using password yes.

So I went back to mysqlimport and combined the files using get-content | out-file. That works, but since I can't unhex the IP strings, I get gibberish.

Clearly, fixing the query is the right way to go. As I mentioned, the exact same query in phpmyadmin works perfectly. I don't know why I'm getting errors with powershell.

RvdHout commented 2 years ago

I also imported both Ipv4 and Ipv6 blocks manually using -include-hex-range CVS converted files using heidiSQL Sorry don't know if i can assist you with PowerShell, not really my specialty

But guess we are on the right track.... the official docs say:

fields terminated by ',' enclosed by '"' lines terminated by '\n' ignore 1 rows

Does that fix anything?

palinkas-jo-reggelt commented 2 years ago

the official docs say:

fields terminated by ',' enclosed by '"' lines terminated by '\n' ignore 1 rows

Does that fix anything?

The maxmind converter does not enclose fields with quotation marks, so I removed that portion.

Also, when I was using export-csv to combine ipv4/6 data, quotation enclosures are auto added by powershell, and I had no luck escaping " whether it was '" or '"' or '`"' or '`"'. They all failed. However, from command prompt using mysqlimport, \" worked fine.

RvdHout commented 2 years ago

\"?

The official docs reference the converter to create the files, eg:

geoip2-csv-converter -block-file GeoIP2-City-Blocks-IPv4.csv -include-hex-range -output-file GeoIP2-City-Blocks-IPv4-Hex.csv
geoip2-csv-converter -block-file GeoIP2-City-Blocks-IPv6.csv -include-hex-range -output-file GeoIP2-City-Blocks-IPv6-Hex.csv

And yet they included enclosed by '"' fields terminated by ',' enclosed by '"' lines terminated by '\n' ignore 1 rows

So my guess that really needs to be in there

palinkas-jo-reggelt commented 2 years ago

Markdown kludge. Yes, \" worked in cmd prompt.

Re powershell, I could easily import the records properly using "foreach" but that would take forever.

palinkas-jo-reggelt commented 2 years ago

\"?

The official docs reference the converter to create the files, eg:

 ./geoip2-csv-converter -block-file GeoIP2-City-Blocks-IPv4.csv -include-hex-range -output-file GeoIP2-City-Blocks-IPv4-Hex.csv
$ ./geoip2-csv-converter -block-file GeoIP2-City-Blocks-IPv6.csv -include-hex-range -output-file GeoIP2-City-Blocks-IPv6-Hex.csv

And yet the included enclosed by '"' fields terminated by ',' enclosed by '"' lines terminated by '\n' ignore 1 rows

Like I said, the query works fine without "enclosed by" in phpmyadmin. Its something specific to powershell or the dot net connector.

RvdHout commented 2 years ago

True, but manual import also works with fields terminated by ',' enclosed by '"' lines terminated by '\n' ignore 1 rows Don't think they put it in for no reason

palinkas-jo-reggelt commented 2 years ago

Yes. Mistake... properties of an old version... someone was drinking when they wrote the manual... Who knows. Its definitely not required. (99.9% sure)

palinkas-jo-reggelt commented 2 years ago

OK, making progress. I got the powershell query to work by double-slashing the file name C:\\scripts\\GeoLite2SQL\\GeoLite2-Country-CSV\\GeoCountryIPv4.csv AND by switching db user to ROOT. Not sure why this table doesn't have permission to LOAD DATA INFILE, but whatever.

Now there's another issue. For some reason, the record count is not matching. The IPv4 CSV has 378,754 lines but the database after import only contains 377,967 records. I'm not sure how to search the CSV for errors (too many lines for visual inspection).

palinkas-jo-reggelt commented 2 years ago

Success! I don't know what changed with the count mismatch, but its now working properly. The entire thing is working properly (for country, not city data). Both IPv4+6 are working. Total time for the script to run including downloading, etc is 39 seconds. Still needs some cleanup. I'll post the update later.

RvdHout commented 2 years ago

OK, making progress. I got the powershell query to work by double-slashing the file name C:\\scripts\\GeoLite2SQL\\GeoLite2-Country-CSV\\GeoCountryIPv4.csv AND by switching db user to ROOT. Not sure why this table doesn't have permission to LOAD DATA INFILE, but whatever.

Now there's another issue. For some reason, the record count is not matching. The IPv4 CSV has 378,754 lines but the database after import only contains 377,967 records. I'm not sure how to search the CSV for errors (too many lines for visual inspection).

grant file on *.* to 'geoip'@'localhost' ?

RvdHout commented 2 years ago

Just a suggestion... Maybe rename:

network_start_integer = UNHEX(@network_start_integer), network_last_integer = UNHEX(@network_last_integer),

as in the official docs

network_start = UNHEX(@network_start), network_end = UNHEX(@network_end),

As it are no longer integers when using VARBINARY(16)

Maybe you should separate the version with IPv6 support from the IPv4 only version, as the functions in the EventHandlers.vbs also differ from current scripts and snippets provided (& not everyone needs IPv6 support)

palinkas-jo-reggelt commented 2 years ago

Just a suggestion... Maybe rename:

network_start_integer = UNHEX(@network_start_integer), network_last_integer = UNHEX(@network_last_integer),

as in the official docs

network_start = UNHEX(@network_start), network_end = UNHEX(@network_end),

As it are no longer integers when using VARBINARY(16)

Maybe you should separate the version with IPv6 support from the IPv4 only version, as the functions in the EventHandlers.vbs also differ from current scripts and snippets provided (& not everyone needs IPv6 support)

I was thinking about that, but then I would have to change all my scripts that use it. :) But yes, it is more intuitive that way.

palinkas-jo-reggelt commented 2 years ago

Maybe you should separate the version with IPv6 support from the IPv4 only version, as the functions in the EventHandlers.vbs also differ from current scripts and snippets provided (& not everyone needs IPv6 support)

These are wholesale changes. This is a bona fide new version. I'll make the readme clear about that. That goes for hmailserver script as well.

palinkas-jo-reggelt commented 2 years ago

I just committed Version 3 which includes IPv6 and IPv4 combined. Have a look.

RvdHout commented 2 years ago

MariaDB 10.4.24, MySQL Connector 6.10.9

:::  Backup Routine zondag 17 april 2022 :::

17-4-2022 23:46:02 : GeoIP Country Update Start
17-4-2022 23:46:02 : ----------------------------
17-4-2022 23:46:02 : Checking for script update at GitHub
17-4-2022 23:46:02 : Backup & Upload script is latest version: 3.00
17-4-2022 23:46:02 : ----------------------------
17-4-2022 23:46:02 : Deleting old files
17-4-2022 23:46:02 : Folder C:\Tools\GeoLite2SQL\Script-Created-Files\GeoLite2-Country-CSV successfully deleted
17-4-2022 23:46:02 : ----------------------------
17-4-2022 23:46:02 : Downloading MaxMind data
17-4-2022 23:46:09 : MaxMind data successfully downloaded in 6 seconds
17-4-2022 23:46:10 : MaxMind data successfully unzipped in 1 second
17-4-2022 23:46:10 : Locations CSV successfully renamed
17-4-2022 23:46:10 : ----------------------------
17-4-2022 23:46:10 : Counting database records for comparison
17-4-2022 23:46:11 :  database records prior to starting update
17-4-2022 23:46:11 : ----------------------------
17-4-2022 23:46:11 : Counting CSV records for comparison
17-4-2022 23:46:17 : Counted 648.292 records in new IPv4 & IPv6 CSVs in 5 seconds
17-4-2022 23:46:17 : ----------------------------
17-4-2022 23:46:17 : Converting CSV
17-4-2022 23:46:19 : Country IPv4 CSV successfully converted to hex-range in 2 seconds
17-4-2022 23:46:21 : Country IPv6 CSV successfully converted to hex-range in 2 seconds
17-4-2022 23:46:21 : ----------------------------
17-4-2022 23:46:21 : Drop and recreate database tables
17-4-2022 23:46:21 : Database tables successfully dropped and created
17-4-2022 23:46:21 : ----------------------------
17-4-2022 23:46:21 : Import country IP information
17-4-2022 23:46:21 : [ERROR] DATABASE ERROR : Unable to run query : 
            LOAD DATA INFILE 'C:\\Tools\\GeoLite2SQL\\Script-Created-Files\\GeoLite2-Country-CSV\\GeoCountryIPv4.csv'
            INTO TABLE geocountry
            FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 ROWS
            (@network_start_hex, @network_last_hex, @geoname_id, @registered_country_geoname_id, @represented_country_geoname_id, @is_anonymous_proxy, @is_satellite_provider)
            SET 
                network_start = UNHEX(@network_start_hex),
                network_last = UNHEX(@network_last_hex),
                geoname_id = @geoname_id,
                registered_country_geoname_id = @registered_country_geoname_id,
                represented_country_geoname_id = @represented_country_geoname_id,
                is_anonymous_proxy = @is_anonymous_proxy,
                is_satellite_provider = @is_satellite_provider;

Exception calling "Fill" with "2" argument(s): "Incorrect integer value: '' for column `geoip`.`geocountry`.`represented_country_geoname_id` at row 1"

17-4-2022 23:46:21 : [OK] Country IPv4 data imported in less than 1 second
17-4-2022 23:46:21 : [ERROR] DATABASE ERROR : Unable to run query : 
            LOAD DATA INFILE 'C:\\Tools\\GeoLite2SQL\\Script-Created-Files\\GeoLite2-Country-CSV\\GeoCountryIPv6.csv'
            INTO TABLE geocountry
            FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 ROWS
            (@network_start_hex, @network_last_hex, @geoname_id, @registered_country_geoname_id, @represented_country_geoname_id, @is_anonymous_proxy, @is_satellite_provider)
            SET 
                network_start = UNHEX(@network_start_hex),
                network_last = UNHEX(@network_last_hex),
                geoname_id = @geoname_id,
                registered_country_geoname_id = @registered_country_geoname_id,
                represented_country_geoname_id = @represented_country_geoname_id,
                is_anonymous_proxy = @is_anonymous_proxy,
                is_satellite_provider = @is_satellite_provider;

Exception calling "Fill" with "2" argument(s): "Incorrect integer value: '' for column `geoip`.`geocountry`.`registered_country_geoname_id` at row 1"

17-4-2022 23:46:21 : [OK] Country IPv6 data imported in less than 1 second
17-4-2022 23:46:21 : [ERROR] DATABASE ERROR : Unable to run query : 
            LOAD DATA INFILE 'C:\\Tools\\GeoLite2SQL\\Script-Created-Files\\GeoLite2-Country-CSV\\GeoLocations.csv'
            INTO TABLE countrylocations
            FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 ROWS
            (@geoname_id, @locale_code, @continent_code, @continent_name, @country_iso_code, @country_name, @is_in_european_union)
            SET
                geoname_id = @geoname_id, 
                locale_code = @locale_code, 
                continent_code = @continent_code, 
                continent_name = @continent_name, 
                country_code = @country_iso_code, 
                country_name = @country_name, 
                is_in_european_union = @is_in_european_union;

Exception calling "Fill" with "2" argument(s): "Incorrect integer value: ' Sint Eustatius' for column `geoip`.`countrylocations`.`is_in_european_union` at row 251"

17-4-2022 23:46:22 : [OK] Country name data imported in less than 1 second
17-4-2022 23:46:22 : ----------------------------
17-4-2022 23:46:22 : [ERROR] record count mismatch:
17-4-2022 23:46:22 :  records imported to database
17-4-2022 23:46:22 : 648.292 records in MaxMind CSV
17-4-2022 23:46:22 : Completed update in 19 seconds
17-4-2022 23:46:22 : GeoIP update finished
palinkas-jo-reggelt commented 2 years ago

Interesting.

17-4-2022 23:46:11 : database records prior to starting update

It should read "No database records to count" if you never ran it before, or 0 if the table exists with no rows. I'll look into that.

"Incorrect integer value: '' for column `geoip`.`geocountry`.`represented_country_geoname_id` at row 1"

On my system (MariaDB + mysql-connector-net-8.0.25) it converts null values to 0. I guess this is what the NULLIF function was for in the MaxMind example.

"Incorrect integer value: ' Sint Eustatius' for column `geoip`.`countrylocations`.`is_in_european_union` at row 251"

This is caused by a comma in the data field. I'll figure out how to escape that and post an update in a few minutes.

RvdHout commented 2 years ago

yep i inserted all the nullifs and the first part imports OK now

RvdHout commented 2 years ago

"Incorrect integer value: ' Sint Eustatius' for column `geoip`.`countrylocations`.`is_in_european_union` at row 251"

This is caused by a comma in the data field. I'll figure out how to escape that and post an update in a few minutes.

Got it 🤣 FIELDS TERMINATED BY ',' ENCLOSED BY '""' LINES TERMINATED BY '\n' IGNORE 1 ROWS

palinkas-jo-reggelt commented 2 years ago

Very strange. I just tested with and without ENCLOSED BY '`"'.

With: works fine Without: skips import of that record. No error message.

MariaDB 10.4.19

RvdHout commented 2 years ago

Using other local maybe?

palinkas-jo-reggelt commented 2 years ago

Other local? What is that?

RvdHout commented 2 years ago

language of the import, locale_code en/de etc

once i changed this it imported fine

        $ImportLocQuery = "
            LOAD DATA INFILE '$strFileLocName'
            INTO TABLE countrylocations
            FIELDS TERMINATED BY ',' ENCLOSED BY '""' LINES TERMINATED BY '\n' IGNORE 1 ROWS
            (@geoname_id, @locale_code, @continent_code, @continent_name, @country_iso_code, @country_name, @is_in_european_union)
            SET
                geoname_id = @geoname_id, 
                locale_code = @locale_code, 
                continent_code = @continent_code, 
                continent_name = @continent_name, 
                country_code = nullif(@country_iso_code, ''),
                country_name = nullif(@country_name, ''),
                is_in_european_union = nullif(@is_in_european_union, '');
        "
palinkas-jo-reggelt commented 2 years ago

The EN location CSV has the same comma.

That line was skipped instead of throwing an error.