rabuchaim / geoip2fast

GeoIP2Fast is the fastest GeoIP2 country/city/asn lookup library. A search takes less than 0.00003 seconds. It has its own data file updated twice a week with Maxmind-Geolite2-CSV and is Pure Python!
https://pypi.org/project/geoip2fast/
MIT License

Fails silently on non existent data file #7

Closed by oskar456 7 months ago

oskar456 commented 7 months ago

What I try to achieve

Load the bundled data file geoip2fast-asn-ipv6.dat.gz

What I expect to happen

Data file will get loaded or, if it is not found, an exception is thrown.

What actually happens

If the data file is not found, the default data file is loaded instead and no exception is thrown.

Steps to reproduce

>>> from geoip2fast import GeoIP2Fast
>>> G = GeoIP2Fast(geoip2fast_data_file="geoip2fast-asn-ipv6.dat.gz", verbose=True)
GeoIP2Fast v1.1.8 is ready! geoip2fast.dat.gz loaded with 458848 networks in 0.03968 seconds and using 0.00 MiB.
>>> G = GeoIP2Fast(geoip2fast_data_file="blah blah", verbose=True)
GeoIP2Fast v1.1.8 is ready! geoip2fast.dat.gz loaded with 458848 networks in 0.04265 seconds and using 0.00 MiB.
>>> G = GeoIP2Fast(geoip2fast_data_file="geoip2fast/lib/python3.11/site-packages/geoip2fast/geoip2fast-asn-ipv6.dat.gz", verbose=True) 
GeoIP2Fast v1.1.8 is ready! geoip2fast-asn-ipv6.dat.gz loaded with 754921 networks in 0.10827 seconds and using 0.00 MiB.

Other remarks

It would be nice to be able to simply select which of the bundled data files to use without having to figure out where the library directory is located on the filesystem.
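
Until such a selector exists, one possible workaround is to ask Python where the package is installed. This is a minimal sketch and not part of geoip2fast: bundled_file_path is a hypothetical helper, and it assumes the data file sits directly in the package directory, as the path in the reproduction steps suggests.

```python
import importlib.util
import os


def bundled_file_path(package_name: str, file_name: str) -> str:
    """Return the absolute path of file_name inside an installed package's directory.

    Hypothetical helper for illustration; it is not a geoip2fast API.
    """
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"package {package_name!r} is not installed")
    # spec.origin points at the package's __init__.py; the data file is
    # assumed to live in the same directory.
    path = os.path.join(os.path.dirname(spec.origin), file_name)
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    return path
```

The result could then be passed as the geoip2fast_data_file argument, for example bundled_file_path("geoip2fast", "geoip2fast-asn-ipv6.dat.gz").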

rabuchaim commented 7 months ago

Hi there!

I'll take a look at this..

Currently, if the specified file is invalid, the library falls back to the default file "geoip2fast.dat.gz". It searches for it first in the current directory of your code and then in the library directory.

If it doesn't find the file in either of these 2 directories, it raises an exception.

As you can see below, I removed the dat.gz files from the library directory. It first searches the current directory (/root/geoip2fast.dat.gz) and then the library directory (/usr/local/lib/python3.11/dist-packages/geoip2fast/geoip2fast.dat.gz). Since I removed the files, it raises an exception:

root@tucupi:~# python3
Python 3.11.6 (main, Oct 23 2023, 22:48:54) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from geoip2fast import GeoIP2Fast
>>> G = GeoIP2Fast(geoip2fast_data_file="blah blah", verbose=True)
FileNotFoundError: [Errno 2] No such file or directory: '/root/geoip2fast.dat.gz'

During handling of the above exception, another exception occurred:

FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.11/dist-packages/geoip2fast/geoip2fast.dat.gz'

During handling of the above exception, another exception occurred:

geoip2fast.geoip2fast.GeoIPError: Unable to determine the path of library geoip2fast.dat.gz - [Errno 2] No such file or directory: '/usr/local/lib/python3.11/dist-packages/geoip2fast/geoip2fast.dat.gz'

But if it finds a 'geoip2fast.dat.gz' file in one of these 2 directories, it loads the default even if you enter an invalid file.

I'm working on version 1.2.0, which will support city names, and I will include this fix there: an exception will be raised if the specified file is not found. I'm making the final adjustments... Maybe tomorrow it will be ready.

I will also include a function to show the path of the file currently in use. For now, it is only possible to see this through the console by running "./geoip2fast.py -vvv"

root@tucupi:/opt/maxmind# geoip2fast -vvv
Using datafila: /opt/maxmind/geoip2fast.dat.gz

Just one question: can you tell me which operating system you are using? I noticed that your output shows 0.00 MiB of memory usage, and a real value should appear there. On Linux and Windows it appears without a problem.

Thanks for your contribution Ondřej Caletka!

rabuchaim commented 7 months ago

In the example you sent, the memory count does not appear, only the network count appears... which shows that it loaded the default file even though you provided an invalid file.

In my tests, even providing an invalid file, it loads the default file and displays memory usage.

On Ubuntu 22.04:

root@tucupi:~# python3
Python 3.11.6 (main, Oct 23 2023, 22:48:54) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from geoip2fast import GeoIP2Fast
>>> G = GeoIP2Fast(geoip2fast_data_file="blah blah", verbose=True)
GeoIP2Fast v1.1.8 is ready! geoip2fast.dat.gz loaded with 458848 networks in 0.03349 seconds and using 25.35 MiB.
>>> G.lookup("200.204.0.10")
{'ip': '200.204.0.10', 'country_code': 'BR', 'country_name': 'Brazil', 'cidr': '200.204.0.0/14', 'hostname': '', 'is_private': False, 'asn_name': '', 'elapsed_time': '0.000219602 sec'}
>>>

Now on Windows 10:

C:\Users\ricar>python3
Python 3.11.6 (tags/v3.11.6:8b6ee5b, Oct  2 2023, 14:57:12) [MSC v.1935 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from geoip2fast import GeoIP2Fast
>>> G = GeoIP2Fast(geoip2fast_data_file="blah blah",verbose=True)
GeoIP2Fast v1.1.8 is ready! geoip2fast.dat.gz loaded with 458848 networks in 0.04577 seconds and using 26.09 MiB.
>>> G.lookup("1.1.1.1")
{'ip': '1.1.1.1', 'country_code': 'AU', 'country_name': 'Australia', 'cidr': '1.1.1.1/32', 'hostname': '', 'is_private': False, 'asn_name': '', 'elapsed_time': '0.000143200 sec'}
>>>

This made me curious... did you manually change the memory count to 0.0 MiB or did it not actually show? If it didn't show, can you tell me the operating system you are using please? We will fix this in the next version too

oskar456 commented 7 months ago

Hello, I am on macOS. It shows 0.0 MiB, I haven't altered the output.

Regarding the data file loading, I think there is a logic error in this part:

        if geoip2fast_data_file != "":
            try:
                if os.path.isfile(geoip2fast_data_file) == True:
                    self.data_file = geoip2fast_data_file
            except Exception as ERR:
                raise GeoIPError("Unable to access the specified file %s. %s"%(geoip2fast_data_file,str(ERR)))

os.path.isfile() ensures that only an existing file (given with its full path) makes it into self.data_file; for anything else the assignment is silently skipped. The exception handling also seems strange to me, as that function hardly ever raises anything.
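
One way the check could fail loudly instead is sketched below. The names select_data_file and DataFileError are hypothetical stand-ins (the library's own exception is GeoIPError); this illustrates the idea and is not the library's code.

```python
import os


class DataFileError(Exception):
    """Hypothetical stand-in for the library's GeoIPError."""


def select_data_file(requested: str, default: str) -> str:
    """Return the file to load, refusing to fall back silently."""
    if requested == "":
        # Nothing was requested, so the default is the right choice.
        return default
    if not os.path.isfile(requested):
        # An explicit request that cannot be honored is an error,
        # not a reason to load something else.
        raise DataFileError(f"Unable to access the specified file {requested!r}")
    return requested
```

With this shape, the default is used only when the caller did not ask for anything, so a typo in the file name surfaces immediately.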

rabuchaim commented 7 months ago

Hello,

If it does not find the specified file, this part falls through and the file from the GEOIP2FAST_DAT_FILE variable is loaded instead. I will raise an exception here and remove the fallback to the default.

                 if os.path.isfile(geoip2fast_data_file) == True:
                     self.data_file = geoip2fast_data_file

Regarding memory, I made an adjustment to make it work on macOS. I tested it with versions 12 and 13 and I will include it in the next release.

import ctypes
import os
import subprocess
def get_mem_usage()->float:
    ''' Memory usage in MiB '''
    ##──── LINUX ─────────────
    try: 
        with open('/proc/self/status') as f:
            memory_usage = f.read().split('VmRSS:')[1].split('\n')[0][:-3]
        return float(memory_usage.strip()) / 1024
    except:
        ##──── WINDOWS ─────────────
        try:
            pid = ctypes.windll.kernel32.GetCurrentProcessId()
            process_handle = ctypes.windll.kernel32.OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, False, pid)
            counters = PROCESS_MEMORY_COUNTERS()
            counters.cb = ctypes.sizeof(PROCESS_MEMORY_COUNTERS)
            if ctypes.windll.psapi.GetProcessMemoryInfo(process_handle, ctypes.byref(counters), ctypes.sizeof(counters)):
                memory_usage = counters.WorkingSetSize
                return float((int(memory_usage) / 1024) / 1024)
        except:
            ##──── MACOS ─────────────
            try:
                result = subprocess.check_output(['ps', '-p', str(os.getpid()), '-o', 'rss='])
                return float(int(result.strip()) / 1024)
            except:
                return 0.0

Thanks for the feedback, Ondřej!

rabuchaim commented 7 months ago

Hi Ondřej,

I made a beta release, v1.1.9beta1, to test the memory problem. Could you validate it, please? https://test.pypi.org/project/geoip2fast/1.1.9b1/

pip install -i https://test.pypi.org/simple/ geoip2fast==1.1.9b1

import ctypes
import os
import subprocess
def get_mem_usage()->float:
     ''' Memory usage in MiB '''
     ##──── LINUX & MACOS ─────────────
     try:
         result = subprocess.check_output(['ps', '-p', str(os.getpid()), '-o', 'rss='])
         return float(int(result.strip()) / 1024)
     except:
         ##──── WINDOWS ─────────────
         try:
             pid = ctypes.windll.kernel32.GetCurrentProcessId()
             process_handle = ctypes.windll.kernel32.OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, False, pid)
             counters = PROCESS_MEMORY_COUNTERS()
             counters.cb = ctypes.sizeof(PROCESS_MEMORY_COUNTERS)
             if ctypes.windll.psapi.GetProcessMemoryInfo(process_handle, ctypes.byref(counters), ctypes.sizeof(counters)):
                 memory_usage = counters.WorkingSetSize
                 return float((int(memory_usage) / 1024) / 1024)
         except:
             return 0.0
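
For anyone who wants to try the ps-based branch in isolation, here is a self-contained sketch. It is a simplified illustration, not the released code: the Windows branch is left out, and the bare except is narrowed to the errors this call is expected to produce.

```python
import os
import subprocess


def get_mem_usage() -> float:
    """Resident set size of the current process in MiB, or 0.0 if unavailable."""
    try:
        # `ps -o rss=` prints the RSS in KiB, without a header line,
        # on Linux and macOS.
        output = subprocess.check_output(
            ["ps", "-p", str(os.getpid()), "-o", "rss="],
            stderr=subprocess.DEVNULL,
        )
        return int(output.strip()) / 1024
    except (OSError, subprocess.CalledProcessError, ValueError):
        # No `ps` binary (e.g. on Windows), a non-zero exit status,
        # or output that is not a plain integer.
        return 0.0


if __name__ == "__main__":
    print(f"{get_mem_usage():.2f} MiB")
```

Catching specific exceptions keeps unrelated failures, such as a KeyboardInterrupt, from being swallowed and reported as 0.0 MiB.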

I am working on the original problem from this issue and will fix it in version 1.1.9beta2, to be released by tomorrow.

rabuchaim commented 7 months ago

I created version 1.1.9beta2. Could you test it, please, and confirm that it is the expected behavior? https://test.pypi.org/project/geoip2fast/1.1.9b2/

pip install -i https://test.pypi.org/simple/ geoip2fast==1.1.9b2

To fix the problem of loading a file that does not exist, I reused the search logic of the _load_data() function. Until now that logic was applied only to the default file, when no file was specified.

The same logic is also needed in _load_data() for the case where the file is replaced at run time. A future release that downloads updates automatically will rely on it.

In the first test, we tried to load the “blah, blah” file. The code tries to locate the file in the current directory (/tmp) and in the library directory. If the file is not found in either, an exception is raised.


In the second test, we tried to load the file "geoip2fast-asn-ipv6.dat.gz" without providing the path. It was located and loaded correctly, as can be seen from the characteristic memory footprint of the asn+ipv6 file.


In the third test, we specified the full path /root/geoip2fast-asn-ipv6.dat.gz, and the file was also loaded correctly.
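
The lookup behavior described in these tests can be sketched roughly as follows. This is an illustration under assumptions, not the code in 1.1.9beta2: locate_data_file and its library_dir parameter are hypothetical, and the order (current directory first, then library directory) is taken from the description of the default-file search earlier in this thread.

```python
import os


def locate_data_file(name: str, library_dir: str) -> str:
    """Resolve name against the current directory, then against library_dir."""
    # A relative name resolves against the current working directory;
    # an absolute path is checked as given.
    candidates = [name]
    if not os.path.isabs(name):
        candidates.append(os.path.join(library_dir, name))
    for candidate in candidates:
        if os.path.isfile(candidate):
            return os.path.abspath(candidate)
    # Nothing matched: report every location that was tried.
    raise FileNotFoundError(f"data file not found, tried: {candidates}")
```

Returning the absolute path also gives the caller something concrete to display when asked which file is in use.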


The last test shows the new property show_database_path, which returns the path of the file that was loaded.


And below we have the fix for the memory issue on macOS:

import ctypes
import os
import subprocess
def get_mem_usage()->float:
     ''' Memory usage in MiB '''
     ##──── LINUX & MACOS ─────────────
     try:
         result = subprocess.check_output(['ps', '-p', str(os.getpid()), '-o', 'rss='])
         return float(int(result.strip()) / 1024)
     except:
         ##──── WINDOWS ─────────────
         try:
             pid = ctypes.windll.kernel32.GetCurrentProcessId()
             process_handle = ctypes.windll.kernel32.OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, False, pid)
             counters = PROCESS_MEMORY_COUNTERS()
             counters.cb = ctypes.sizeof(PROCESS_MEMORY_COUNTERS)
             if ctypes.windll.psapi.GetProcessMemoryInfo(process_handle, ctypes.byref(counters), ctypes.sizeof(counters)):
                 memory_usage = counters.WorkingSetSize
                 return float((int(memory_usage) / 1024) / 1024)
         except:
             return 0.0



[..]s Ricardo

rabuchaim commented 7 months ago

Fixed!

What's new in v1.1.9 - 22/Nov/2023

Thanks again for your contribution Ondřej Caletka!