jjmontesl / cubetl

CubETL - Framework and tool for data ETL (Extract, Transform and Load) in Python (PERSONAL PROJECT / SELDOM MAINTAINED)
MIT License
26 stars 10 forks source link

cubetl/geoip uses incf.countryutils that doesn't support python 3 #6

Open dfrankow opened 5 years ago

dfrankow commented 5 years ago

I am using python 3.7.4.

To reproduce:

$ docker-compose run cubetls python -c "import cubetl.geoip"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/app/cubetl/geoip/__init__.py", line 26, in <module>
    from incf.countryutils import transformations
  File "/usr/local/lib/python3.7/site-packages/incf/countryutils/transformations.py", line 151
    raise KeyError, code
                  ^
SyntaxError: invalid syntax

That package was last updated in 2009. There is a more recently updated version. It looks to not have the same error:

$ git clone https://github.com/wyldebeast-wunderliebe/incf.countryutils.git
$ cd incf.countryutils.git
$ virtualenv .venv
$ source .venv/bin/activate
$ python setup.py install
$ python -c "from incf.countryutils import transformations"
$ 
dfrankow commented 5 years ago

FYI, if using the new package could fix this, you could change requirements.txt as follows:

diff --git a/requirements.txt b/requirements.txt
index 252a01e..fac124e 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -4,7 +4,8 @@ dateutils==0.6.6
 fake-factory==0.5.7
 fs==2.1.2
 httplib2==0.12.0
-incf.countryutils==1.0
+# updated incf.countryutils for Python 3
+https://github.com/wyldebeast-wunderliebe/incf.countryutils/zipball/1.1
 incf.dai==0.7
 lxml==4.2.5
 odict==1.6.2

However, I think geoip has other issues as well. I'm going to use this issue to write notes.

dfrankow commented 5 years ago

GeoIP requires libGeoIP. See https://github.com/maxmind/geoip-api-python/issues/20.

Here are some instructions to install that (in the docker image I started):

apt update
apt -y install software-properties-common dirmngr apt-transport-https lsb-release ca-certificates
cd /root
add-apt-repository -y ppa:maxmind/ppa
apt install -y libgeoip1 libgeoip-dev geoip-bin
pip install GeoIP user_agents

See also https://github.com/maxmind/geoip-api-c#on-ubuntu-using-ppa.

However, note geoip is deprecated, and they now recommend geoip2. :/

dfrankow commented 5 years ago

Note even after installing libGeoIP and GeoIP, the test_loganalyzer test fails with some other issue:

self = <sqlalchemy.engine.strategies.PlainEngineStrategy object at 0x7f05956a72d0>, name_or_url = None
kwargs = {'connect_args': {}}, u = None

    def create(self, name_or_url, **kwargs):
        # create url.URL object
        u = url.make_url(name_or_url)

>       plugins = u._instantiate_plugins(kwargs)
E       AttributeError: 'NoneType' object has no attribute '_instantiate_plugins'

/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/strategies.py:52: AttributeError
================================================ warnings summary ================================================
cubetl/olap/sqlschema.py:415
  /app/cubetl/olap/sqlschema.py:415: DeprecationWarning: invalid escape sequence \.
    pattern = ci_key.replace('.', '\.').replace('**', '.*').replace('*', '\w+')

cubetl/olap/sqlschema.py:415
  /app/cubetl/olap/sqlschema.py:415: DeprecationWarning: invalid escape sequence \w
    pattern = ci_key.replace('.', '\.').replace('**', '.*').replace('*', '\w+')

tests/test_examples.py::TestExamples::test_loganalyzer
  /usr/local/lib/python3.7/site-packages/beautifulsoup4-4.8.1-py3.7.egg/bs4/dammit.py:53: DeprecationWarning: invalid escape sequence \s
    xml_encoding = '^\s*<\\?.*encoding=[\'"](.*?)[\'"].*\\?>'

-- Docs: https://docs.pytest.org/en/latest/warnings.html

I don't know what's up here.

jjmontesl commented 5 years ago

The documentation of cubetl.geoip.GeoIPFromAddress node says that:

Note: This module uses incf.countryutils, but as the version in PyPI does not currently
support Python 3, the updated package for Python 3 is included by this project (see
https://github.com/wyldebeast-wunderliebe/incf.countryutils/issues/2 ).

Since I had already copied a working countryutils copy from that repository, try removing the offending package from your environment:

pip uninstall incf.countryutils

I have removed this dependency from the requirements.txt file.

I wouldn't like to make the project depend on geoip. These dependencies shall be installed by users if they need them, that's why you have to install GeoIP manually (as per the docs). This should perhaps be discovered by the cubetl.geoip module and provide a meaningful error.

jjmontesl commented 5 years ago

I have pushed the corrections to the master branch, the loganalyzer example now works for me given the above works.