tokern / piicatcher

Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and Datahub
https://tokern.io/piicatcher/
Apache License 2.0

Connection refused when scanning Postgres with Docker #232

Open maaz1m opened 1 year ago

maaz1m commented 1 year ago

I am having trouble getting piicatcher to connect to Postgres. I am running Postgres in a Kubernetes cluster, and I am doing port forwarding from localhost to the cluster. Postgres is working fine, as shown below.

(base) maaz@maaz-thinkpad:~/data/piicatcher$ psql --host 127.0.0.1 -U testuser -d postgres -p 5432
Password for user testuser: 
psql (15.3 (Ubuntu 15.3-1.pgdg22.04+1))
Type "help" for help.

postgres=# \q

To run piicatcher, I pull the image from DockerHub and then run the following commands.

(base) maaz@maaz-thinkpad:~/data/piicatcher$ alias piicatcher='docker run -v ${HOME}/.config/tokern:/config -u $(id -u ${USER}):$(id -g ${USER}) -it --add-host=host.docker.internal:host-gateway tokern/piicatcher:latest'
(base) maaz@maaz-thinkpad:~/data/piicatcher$ piicatcher catalog add-postgresql --name testdb --database postgres --username testuser --password password --uri host.docker.internal --port 5432
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/.config/goog-stats'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/pysetup/.venv/bin/piicatcher", line 5, in <module>
    from piicatcher.command_line import app
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/piicatcher/command_line.py", line 38, in <module>
    analytics = Stats(__google_analytics_tid__)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/goog_stats/__init__.py", line 21, in __init__
    self.create_working_dir()
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/goog_stats/__init__.py", line 35, in create_working_dir
    self.working_dir.mkdir(parents=True, exist_ok=True)
  File "/usr/local/lib/python3.10/pathlib.py", line 1179, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/usr/local/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
PermissionError: [Errno 13] Permission denied: '/.config'
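
The failing path `/.config/goog-stats` is what a home-relative path turns into when the home directory resolves to `/`. The sketch below illustrates that resolution on a POSIX system. It rests on two assumptions that the traceback does not confirm: that goog_stats builds its working directory from the user's home directory, and that `HOME` is `/` inside the container when `-u` passes a UID with no passwd entry in the image. `stats_dir` is a hypothetical helper written for illustration, not part of piicatcher or goog_stats.

```python
import os
from pathlib import Path


def stats_dir(home: str) -> Path:
    """Hypothetical helper: show where <home>/.config/goog-stats lands for a given HOME.

    Assumes a POSIX system, where Path.home() follows the HOME environment variable.
    """
    previous = os.environ.get("HOME")
    os.environ["HOME"] = home
    try:
        return Path.home() / ".config" / "goog-stats"
    finally:
        # Restore the caller's environment so the helper has no lasting side effect.
        if previous is None:
            del os.environ["HOME"]
        else:
            os.environ["HOME"] = previous
```

If those assumptions hold, `stats_dir("/")` gives the root-level path from the traceback, which an unprivileged UID cannot create. Pointing `HOME` at a writable directory, for example the mounted `/config` via `docker run -e HOME=/config`, might avoid the PermissionError while keeping `-u`; this has not been tested here.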

I can fix this by removing -u $(id -u ${USER}):$(id -g ${USER}), but this leads to the following error.

(base) maaz@maaz-thinkpad:~/data/piicatcher$ alias piicatcher='docker run -v ${HOME}/.config/tokern:/config -it --add-host=host.docker.internal:host-gateway tokern/piicatcher:latest'
(base) maaz@maaz-thinkpad:~/data/piicatcher$ piicatcher catalog add-postgresql --name test --database postgres --username testuser --password password --uri host.docker.internal --port 5432
Registered Postgres database test
(base) maaz@maaz-thinkpad:~/data/piicatcher$ piicatcher detect --source-name test
Traceback (most recent call last):
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2336, in _wrap_pool_connect
    return fn()
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 304, in unique_connection
    return _ConnectionFairy._checkout(self)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 778, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 495, in checkout
    rec = pool._do_get()
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/pool/impl.py", line 139, in _do_get
    with util.safe_reraise():
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/pool/impl.py", line 137, in _do_get
    return self._create_connection()
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 309, in _create_connection
    return _ConnectionRecord(self)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 440, in __init__
    self.__connect(first_connect_check=True)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 660, in __connect
    with util.safe_reraise():
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 656, in __connect
    connection = pool._invoke_creator(self)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
    return dialect.connect(*cargs, **cparams)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 508, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: could not connect to server: Connection refused
        Is the server running on host "host.docker.internal" (172.17.0.1) and accepting
        TCP/IP connections on port 5432?

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/pysetup/.venv/bin/piicatcher", line 8, in <module>
    sys.exit(app())
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/typer/main.py", line 214, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/typer/main.py", line 532, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/piicatcher/command_line.py", line 222, in detect
    op = scan_database(
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/piicatcher/api.py", line 65, in scan_database
    scan_sources(
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/dbcat/api.py", line 140, in scan_sources
    scanner.scan()
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/dbcat/catalog/db.py", line 131, in scan
    extractor.init(Scoped.get_scoped_conf(self._conf, extractor.get_scope()))
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/databuilder/extractor/base_postgres_metadata_extractor.py", line 68, in init
    self._alchemy_extractor.init(sql_alch_conf)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/databuilder/extractor/sql_alchemy_extractor.py", line 31, in init
    self.connection = self._get_connection()
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/databuilder/extractor/sql_alchemy_extractor.py", line 57, in _get_connection
    conn = engine.connect()
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2263, in connect
    return self._connection_cls(self, **kwargs)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 104, in __init__
    else engine.raw_connection()
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2369, in raw_connection
    return self._wrap_pool_connect(
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2339, in _wrap_pool_connect
    Connection._handle_dbapi_exception_noconnection(
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1583, in _handle_dbapi_exception_noconnection
    util.raise_(
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2336, in _wrap_pool_connect
    return fn()
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 304, in unique_connection
    return _ConnectionFairy._checkout(self)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 778, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 495, in checkout
    rec = pool._do_get()
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/pool/impl.py", line 139, in _do_get
    with util.safe_reraise():
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/pool/impl.py", line 137, in _do_get
    return self._create_connection()
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 309, in _create_connection
    return _ConnectionRecord(self)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 440, in __init__
    self.__connect(first_connect_check=True)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 660, in __connect
    with util.safe_reraise():
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 656, in __connect
    connection = pool._invoke_creator(self)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
    return dialect.connect(*cargs, **cparams)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 508, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not connect to server: Connection refused
        Is the server running on host "host.docker.internal" (172.17.0.1) and accepting
        TCP/IP connections on port 5432?

(Background on this error at: http://sqlalche.me/e/13/e3q8)

I have tried both localhost and host.docker.internal as the --uri argument, and I get the same error each time. For some reason, piicatcher installed through Python in a venv works.

nicolepng commented 1 year ago

Hi @maaz1m, what device are you using?

maaz1m commented 1 year ago

Hi @nicolepng, I'm doing this on an Ubuntu machine.

nicolepng commented 1 year ago

This looks more like a problem with your connection or configuration than with PIICatcher itself.
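
One way to separate a network problem from a piicatcher problem is to test plain TCP reachability from the same place the scanner runs (the host, then inside the container). The following is a minimal sketch using only the Python standard library; `can_connect` and `report` are hypothetical helper names, and the host names and port are the ones from the commands above. One possible cause, stated as an assumption: if the forward is created with `kubectl port-forward`, it listens only on 127.0.0.1 unless `--address` is given, so a connection arriving via the Docker bridge address (172.17.0.1) would be refused even though psql on the host works.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


def report(hosts: list[str], port: int) -> dict[str, bool]:
    """Check each host on the same port and collect the results."""
    return {host: can_connect(host, port) for host in hosts}
```

Calling `report(["127.0.0.1", "host.docker.internal"], 5432)` on the host and again inside the container shows which side can reach the forwarded port. A result that succeeds on the host but fails in the container would point at the forwarding or Docker network setup rather than at piicatcher.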