mikeizbicki / cmc-csci143

big data course materials

Issue w/ loading pg_normalized_batch #496

Closed Yugi00 closed 7 months ago

Yugi00 commented 7 months ago

Hi all, I am having an issue loading the pg_normalized_batch data. I receive the following error:

Traceback (most recent call last):
  File "/home/runner/work/twitter_postgres_parallel/twitter_postgres_parallel/load_tweets_batch.py", line 441, in <module>
    connection = engine.connect()
  File "/home/runner/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3325, in connect
    return self._connection_cls(self, close_with_result=close_with_result)
  File "/home/runner/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 96, in __init__
    else engine.raw_connection()
  File "/home/runner/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3404, in raw_connection
    return self._wrap_pool_connect(self.pool.connect, _connection)
  File "/home/runner/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3374, in _wrap_pool_connect
    Connection._handle_dbapi_exception_noconnection(
  File "/home/runner/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2208, in _handle_dbapi_exception_noconnection
    util.raise_(
  File "/home/runner/.local/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/home/runner/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3371, in _wrap_pool_connect
    return fn()
  File "/home/runner/.local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 327, in connect
    return _ConnectionFairy._checkout(self)
  File "/home/runner/.local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 894, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/home/runner/.local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 493, in checkout
    rec = pool._do_get()
  File "/home/runner/.local/lib/python3.10/site-packages/sqlalchemy/pool/impl.py", line 145, in _do_get
    with util.safe_reraise():
  File "/home/runner/.local/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
    compat.raise_(
  File "/home/runner/.local/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/home/runner/.local/lib/python3.10/site-packages/sqlalchemy/pool/impl.py", line 143, in _do_get
    return self._create_connection()
  File "/home/runner/.local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 273, in _create_connection
    return _ConnectionRecord(self)
  File "/home/runner/.local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 388, in __init__
    self.__connect()
  File "/home/runner/.local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 690, in __connect
    with util.safe_reraise():
  File "/home/runner/.local/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
    compat.raise_(
  File "/home/runner/.local/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/home/runner/.local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 686, in __connect
    self.dbapi_connection = connection = pool._invoke_creator(self)
  File "/home/runner/.local/lib/python3.10/site-packages/sqlalchemy/engine/create.py", line 574, in connect
    return dialect.connect(*cargs, **cparams)
  File "/home/runner/.local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 598, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/home/runner/.local/lib/python3.10/site-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "localhost" (::1), port 13107 failed: Connection refused
    Is the server running on that host and accepting TCP/IP connections?
connection to server at "localhost" (127.0.0.1), port 13107 failed: Connection refused
    Is the server running on that host and accepting TCP/IP connections?

I am not sure why I am receiving this, since I believe I have set up the ports correctly and both the pg_normalized and pg_denormalized data load correctly. I have also changed the port multiple times and encountered the same issue. I will put my code for the docker-compose.yml file as well as for the load_tweets_parallel.py below:

mikeizbicki commented 7 months ago

The last lines of your error

sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "localhost" (::1), port 13107 failed: Connection refused
    Is the server running on that host and accepting TCP/IP connections?
connection to server at "localhost" (127.0.0.1), port 13107 failed: Connection refused
    Is the server running on that host and accepting TCP/IP connections?

suggest that the database is not running. You can verify that with docker ps. If that is the case, then you should run docker-compose logs to figure out why it's not running and fix the problem.
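A "Connection refused" error can also be checked from the client side, independently of SQLAlchemy. The following is a minimal sketch using only the Python standard library; the helper name port_accepts_connections is hypothetical, and the ports 13105 to 13107 are taken from this thread and should be replaced with whatever your docker-compose.yml maps.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection tries every address that `host` resolves to
        # (for "localhost" this is typically both ::1 and 127.0.0.1).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports assumed from this thread; adjust to your own mappings.
    for port in (13105, 13106, 13107):
        reachable = port_accepts_connections("localhost", port)
        state = "accepting connections" if reachable else "refused or unreachable"
        print(f"localhost:{port} is {state}")
```

If a port shows as refused here while docker ps reports the container as up, the problem is more likely in the port mapping than in the loading script.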

Yugi00 commented 7 months ago

After running docker-compose up -d and then docker ps, I get:

CONTAINER ID   IMAGE                                           COMMAND                  CREATED         STATUS         PORTS                                         NAMES
7c9c7f1de6e5   twitter_postgres_parallel_pg_normalized_batch   "docker-entrypoint.s…"   6 seconds ago   Up 3 seconds   0.0.0.0:17368->5432/tcp, :::13107->5432/tcp   twitter_postgres_parallel_pg_normalized_batch_1
2035622ab669   twitter_postgres_parallel_pg_normalized         "docker-entrypoint.s…"   6 seconds ago   Up 4 seconds   0.0.0.0:13106->5432/tcp, :::13106->5432/tcp   twitter_postgres_parallel_pg_normalized_1
c28a48245aeb   twitter_postgres_parallel_pg_denormalized       "docker-entrypoint.s…"   6 seconds ago   Up 2 seconds   0.0.0.0:13105->5432/tcp, :::13105->5432/tcp   twitter_postgres_parallel_pg_denormalized_1

I then ran docker-compose logs as suggested just in case I was missing anything and got:

pg_normalized_1        |
pg_normalized_1        | PostgreSQL Database directory appears to contain a database; Skipping initialization
pg_normalized_1        |
pg_normalized_1        | 2024-04-11 03:52:42.968 UTC [1] LOG:  starting PostgreSQL 16.2 (Debian 16.2-1.pgdg110+2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
pg_normalized_1        | 2024-04-11 03:52:42.968 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
pg_normalized_1        | 2024-04-11 03:52:42.968 UTC [1] LOG:  listening on IPv6 address "::", port 5432
pg_normalized_1        | 2024-04-11 03:52:42.969 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
pg_normalized_1        | 2024-04-11 03:52:42.973 UTC [29] LOG:  database system was shut down at 2024-04-11 03:52:29 UTC
pg_normalized_1        | 2024-04-11 03:52:42.979 UTC [1] LOG:  database system is ready to accept connections
pg_denormalized_1      |
pg_denormalized_1      | PostgreSQL Database directory appears to contain a database; Skipping initialization
pg_denormalized_1      |
pg_denormalized_1      | 2024-04-11 03:52:44.941 UTC [1] LOG:  starting PostgreSQL 13.14 (Debian 13.14-1.pgdg120+2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
pg_denormalized_1      | 2024-04-11 03:52:44.941 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
pg_denormalized_1      | 2024-04-11 03:52:44.941 UTC [1] LOG:  listening on IPv6 address "::", port 5432
pg_denormalized_1      | 2024-04-11 03:52:44.942 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
pg_denormalized_1      | 2024-04-11 03:52:44.945 UTC [27] LOG:  database system was shut down at 2024-04-11 03:52:29 UTC
pg_denormalized_1      | 2024-04-11 03:52:44.952 UTC [1] LOG:  database system is ready to accept connections
pg_normalized_batch_1  |
pg_normalized_batch_1  | PostgreSQL Database directory appears to contain a database; Skipping initialization
pg_normalized_batch_1  |
pg_normalized_batch_1  | 2024-04-11 03:52:43.916 UTC [1] LOG:  starting PostgreSQL 16.2 (Debian 16.2-1.pgdg110+2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
pg_normalized_batch_1  | 2024-04-11 03:52:43.917 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
pg_normalized_batch_1  | 2024-04-11 03:52:43.917 UTC [1] LOG:  listening on IPv6 address "::", port 5432
pg_normalized_batch_1  | 2024-04-11 03:52:43.918 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
pg_normalized_batch_1  | 2024-04-11 03:52:43.922 UTC [29] LOG:  database system was shut down at 2024-04-11 03:52:29 UTC
pg_normalized_batch_1  | 2024-04-11 03:52:43.929 UTC [1] LOG:  database system is ready to accept connections

It seems that the databases are being brought up just fine, so I am not sure why I am still getting the Connection refused error.
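One detail in the docker ps output above: the pg_normalized_batch row lists host port 17368 for IPv4 (0.0.0.0) and 13107 for IPv6 (::), while the other two rows use the same host port for both. Whether that difference is related to the error is not established in this thread. The sketch below shows one way to spot such a mismatch by parsing the PORTS column; the helper name host_ports and the regular expression are illustrative assumptions, not part of the course code.

```python
import re

# Matches one mapping such as "0.0.0.0:13106->5432/tcp" or ":::13106->5432/tcp".
MAPPING = re.compile(
    r"^(?P<addr>.*):(?P<host_port>\d+)->(?P<container_port>\d+)/(?P<proto>\w+)$"
)


def host_ports(ports_column: str) -> dict[str, int]:
    """Map each host bind address in a `docker ps` PORTS cell to its host port."""
    result = {}
    for entry in ports_column.split(","):
        match = MAPPING.match(entry.strip())
        if match:
            result[match["addr"]] = int(match["host_port"])
    return result


if __name__ == "__main__":
    # PORTS cell copied from the pg_normalized_batch row above.
    batch = host_ports("0.0.0.0:17368->5432/tcp, :::13107->5432/tcp")
    print(batch)  # {'0.0.0.0': 17368, '::': 13107}
    if len(set(batch.values())) > 1:
        print("IPv4 and IPv6 are mapped to different host ports")
```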

Yugi00 commented 7 months ago

FIXED: I brought down the containers, deleted the volumes, and removed all existing containers. Rebuilding the containers fixed the issue.
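The exact commands used are not given in this thread. As a rough sketch, the reset described above could look like the following, assuming the docker-compose CLI is on the PATH and the script is run from the directory containing docker-compose.yml. The function names are hypothetical. Note that removing volumes deletes any data already loaded into the databases, so the load scripts have to be rerun afterwards.

```python
import subprocess


def reset_commands(compose: str = "docker-compose") -> list[list[str]]:
    """Return the commands for a full teardown and rebuild (assumed, not from the thread)."""
    return [
        # Stop and remove the containers, and delete their volumes.
        [compose, "down", "--volumes", "--remove-orphans"],
        # Rebuild the images and start the services in the background.
        [compose, "up", "-d", "--build"],
    ]


def reset_stack(dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in reset_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default so nothing is deleted by accident.
    reset_stack(dry_run=True)
```

After the rebuild, docker ps should show each service with the host port configured in docker-compose.yml.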