mikeizbicki / cmc-csci143

big data course materials

Error loading data into docker in twitter_postgres_indexes #538

Status: Open. gibsonfriedman opened this issue 1 month ago.

gibsonfriedman commented 1 month ago

Hi,

When I run `sh load_tweets_parallel.sh`, I get this error and I'm not sure what the fix would be:

================================================================================
load pg_denormalized
COPY 2979992
COPY 3044365
COPY 3038917
COPY 3143286
COPY 3129896
COPY 3189325
COPY 3157691
COPY 3148130
COPY 3306556
COPY 3376266
1566.69user 343.86system 1:17:45elapsed 40%CPU (0avgtext+0avgdata 17760maxresident)k
4373952inputs+107912outputs (2major+67422minor)pagefaults 0swaps

load pg_normalized_batch

Traceback (most recent call last):
  File "/home/gfriedman25/.local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 3371, in _wrap_pool_connect
    return fn()
  File "/home/gfriedman25/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 327, in connect
    return _ConnectionFairy._checkout(self)
  File "/home/gfriedman25/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 894, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/home/gfriedman25/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 493, in checkout
    rec = pool._do_get()
  File "/home/gfriedman25/.local/lib/python3.6/site-packages/sqlalchemy/pool/impl.py", line 146, in _do_get
    self._dec_overflow()
  File "/home/gfriedman25/.local/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 72, in __exit__
    with_traceback=exc_tb,
  File "/home/gfriedman25/.local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/home/gfriedman25/.local/lib/python3.6/site-packages/sqlalchemy/pool/impl.py", line 143, in _do_get
    return self._create_connection()
  File "/home/gfriedman25/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 273, in _create_connection
    return _ConnectionRecord(self)
  File "/home/gfriedman25/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 388, in __init__
    self.__connect()
  File "/home/gfriedman25/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 691, in __connect
    pool.logger.debug("Error on connect(): %s", e)
  File "/home/gfriedman25/.local/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 72, in __exit__
    with_traceback=exc_tb,
  File "/home/gfriedman25/.local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/home/gfriedman25/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 686, in __connect
    self.dbapi_connection = connection = pool._invoke_creator(self)
  File "/home/gfriedman25/.local/lib/python3.6/site-packages/sqlalchemy/engine/create.py", line 574, in connect
    return dialect.connect(*cargs, **cparams)
  File "/home/gfriedman25/.local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 598, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/home/gfriedman25/.local/lib/python3.6/site-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: could not connect to server: Connection refused
    Is the server running on host "localhost" (127.0.0.1) and accepting
    TCP/IP connections on port 1345?

If anyone has recommendations on what I could try, I'd really appreciate it; I'm not sure what is causing the issue. Thanks!
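`Connection refused` on `localhost` usually means nothing on the host was accepting TCP connections on that port when the script tried to connect, so a first check is whether the port named in the error is reachable at all. A minimal Python sketch using only the standard library; the function name `port_is_open` is hypothetical, and the port (1345, taken from the traceback above) is an assumption about what your `docker-compose.yml` publishes:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds.

    This only shows that *something* is listening on the host port; it does
    not check that the listener is Postgres or that it is ready for queries.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError, timeouts and DNS errors are all OSError subclasses
        return False


if __name__ == "__main__":
    # Hypothetical: 1345 is the port from the traceback; substitute the host
    # port your docker-compose.yml actually publishes for the database.
    print(port_is_open("localhost", 1345))
```

If this prints `False` while `docker ps` lists the container as running, it is worth comparing the host port in the container's `PORTS` column with the port the load script connects to.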

AvidThinkerArsum commented 1 month ago

This seems to suggest an error with building the containers. Did you try bringing them down, clearing everything, bringing them back up, and then running this again? I had the same issue and then I saw that I didn't make my containers ...
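The "bring down, clear, bring up, retry" sequence can be scripted so it runs the same way every time. A minimal Python sketch, assuming that "clearing everything" means removing the compose project's volumes with `docker-compose down -v` (this deletes the loaded database data), and that the commands are run from the directory containing `docker-compose.yml` and `load_tweets_parallel.sh`; `RESET_COMMANDS` and `reset_and_reload` are hypothetical names:

```python
import subprocess
from typing import List

# Hypothetical reset sequence; adjust to your project's setup.
RESET_COMMANDS: List[List[str]] = [
    # stop and remove the containers, plus named volumes declared in the
    # compose file and anonymous volumes attached to them (destroys the data)
    ["docker-compose", "down", "-v"],
    # rebuild the images and start the containers in the background
    ["docker-compose", "up", "-d", "--build"],
    # reload the data
    ["sh", "load_tweets_parallel.sh"],
]


def reset_and_reload(dry_run: bool = True) -> List[str]:
    """Print (and, unless dry_run, run) each command in order.

    Stops at the first command that exits non-zero. Returns the command
    lines that were printed or executed.
    """
    executed = []
    for cmd in RESET_COMMANDS:
        line = " ".join(cmd)
        print(("would run: " if dry_run else "running: ") + line)
        if not dry_run:
            # check=True raises CalledProcessError if the command fails
            subprocess.run(cmd, check=True)
        executed.append(line)
    return executed


if __name__ == "__main__":
    reset_and_reload(dry_run=True)
```

Defaulting to a dry run means the destructive `down -v` step only happens when you explicitly pass `dry_run=False`.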

gibsonfriedman commented 1 month ago

@AvidThinkerArsum I tried that earlier but it didn't seem to change anything.

AvidThinkerArsum commented 1 month ago

Yes, I wonder why. Now that I've tried it, I'm getting the same error as you. I'm sure my two containers are up and my ports are definitely right:

[Screenshot 2024-04-28 at 12 42 30 AM]

My error looks like:

psql: could not connect to server: Connection refused
    Is the server running on host "localhost" (127.0.0.1) and accepting
    TCP/IP connections on port 1620?
(the same message is printed 10 times in total)

and

....
psycopg2.OperationalError: could not connect to server: Connection refused
    Is the server running on host "localhost" (127.0.0.1) and accepting
    TCP/IP connections on port 1741?

I wasn't able to delete the data (I don't know whether we need to), so maybe Mike could help us with that?

Also, I'd appreciate it if someone could tell me whether my understanding is correct:

So basically, we have two containers for this assignment: denormalized and normalized_batch. We build them with `docker-compose up -d --build`. Then, when we run `load_tweets_parallel.sh`, it should load the data into each of the respective containers, and then we can run our tests. Can someone tell me when the volumes are created (and why I don't see any with `docker volume ls`), and what exactly they are? My understanding is that they are like persistent pieces of storage (like a hard drive) that can be shared between containers.
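On the workflow above: `docker-compose up -d --build` returns once the containers have started, which can be before Postgres inside them accepts connections (the first start also has to initialize the data directory), so it can help to wait before running the load script. A minimal poll-with-backoff sketch; `wait_until` and `tcp_check` are hypothetical helpers, and the default check is a plain TCP connect to an assumed host port, which is weaker than Postgres's own `pg_isready` utility because it only shows that something is listening:

```python
import socket
import time
from typing import Callable


def tcp_check(host: str, port: int) -> Callable[[], bool]:
    """Build a check that succeeds once (host, port) accepts a TCP connection."""
    def check() -> bool:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            return False
    return check


def wait_until(check: Callable[[], bool], timeout: float = 60.0,
               initial_delay: float = 0.5, max_delay: float = 5.0) -> bool:
    """Call check() until it returns True or `timeout` seconds have passed.

    The delay between attempts doubles after each failure, capped at
    max_delay. Returns True on success, False if the deadline passed first.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        if check():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # never sleep past the deadline
        time.sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)
```

For example, calling `wait_until(tcp_check("localhost", 1620), timeout=30)` before starting the load, where 1620 stands in for whatever host port your compose file publishes for each database.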

gibsonfriedman commented 1 month ago

@AvidThinkerArsum @mikeizbicki My error looks similar to that. I also think it might be due to the data not being deleted properly, but I'm not sure what else I can try to fix the issue.

vibhuk10 commented 1 month ago

@AvidThinkerArsum @gibsonfriedman I'm running into the same issue when trying to load my data. Did you guys ever figure out a solution?