mikeizbicki / cmc-csci143

big data course materials

project - not able to load data #545

Closed by myngpog 4 weeks ago

myngpog commented 1 month ago

hey everyone!

To load the data (Task 2.2), I am using the load_tweets.py and load_parallel.sh files from twitter_postgres_parallel (I have only changed the ports to match my current ones), but when I run sh load_parallel.sh, I get this:

(Background on this error at: https://sqlalche.me/e/14/e3q8)
Traceback (most recent call last):
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 3371, in _wrap_pool_connect
    return fn()
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 327, in connect
    return _ConnectionFairy._checkout(self)
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 894, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 493, in checkout
    rec = pool._do_get()
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/pool/impl.py", line 146, in _do_get
    self._dec_overflow()
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 72, in __exit__
    with_traceback=exc_tb,
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/pool/impl.py", line 143, in _do_get
    return self._create_connection()
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 273, in _create_connection
    return _ConnectionRecord(self)
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 388, in __init__
    self.__connect()
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 691, in __connect
    pool.logger.debug("Error on connect(): %s", e)
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 72, in __exit__
    with_traceback=exc_tb,
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 686, in __connect
    self.dbapi_connection = connection = pool._invoke_creator(self)
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/engine/create.py", line 574, in connect
    return dialect.connect(*cargs, **cparams)
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 598, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/home/mynguyen/.local/lib/python3.6/site-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "load_tweets.py", line 416, in <module>
    connection = engine.connect()
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 3325, in connect
    return self._connection_cls(self, close_with_result=close_with_result)
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 96, in __init__
    else engine.raw_connection()
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 3404, in raw_connection
    return self._wrap_pool_connect(self.pool.connect, _connection)
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 3375, in _wrap_pool_connect
    e, dialect, self
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2209, in _handle_dbapi_exception_noconnection
    sqlalchemy_exception, with_traceback=exc_info[2], from_=e
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 3371, in _wrap_pool_connect
    return fn()
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 327, in connect
    return _ConnectionFairy._checkout(self)
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 894, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 493, in checkout
    rec = pool._do_get()
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/pool/impl.py", line 146, in _do_get
    self._dec_overflow()
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 72, in __exit__
    with_traceback=exc_tb,
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/pool/impl.py", line 143, in _do_get
    return self._create_connection()
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 273, in _create_connection
    return _ConnectionRecord(self)
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 388, in __init__
    self.__connect()
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 691, in __connect
    pool.logger.debug("Error on connect(): %s", e)
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 72, in __exit__
    with_traceback=exc_tb,
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 686, in __connect
    self.dbapi_connection = connection = pool._invoke_creator(self)
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/engine/create.py", line 574, in connect
    return dialect.connect(*cargs, **cparams)
  File "/home/mynguyen/.local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 598, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/home/mynguyen/.local/lib/python3.6/site-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

(Background on this error at: https://sqlalche.me/e/14/e3q8)
Command exited with non-zero status 10
3.50user 0.55system 0:00.73elapsed 551%CPU (0avgtext+0avgdata 42928maxresident)k
32inputs+2488outputs (0major+158890minor)pagefaults 0swaps

any idea? thanks!

vibhuk10 commented 1 month ago

I am getting a similar error. I tried taking down, pruning, and rebuilding the containers and it still did not work. I also tried changing all my ports. It was working before with this same code.

this is for denormalized:

psql: FATAL:  no pg_hba.conf entry for host "172.19.0.1", user "postgres", database "postgres", SSL off

this is for normalized_batch:

Traceback (most recent call last):
  File "/home/vkrishnan/.local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 3371, in _wrap_pool_connect
    return fn()
  File "/home/vkrishnan/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 327, in connect
    return _ConnectionFairy._checkout(self)
  File "/home/vkrishnan/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 894, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/home/vkrishnan/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 493, in checkout
    rec = pool._do_get()
  File "/home/vkrishnan/.local/lib/python3.6/site-packages/sqlalchemy/pool/impl.py", line 146, in _do_get
    self._dec_overflow()
  File "/home/vkrishnan/.local/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 72, in __exit__
    with_traceback=exc_tb,
  File "/home/vkrishnan/.local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/home/vkrishnan/.local/lib/python3.6/site-packages/sqlalchemy/pool/impl.py", line 143, in _do_get
    return self._create_connection()
  File "/home/vkrishnan/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 273, in _create_connection
    return _ConnectionRecord(self)
  File "/home/vkrishnan/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 388, in __init__
    self.__connect()
  File "/home/vkrishnan/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 691, in __connect
    pool.logger.debug("Error on connect(): %s", e)
  File "/home/vkrishnan/.local/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 72, in __exit__
    with_traceback=exc_tb,
  File "/home/vkrishnan/.local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/home/vkrishnan/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 686, in __connect
    self.dbapi_connection = connection = pool._invoke_creator(self)
  File "/home/vkrishnan/.local/lib/python3.6/site-packages/sqlalchemy/engine/create.py", line 574, in connect
    return dialect.connect(*cargs, **cparams)
  File "/home/vkrishnan/.local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 598, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/home/vkrishnan/.local/lib/python3.6/site-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: FATAL:  the database system is not yet accepting connections
DETAIL:  Consistent recovery state has not been yet reached.

Please let me know if anyone has a solution.
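For the "not yet accepting connections" error above, one option that may help is to retry the connection for a short while after the containers start, since the message indicates the server is still starting up. The following is only a minimal sketch under that assumption: `connect_with_retry` is a hypothetical helper that is not part of load_tweets.py, and it takes any zero-argument callable (for example `engine.connect`) so that it does not depend on a particular library. It would not help with the pg_hba.conf error, which is an access-configuration problem rather than a timing one.

```python
import time


def connect_with_retry(connect, attempts=10, delay=1.0, sleep=time.sleep):
    """Call `connect()` until it succeeds or `attempts` tries have failed.

    `connect` is any zero-argument callable, e.g. `engine.connect`.
    `connect_with_retry` is a hypothetical helper, not part of the course code.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except Exception as error:
            # In real code, catch only the connection error you expect
            # (e.g. sqlalchemy.exc.OperationalError) instead of Exception.
            last_error = error
            if attempt < attempts:
                sleep(delay)  # wait before the next try
    # Every attempt failed: re-raise the most recent error.
    raise last_error
```

With SQLAlchemy this could be used as `connection = connect_with_retry(engine.connect)`, assuming `engine` was created as in load_tweets.py.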

myngpog commented 1 month ago

Is this for the indexes homework or the project? I am only working with the normalized stuff. I think I fixed it by changing the links in the .env files, but I'm not sure yet because I am still waiting for the data to load (it's taking a while).

vibhuk10 commented 1 month ago

I am working on the indexes homework. What did you change to fix this?

myngpog commented 1 month ago

Unfortunately, I didn't have that problem when working on the indexes homework assignment, but looking at your error, it may have something to do with your ports (I'm not 100% sure though, only like 50%).
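A quick way to test the port theory is to check whether anything is listening on the port at all. This is a rough sketch using only the Python standard library. `port_is_open` is a hypothetical helper name, and the host and port you pass in are whatever your own setup uses.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection raises OSError (including timeouts and
        # "connection refused") when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call (the port number here is a placeholder, not from this thread):
# port_is_open("localhost", 5432)
```

A True result only shows that something is listening on that port. PostgreSQL can still reject the connection afterwards, for example because of pg_hba.conf rules or because it is still starting up.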

luisgomez214 commented 1 month ago

Hello, I am having the same issue. Were you able to solve this?

myngpog commented 4 weeks ago

Yeah, I updated the .env files to match the URL.
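One way to keep the two from drifting apart again is to build the database URL from the .env values instead of typing it twice. The thread does not show the .env contents, so everything below is an assumption for illustration: `parse_env` and `build_db_url` are hypothetical helpers, `POSTGRES_PORT` is a made-up key, and the sample values are placeholders. `POSTGRES_USER`, `POSTGRES_PASSWORD`, and `POSTGRES_DB` are the variable names used by the official postgres Docker image, which may or may not match your files.

```python
from urllib.parse import quote_plus


def parse_env(text):
    """Parse KEY=VALUE lines from .env-style text, skipping blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def build_db_url(env, host="localhost"):
    """Build a PostgreSQL connection URL from parsed .env values."""
    return "postgresql://{user}:{password}@{host}:{port}/{db}".format(
        user=env["POSTGRES_USER"],
        password=quote_plus(env["POSTGRES_PASSWORD"]),  # escape special characters
        host=host,
        port=env["POSTGRES_PORT"],  # hypothetical key, not from the thread
        # The postgres image defaults the database name to the user name.
        db=env.get("POSTGRES_DB", env["POSTGRES_USER"]),
    )


# Placeholder .env contents, for illustration only.
example = """
POSTGRES_USER=postgres
POSTGRES_PASSWORD=pass
POSTGRES_PORT=12345
"""
print(build_db_url(parse_env(example)))
# prints: postgresql://postgres:pass@localhost:12345/postgres
```

The resulting string could then be passed to `sqlalchemy.create_engine`, so the port only has to be changed in one place.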