Open JamesKunstle opened 9 months ago
The problem is that, at some point, the postgres-cache container restarted, but the app-server, which has a sidecar initialization script that creates the augur-cache database and migrates the schema, hadn't restarted, so that script never ran again.
Because the volume associated with the postgres-cache container was ephemeral, the DBMS cluster was fresh on restart: no augur-cache db, no schema.
The PVC is mounted at /var/lib/postgresql/data, but the PG instance wants /var/lib/pgql/data, so we were just writing to the wrong dir, one that is not backed by the PVC.
Exception on /_dash-update-component [POST]
Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.9/site-packages/flask/app.py", line 2525, in wsgi_app
    response = self.full_dispatch_request()
  File "/opt/app-root/lib64/python3.9/site-packages/flask/app.py", line 1822, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/opt/app-root/lib64/python3.9/site-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/app-root/lib64/python3.9/site-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/opt/app-root/lib64/python3.9/site-packages/dash/dash.py", line 1274, in dispatch
    ctx.run(
  File "/opt/app-root/lib64/python3.9/site-packages/dash/_callback.py", line 440, in add_context
    output_value = func(*func_args, **func_kwargs)  # %% callback invoked %%
  File "/opt/app-root/src/pages/index/index_callbacks.py", line 419, in run_queries
    not_ready = cf.get_uncached(f.name, repos)
  File "/opt/app-root/src/cache_manager/cache_facade.py", line 126, in get_uncached
    with pg.connect(cache_cx_string) as cache_conn:
  File "/opt/app-root/lib64/python3.9/site-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: connection to server at "postgres-cache" (172.30.239.178), port 5432 failed: FATAL: database "augur_cache" does not exist
I received this error in the application server logs. Background callbacks were silently failing because psycopg2 threw the above error.
Fix: ensure that the volume associated with postgres-cache is not ephemeral, and handle broken connections to the database (cache or full) programmatically rather than allowing the worker to fail outright.
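
For the second half of the fix, something like the following might work as a starting point. It is only a sketch, not the project's actual code: `connect_with_retry` and all of its parameters are hypothetical names. It wraps whatever connect callable is passed in, retries with exponential backoff, logs each failure so it is no longer silent, and returns None instead of raising so the callback can report "cache unavailable".

```python
import logging
import time

logger = logging.getLogger(__name__)


def connect_with_retry(connect, dsn, retry_on, attempts=5, base_delay=0.5,
                       sleep=time.sleep):
    """Call connect(dsn), retrying on the exception type(s) in retry_on.

    Hypothetical helper. Returns the connection on success, or None once
    every attempt has failed, so the caller decides how to degrade
    instead of the worker dying on an unhandled exception.
    """
    for attempt in range(attempts):
        try:
            return connect(dsn)
        except retry_on as exc:
            # Log every failure so broken connections are visible in the logs.
            logger.warning("db connection attempt %d/%d failed: %s",
                           attempt + 1, attempts, exc)
            if attempt < attempts - 1:
                # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
                sleep(base_delay * 2 ** attempt)
    return None
```

In `get_uncached` this could be called as `connect_with_retry(pg.connect, cache_cx_string, pg.OperationalError)`, with the caller checking for None before using the connection. Retrying alone would not have recreated the missing database, so this only helps alongside the persistent-volume fix; its main value is that the failure gets logged and handled rather than silently killing background callbacks.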