pgadmin-org / pgadmin4

pgAdmin is the most popular and feature rich Open Source administration and development platform for PostgreSQL, the most advanced Open Source database in the world.
https://www.pgadmin.org
Other

"Crypt Key missing" - Worker (pid:XX) was sent SIGKILL! #8065

Open ghost opened 1 month ago

ghost commented 1 month ago

Please note that security bugs or issues should be reported to security@pgadmin.org.

Describe the bug

Whenever a query consumes a lot of resources, pgAdmin hits an error: a worker receives SIGKILL and the pgAdmin container then restarts:

172.25.9.248 - - [24/Oct/2024:15:37:40 +0000] "GET /sqleditor/poll/8544899 HTTP/1.1" 200 453 "https://pgadmin-pgadmin-prod.apps.dev.ocp.domain.com/sqleditor/panel/8544899?is_query_tool=true&sgid=361&sid=1856&did=16413&database_name=eutras01-prod" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36 Edg/130.0.0.0"
172.25.9.248 - - [24/Oct/2024:15:37:41 +0000] "GET /sqleditor/poll/8544899 HTTP/1.1" 200 453 "https://pgadmin-pgadmin-prod.apps.dev.ocp.domain.com/sqleditor/panel/8544899?is_query_tool=true&sgid=361&sid=1856&did=16413&database_name=eutras01-prod" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36 Edg/130.0.0.0"
[2024-10-24 15:37:41 +0000] [1] [ERROR] **Worker (pid:21) was sent SIGKILL! Perhaps out of memory?**
[2024-10-24 15:37:41 +0000] [122] [INFO] Booting worker with pid: 122
2024-10-24 15:37:48,566: INFO   pgadmin:    ########################################################
2024-10-24 15:37:48,566: INFO   pgadmin:    Starting pgAdmin 4 v8.12...
2024-10-24 15:37:48,566: INFO   pgadmin:    ########################################################
2024-10-24 15:37:48,566: DEBUG  pgadmin:    Python syspath: ['/pgadmin4', '/venv/bin', '/pgadmin4', '/usr/lib/python312.zip', '/usr/lib/python3.12', '/usr/lib/python3.12/lib-dynload', '/venv/lib/python3.12/site-packages', '/usr/lib/python3.12/site-packages', '/venv/lib/python3.12/site-packages/setuptools/_vendor']
2024-10-24 15:37:50,774: INFO   pgadmin:    Registering blueprint module: <AboutModule 'about'>
2024-10-24 15:37:50,775: INFO   pgadmin:    Registering blueprint module: <AuthenticateModule 'authenticate'>
2024-10-24 15:37:50,776: INFO   pgadmin:    Registering blueprint module: <BrowserModule 'browser'>
2024-10-24 15:37:53,981: INFO   pgadmin:    Registering blueprint module: <DashboardModule 'dashboard'>
2024-10-24 15:37:54,044: INFO   pgadmin:    Registering blueprint module: <HelpModule 'help'>
2024-10-24 15:37:54,044: INFO   pgadmin:    Registering blueprint module: <MiscModule 'misc'>
2024-10-24 15:37:56,478: INFO   pgadmin:    Registering blueprint module: <PreferencesModule 'preferences'>
2024-10-24 15:37:56,482: INFO   pgadmin:    Registering blueprint module: <PgAdminModule 'redirects'>
2024-10-24 15:37:56,484: INFO   pgadmin:    Registering blueprint module: <SettingsModule 'settings'>
2024-10-24 15:37:56,488: INFO   pgadmin:    Registering blueprint module: <ToolsModule 'tools'>
2024-10-24 15:37:58,266: DEBUG  pgadmin:    Config server mode: True
2024-10-24 15:37:58,267: DEBUG  pgadmin:    Not running under the desktop runtime, port: 5050
2024-10-24 15:37:59,647: ERROR  pgadmin:    'pinged'
Traceback (most recent call last):
  File "/venv/lib/python3.12/site-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pgadmin4/pgadmin/misc/__init__.py", line 154, in cleanup
    driver.ping()
  File "/pgadmin4/pgadmin/utils/driver/__init__.py", line 34, in ping
    DriverRegistry._objects[type].gc_timeout()
  File "/pgadmin4/pgadmin/utils/driver/psycopg3/__init__.py", line 253, in gc_timeout
    if curr_time - sess_mgr['pinged'] >= session_idle_timeout:
                   ~~~~~~~~^^^^^^^^^^
KeyError: 'pinged' 
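The traceback above ends in `sess_mgr['pinged']` raising `KeyError` right after the worker was rebooted. As a minimal sketch of how an idle-session sweep could tolerate a session entry that has no `pinged` timestamp yet, consider the following. This is not pgAdmin's actual code: the function name `collect_idle_sessions` and the choice to treat a missing timestamp as expired are assumptions made for illustration.

```python
import time


def collect_idle_sessions(sessions, idle_timeout, now=None):
    """Return the ids of sessions idle for at least idle_timeout seconds.

    `sessions` maps a session id to a dict that normally carries a
    'pinged' timestamp. Hypothetical helper, not part of pgAdmin.
    """
    now = time.time() if now is None else now
    expired = []
    for sid, sess_mgr in sessions.items():
        # .get() returns None instead of raising KeyError when the
        # session was restored without a 'pinged' entry.
        last_ping = sess_mgr.get('pinged')
        # Assumption: a session with no timestamp is treated as expired.
        if last_ping is None or now - last_ping >= idle_timeout:
            expired.append(sid)
    return expired
```

Whether a session without a timestamp should be dropped or given a fresh timestamp is a design decision; the sketch only shows that the lookup itself does not need to fail.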

This is the log for the "Crypt Key Missing" part:

2024-10-24 15:39:53,980: INFO   pgadmin:    Released a lock.
2024-10-24 15:39:53,980: INFO   pgadmin:    Failed to connect to the database server(#1856) for connection (DB:postgres) with error message as below:connection failed: connection to server at "10.183.96.169", port 5444 failed: fe_sendauth: no password supplied
2024-10-24 15:39:53,980: ERROR  pgadmin:    'CONN:3930432'
Traceback (most recent call last):
  File "/venv/lib/python3.12/site-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/flask/views.py", line 110, in view
    return current_app.ensure_sync(self.dispatch_request)(**kwargs)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pgadmin4/pgadmin/browser/utils.py", line 309, in dispatch_request
    return method(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/flask_login/utils.py", line 290, in decorated_view
    return current_app.ensure_sync(func)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pgadmin4/pgadmin/authenticate/mfa/utils.py", line 304, in inner
    return mfa_enabled(
           ^^^^^^^^^^^^
  File "/pgadmin4/pgadmin/authenticate/mfa/utils.py", line 169, in mfa_enabled
    return execute_if_enabled()
           ^^^^^^^^^^^^^^^^^^^^
  File "/pgadmin4/pgadmin/authenticate/mfa/utils.py", line 301, in if_else_func_inner
    return _func(first, second)
           ^^^^^^^^^^^^^^^^^^^^
  File "/pgadmin4/pgadmin/authenticate/mfa/utils.py", line 242, in mfa_session_authenticated
    return authenticated() if session.get('mfa_authenticated', False) is True \
           ^^^^^^^^^^^^^^^
  File "/pgadmin4/pgadmin/authenticate/mfa/utils.py", line 297, in execute_func
    return wrapped(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pgadmin4/pgadmin/user_login_check.py", line 22, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/pgadmin4/pgadmin/browser/server_groups/servers/__init__.py", line 994, in properties
    manager = driver.connection_manager(sid)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pgadmin4/pgadmin/utils/driver/psycopg3/__init__.py", line 117, in connection_manager
    manager._restore_connections()
  File "/pgadmin4/pgadmin/utils/driver/psycopg3/server_manager.py", line 393, in _restore_connections
    conn = self.connections[conn_id]
           ~~~~~~~~~~~~~~~~^^^^^^^^^
KeyError: 'CONN:3930432'

The pod is then restarted immediately and the user receives a "Crypt Key Missing" error, because the pgAdmin pod doesn't handle the SIGKILL gracefully and doesn't show the master password prompt again.

The pod restarts so fast that pgAdmin still shows the query editor, but you have to refresh the whole page (F5) to make it work again. There is no auto-refresh and no visible disconnection.

To Reproduce

Access a database through the pgAdmin container and run a query long enough to time out. We are running a query that returns 90M rows, with a 6 GB memory limit and a 600m CPU limit on the pod. The query is very bad, yes: SELECT * from schema.table; but we're trying to reproduce the error that some users have reported recently from different databases and clusters.
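For a sense of scale, the numbers in the reproduction steps can be turned into a per-row memory budget. The sketch below assumes the limit is 6 GiB and that the whole result set would have to sit in the worker's memory at once; neither is confirmed in this report, and the helper name is hypothetical.

```python
def bytes_per_row_budget(memory_limit_bytes, row_count):
    """Average bytes available per row if every row is held in memory."""
    return memory_limit_bytes / row_count


limit = 6 * 1024**3   # assumption: the pod limit is 6 GiB
rows = 90_000_000     # 90M rows, as stated in the report
budget = bytes_per_row_budget(limit, rows)
print(f"{budget:.1f} bytes per row")  # prints "71.6 bytes per row"
```

Under those assumptions the budget is roughly 72 bytes per row, before counting any per-object overhead in the process.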

Expected behavior

I understand that predicting whether a query will time out is extremely complicated (if not impossible), so I would suggest showing a different error (such as a query timeout) instead of sending SIGKILL and killing the app, because the container is then killed and reloaded. Also, the password prompt is not shown once pgAdmin has restarted; it shows the "Crypt Key Missing" error and you have to refresh the tool manually.

If there is a setting we can use to handle this from the pgAdmin side, please advise on how to do this (how timeouts or wait times are handled). If not, maybe the timeout could be handled so that the system at least shows a message such as "Query timed out, session disconnected" and kills the session, not the whole application.

If you run the query directly against the database, it takes a long time, but it's doable.
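On the question of a timeout setting: independent of pgAdmin, PostgreSQL has a `statement_timeout` parameter that aborts any statement running longer than the configured time. A minimal sketch of building the corresponding `SET` command is shown below. The helper name is hypothetical, and whether a server-side timeout would prevent the worker kill described here is not established in this thread.

```python
def statement_timeout_sql(seconds):
    """Build a SET command for PostgreSQL's statement_timeout parameter.

    Hypothetical helper: it only formats the SQL text. A value of 0
    disables the timeout in PostgreSQL.
    """
    if not isinstance(seconds, int) or seconds < 0:
        raise ValueError("seconds must be a non-negative integer")
    return f"SET statement_timeout = '{seconds}s';"


# Example: cancel any statement in the session that runs over 5 minutes.
print(statement_timeout_sql(300))
```

The resulting statement could be run at the start of a Query Tool session, so that a runaway query ends with a PostgreSQL error in that session rather than running indefinitely.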

Error message

"Crypt Key is missing" from pgadmin. From the logs, I've attached the messages on the previous sections.

Screenshots

There's no OOM issue; no threshold has been exceeded.

image

Here's our CPU usage for the pod: image

Here's the message: image

Additional context

We're deploying the app with Helm on OpenShift; the pgAdmin 4 image version is REL-8_12-21-gff838e43d. Please let me know if there's more info you need.

Thank you!

adityatoshniwal commented 4 weeks ago

Hi @andres-chavez-bi, we'll need to investigate further why there was an exception (the reason behind the kill). I spent some time on it but didn't find a cause. We could of course add a check to avoid pgAdmin being killed. Regarding the Crypt Key Missing error: it appears because when a user logs in, the user's password is used as the crypt key and is stored in memory. When the pgAdmin process was killed, the in-memory data was lost along with the user's logged-in session. The user has to log in again to start a new session. This can be avoided by fixing the root cause of the process being killed, which will be taken care of before the next release.

Thanks.
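The explanation above (the crypt key lives only in process memory, so a killed worker loses it) can be illustrated with a toy model. This is a sketch for intuition only: the class and function names are invented here and do not correspond to pgAdmin's implementation.

```python
class CryptKeyMissing(Exception):
    """Raised when no crypt key is held in memory for a user."""


class WorkerProcess:
    """Toy stand-in for one server worker; all state lives in memory."""

    def __init__(self):
        self._crypt_keys = {}

    def login(self, user, password):
        # The login password doubles as the crypt key, kept in memory only.
        self._crypt_keys[user] = password

    def crypt_key(self, user):
        try:
            return self._crypt_keys[user]
        except KeyError:
            raise CryptKeyMissing(f"Crypt key missing for {user!r}") from None


def simulate_restart(old_worker):
    """Model a SIGKILL plus reboot: the new worker starts with empty memory."""
    return WorkerProcess()


worker = WorkerProcess()
worker.login("alice", "s3cret")
worker = simulate_restart(worker)  # the key stored before the kill is gone
```

After the simulated restart, calling `worker.crypt_key("alice")` raises `CryptKeyMissing` until the user logs in again, which mirrors the behaviour reported in this issue.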