Full query result being loaded into RAM despite ON_DEMAND_RECORD_COUNT=1000

mac-chaffee commented 1 year ago

Describe the bug

When running something like select * from bigtable on a table with many rows, RAM usage of pgAdmin baloons to the size of the table size. This happens on the "query tool", which is supposed to fetch and display only 1000 rows at a time, which makes me think this is a bug.

To Reproduce

Steps to reproduce the behavior:

Create a table with lots of rows, say 2GB in size
Constantly record RAM usage of the pgAdmin server process with a high frequency
In the query tool, run select * from bigtable

Observe that RAM usage spiked about 2GB in size while the query was executing, then dropped back down.

Expected behavior

RAM usage should remain fairly low, about the size it would take to hold 1000 rows (give or take). You can see this expected behavior by doing the same query in plain psql after running \set FETCH_COUNT 1000.

Error message

If RAM usage was large enough to invoke the OOMKiller, user will see an error like "Error communicating with the pgAdmin server process".

Desktop (please complete the following information):

OS: Linux
Version: 6.17
Mode: Desktop
Package type: DEB

mac-chaffee commented 1 year ago

I'm thinking the issue may be related to the decision not to use server-side cursors here: https://github.com/pgadmin-org/pgadmin4/issues/947

My understanding is that both sync and async queries load the whole result into RAM. The only difference is that async queries don't block when they start doing that. So async queries give you the increased responsiveness, but not reduced RAM usage.

mac-chaffee commented 1 year ago

Also wanted to add: while testing I was noticing memory leaking. I'd run a large query then a small query, but the RAM usage of the large query would stick around, even after closing the query tool. But it wasn't super consistent

adityatoshniwal commented 1 year ago

Hi @mac-chaffee, Can please share how did check the RAM usage of the large query? Screenshots will help.

mac-chaffee commented 1 year ago

@adityatoshniwal I ran top -p $(pgrep -f pgAdmin4.py) -d 0.2 to measure RAM usage on Linux. On macOS, ActivityMonitor can be used if you configure a faster refresh rate (I think). Not sure about Windows

mac-chaffee commented 1 year ago

If running pgAdmin in the official container, just top -d 0.2 will work.

I also think I know why I saw memory-leaky issues. Without server-side cursors, every time you run a query, the full result is loaded into RAM, just in the background. This is controlled by StartRunningQuery. But since the transaction object is designed to out-live the request cycle, running a large query means you have a large transaction objects still occupying RAM even if the user has closed the query tool. So server-side cursors should fix that issue as well.

(Running a different query after running a large query does seem to free up the RAM usage of the large query, so I take back what I said earlier. I only see memory leak when closing the query tool.)

adityatoshniwal commented 1 year ago

@mac-chaffee From what I can see, server side cursors are already used. The RAM usage must be because of UI storing the rows. Need to dig more.

mac-chaffee commented 1 year ago

Here's a screenshot of the devtools after I run a large query (~6 million rows, ~2GB) with SERVER_MODE=False:

Note that the initial request (/sqleditor/query_tool/start/) takes 4.39 seconds but the reponse size is tiny. While that request is running, I see the RAM usage quickly increase to the full size of the table on one of the gunicorn workers:

Mem: 345007488K used, 1238222416K free, 25181496K shrd, 8308K buff, 199792396K cached
CPU:   2% usr   0% sys   0% nic  97% idle   0% io   0% irq   0% sirq
Load average: 3.54 3.78 3.76 5/4683 115
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
   99    12 root     S    2190m   0%  28   0% {gunicorn} /venv/bin/python3 /venv/bin/gunicorn --limit-request-line 8190 --timeout 86400 --bind [::]:80 -w 1 --threads 25 --access-
   12     1 root     S    27936   0%  28   0% {gunicorn} /venv/bin/python3 /venv/bin/gunicorn --limit-request-line 8190 --timeout 86400 --bind [::]:80 -w 1 --threads 25 --access-

After that first request returns, the content is fetched via /sqleditor/poll/. That returns right away (since the rows are all present on the pgAdmin server side) and the response size is only 141KB. I can confirm that the response was exactly 1000 rows.

At this point, top is still showing the exact same amount of RAM usage: ~2GB. Even closing the query tool entirely does not free up that RAM.

Server cursors are supported here: https://github.com/pgadmin-org/pgadmin4/blob/cf948603cfae854f458e074b4e1a68afdf561d24/web/pgadmin/utils/driver/psycopg2/connection.py#L630

But that function is never called with server_cursor=True. You'd need to pass that parameter here: https://github.com/pgadmin-org/pgadmin4/blob/cf948603cfae854f458e074b4e1a68afdf561d24/web/pgadmin/utils/driver/psycopg2/connection.py#L1024-L1040

Note the drawbacks mentioned in the other ticket I linked above, not sure if this is still applicable or not: https://www.postgresql.org/message-id/flat/CAFiP3vxaM9ixpmaM04JAkTT1ONwmnMPwUCbK%2BkzCN%2Br6F%2B7-ZA%40mail.gmail.com#CAFiP3vxaM9ixpmaM04JAkTT1ONwmnMPwUCbK+kzCN+r6F+7-ZA@mail.gmail.com

mac-chaffee commented 1 year ago

I tried just setting server_cursor=True and got this error:

2023-02-08 15:32:50,920: ERROR    pgadmin:    asynchronous connections cannot produce named cursors                                                                               
Traceback (most recent call last):                                                                                                                                                
  File "/pgadmin4/pgadmin/utils/driver/psycopg2/connection.py", line 694, in __cursor                                                                                             
    cur = self.conn.cursor(                                                                                                                                                       
psycopg2.ProgrammingError: asynchronous connections cannot produce named cursors                                                                                                  
2023-02-08 15:32:50,937: ERROR    pgadmin:    Failed to create cursor for psycopg2 connection with error message for the server#1:omop_covid:                                     
asynchronous connections cannot produce named cursors                                                                                                                             
2023-02-08 15:32:50,938: ERROR    pgadmin:    'NoneType' object has no attribute 'execute'                                                                                        
Traceback (most recent call last):                                                                                                                                                
  File "/venv/lib/python3.10/site-packages/flask/app.py", line 1517, in full_dispatch_request                                                                                     
    rv = self.dispatch_request()                                                                                                                                                  
  File "/venv/lib/python3.10/site-packages/flask/app.py", line 1503, in dispatch_request                                                                                          
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)                                                                                                  
  File "/venv/lib/python3.10/site-packages/flask_login/utils.py", line 290, in decorated_view                                                                                     
    return current_app.ensure_sync(func)(*args, **kwargs)                                                                                                                         
  File "/pgadmin4/pgadmin/tools/sqleditor/__init__.py", line 831, in start_query_tool                                                                                             
    return StartRunningQuery(blueprint, current_app.logger).execute(                                                                                                              
  File "/pgadmin4/pgadmin/tools/sqleditor/utils/start_running_query.py", line 88, in execute                                                                                      
    result, status = self.__execute_query(                                                                                                                                        
  File "/pgadmin4/pgadmin/tools/sqleditor/utils/start_running_query.py", line 139, in __execute_query                                                                             
    status, result = conn.execute_async(sql)                                                                                                                                      
  File "/pgadmin4/pgadmin/utils/driver/psycopg2/connection.py", line 1086, in execute_async                                                                                       
    cur.execute(query, params)                                                                                                                                                    
AttributeError: 'NoneType' object has no attribute 'execute'

Seems you can't mix async queries and server-side cursors together.

EDIT: If you also set self.async_ = 0, I get ERROR: can't use a named cursor outside of transactions.

mac-chaffee commented 1 year ago

It may also be good to take inspiration from psql itself. It pretty much always fetches the whole result synchronously except for when FETCH_COUNT is set or for a few other edge cases where it uses the equivalent of server-side cursors: https://github.com/postgres/postgres/blob/03023a2664f8950ad522385ff75ce004bc932a7c/src/bin/psql/common.c#L1108-L1122

pgadmin-org / pgadmin4

Full query result being loaded into RAM despite ON_DEMAND_RECORD_COUNT=1000 #5797