psycopg / psycopg2

PostgreSQL database adapter for the Python programming language
https://www.psycopg.org/
Other
3.35k stars 506 forks source link

cursor.fetchone() hangs script with no errors #1715

Closed fridary closed 3 months ago

fridary commented 3 months ago

I have an linux service that runs Thread() every 12 second and do queries to database. Suddenly I notice that after 1-5 days my python script freezes and nothing happens. So I put a lot of breakpoints and found out that script hangs on these lines and nothing happens, no errors:

cursor.execute(f"SELECT 1 FROM _blocks WHERE number={block_number}")
_exists_block = cursor.fetchone()

Service is located in /etc/systemd/system/:

Environment=PYTHONUNBUFFERED=1
Environment="PATH=/home/fridary/miniconda3/envs/eth/bin"
ExecStart=/home/fridary/miniconda3/envs/eth/bin/python eth_newHeads.py
StandardOutput=append:/home/fridary/python/ethereum/lib/logs/service_output.log
StandardError=append:/home/fridary/python/ethereum/lib/logs/service_error.log

eth_newHeads.py:

while True:
    journal.write(f"[Block] {block_number} socket received")

    # ...

    thread = Thread(target=on_message, args=(block_number))
    thread.start()

def on_message(block_number):
    with psycopg2.connect(**db_config_psql) as conn:
        with conn.cursor() as cursor:

            journal.write(f" Starting SELECT 1 FROM _blocks")

            try:
                cursor.execute(f"SELECT 1 FROM _blocks WHERE number={block_number}")
                _exists_block = cursor.fetchone()
            except:
                journal.write(f" Error SELECT 1 FROM _blocks")
                journal.write(traceback.format_exc())
                print(traceback.format_exc())
                return

            # this reaches only if no hanging
            journal.write(f" Success SELECT 1 FROM _blocks")

When hangs, I see this journalctl --unit=eth_newHeads -n 100 --no-pager:

...
july 25 16:54:01 fridary-pc python[337626]: [Block] 20383990 socket received
july 25 16:54:01 fridary-pc python[337626]:  Starting SELECT 1 FROM _blocks
july 25 16:54:13 fridary-pc python[337626]: [Block] 20383991 socket received
july 25 16:54:13 fridary-pc python[337626]:  Starting SELECT 1 FROM _blocks
july 25 16:54:25 fridary-pc python[337626]: [Block] 20383992 socket received
july 25 16:54:25 fridary-pc python[337626]:  Starting SELECT 1 FROM _blocks
july 25 16:54:37 fridary-pc python[337626]: [Block] 20383993 socket received
july 25 16:54:37 fridary-pc python[337626]:  Starting SELECT 1 FROM _blocks

As we see, it hangs. Nowhere any errors. Last logs service_output.log and service_error.log I can see before script is hanged.

Now, if I do sudo systemctl restart postgresql@12-main, it will begin to work and Success SELECT 1 FROM _blocks is reached.

I could think obvious the problem is in psql, but terminal commands work:

$ sudo -i -u postgres psql
psql (16.2 (Ubuntu 16.2-1.pgdg20.04+1), server 12.16 (Ubuntu 12.16-0ubuntu0.20.04.1))
Type "help" for help.

postgres=# \c wallet_hunter;
psql (16.2 (Ubuntu 16.2-1.pgdg20.04+1), server 12.16 (Ubuntu 12.16-0ubuntu0.20.04.1))
You are now connected to database "wallet_hunter" as user "postgres".
wallet_hunter=# SELECT 1 FROM _blocks WHERE number=20184014;
 ?column?
----------
        1
(1 row)
wallet_hunter=# SELECT version();
                                                                version
----------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 12.16 (Ubuntu 12.16-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
(1 row)

I need to say that there are psql v12 and v16 installed on Ubuntu both, but I use only v12. And wallet_hunter on v12.

And the last at the moment of hang sudo systemctl status postgresql@12-main:

Captura de pantalla 2024-07-25 a las 10 03 36
dvarrazzo commented 3 months ago

Hello,

this depends on too many things, especially what the queries do on the server. If you are using different connections per thread there is just no shared structure (lock, variables) across connections, so I doubt that the lock happens in the client.

You need a professional consultancy to look into it, it's not a bug in the driver.

fridary commented 3 months ago

@dvarrazzo but there are absolutely no errors and it freezes. How can I catch an error? Maybe there is timeout option in fetchone()? I changed connection to be declared once outside the Thread and it did not help. I also added this check if connection is not closed and reopen:

# declaring conn outside Thread()
conn = psycopg2.connect(**utils.db_config_psql, **utils.keepalive_kwargs)

def on_message(block_number):

    try:
        with conn.cursor() as cursor:
            pass
    except psycopg2.InterfaceError as e: # connection already closed
        conn = psycopg2.connect(**utils.db_config_psql, **utils.keepalive_kwargs)

    with conn.cursor() as cursor:
        journal.write(f" Starting SELECT 1 FROM _blocks")

        # ...