neo4j / neo4j-python-driver

Neo4j Bolt driver for Python
https://neo4j.com/docs/api/python-driver/current/
Other
882 stars 187 forks source link

Explicit AsyncTransaction randomly ignores specified timeout #1028

Closed v-v-d closed 3 months ago

v-v-d commented 4 months ago

Bug Report

First of all, I want to thank you for the amazing Neo4j driver for Python with full asyncio support. It's really very cool and performs wonderfully! Thank you!

Now to the point. While using the driver, I've noticed that if explicit transactions are used and a timeout argument is passed at their initialization, the driver does not always respect this timeout. This leads to transactions containing queries that run for minutes when a timeout of just a few seconds is specified.

I have prepared a simple pytest test that confirms the above. You can run it via docker-compose up --build neo4j_test.zip

Could you please tell me if this is a bug or expected behavior? Or maybe I'm doing something wrong?

This is a snippet of the test:

from typing import LiteralString, cast

import neo4j
import pytest
from _pytest.fixtures import SubRequest
from neo4j import AsyncGraphDatabase

@pytest.fixture()
async def driver() -> neo4j.AsyncDriver:
    async with AsyncGraphDatabase.driver(
        uri="neo4j://neo4j:7687",
        auth=("neo4j", "top-secret"),
    ) as driver:
        yield driver
        await driver.close()

@pytest.fixture()
async def session(driver: neo4j.AsyncDriver) -> neo4j.AsyncSession:
    session = driver.session(database="neo4j")
    yield session
    await session.close()

@pytest.fixture()
async def trx(request: SubRequest, session: neo4j.AsyncSession) -> neo4j.AsyncTransaction:
    trx = await session.begin_transaction(timeout=request.param)
    yield trx
    await trx.close()

@pytest.mark.parametrize("trx", [0.1, 0.3, 0.6, 0.9], indirect=True)
async def test_trx_timeout(trx: neo4j.AsyncTransaction) -> None:
    with pytest.raises(neo4j.exceptions.ClientError):
        await trx.run(query=cast(LiteralString, "CALL apoc.util.sleep(1000)"))
        await trx.commit()

and this is one of the outputs:

============================= test session starts ==============================
platform linux -- Python 3.11.6, pytest-8.1.1, pluggy-1.4.0
rootdir: /app/code
plugins: asyncio-0.23.6
asyncio: mode=Mode.AUTO
collected 4 items

app/test_neo4j.py .F.F                                                   [100%]

=================================== FAILURES ===================================
____________________________ test_trx_timeout[0.3] _____________________________

trx = <neo4j._async.work.transaction.AsyncTransaction object at 0xffff860d4a50>

    @pytest.mark.parametrize("trx", [0.1, 0.3, 0.6, 0.9], indirect=True)
    async def test_trx_timeout(trx: neo4j.AsyncTransaction) -> None:
>       with pytest.raises(neo4j.exceptions.ClientError):
E       Failed: DID NOT RAISE <class 'neo4j.exceptions.ClientError'>

app/test_neo4j.py:35: Failed
____________________________ test_trx_timeout[0.9] _____________________________

trx = <neo4j._async.work.transaction.AsyncTransaction object at 0xffff860e9c50>

    @pytest.mark.parametrize("trx", [0.1, 0.3, 0.6, 0.9], indirect=True)
    async def test_trx_timeout(trx: neo4j.AsyncTransaction) -> None:
>       with pytest.raises(neo4j.exceptions.ClientError):
E       Failed: DID NOT RAISE <class 'neo4j.exceptions.ClientError'>

app/test_neo4j.py:35: Failed
=========================== short test summary info ============================
FAILED app/test_neo4j.py::test_trx_timeout[0.3] - Failed: DID NOT RAISE <clas...
FAILED app/test_neo4j.py::test_trx_timeout[0.9] - Failed: DID NOT RAISE <clas...
========================= 2 failed, 2 passed in 3.63s ==========================

My Environment

Python Version: 3.11 (python3.11-slim docker image) Driver Version: 5.18.0 Neo4j Bolt driver for Python Server Version and Edition: 5.18.0 community (from neo4j:5.18.0-community docker image) Operating System: debian bookworm (from python3.11-slim docker image)

robsdedude commented 3 months ago

Hi and thanks for reaching out with this very detailed report :smile:

This is, while maybe surprising, expected behavior. If you were to enable debug logging, you'd see that the driver will send the timeout to the server as you specified it. So it's the server's responsibility to enforce the transaction timeout. The driver doesn't process it any further. That being said, I remember being surprised by this behavior myself at some point and asked our DBMS kernel developers what's going on. (Implementation detail follows, i.e., this could change in the future) There is a watcher thread in the DBMS that checks every not and then:tm: (I think currently 2s) whether there are any transactions that have exceeded their configured timeout. If so, they will be terminated.

TL;DR: the timeout cannot be arbitrarily small and the server might cut transactions some slack before terminating them.

I'll close the issue as there is nothing to do here. But I'll forward this issue to team kernel so they know that at least 2 people were surprised by this behavior. Let's see if that leads to something. If you have further questions, please feel free to keep commenting.

v-v-d commented 3 months ago

Hi! Got it, thank you very much for the detailed response! What do you think, could the use of the asyncio.timeout context manager be dangerous? Is it possible that on the client side a transaction is interrupted due to asyncio.TimeoutError, but on the Neo4j side, it remains open, thus generating anomaly load on Neo4j?

robsdedude commented 3 months ago

Several thoughts: