neo4j / neo4j-python-driver

Neo4j Bolt driver for Python
https://neo4j.com/docs/api/python-driver/current/
Other
898 stars 186 forks source link

Datetime filter rounding bug on [UTC] property #877

Closed havardthom closed 1 year ago

havardthom commented 1 year ago

Bug Report

Current behaviour

Using neo4j.time.DateTime().to_native() as query param stores datetime property as 2022-12-13T00:00:00[UTC] Using datetime.datetime() as query param stores datetime property as 2022-12-13T00:00:00Z

Matching nodes MATCH (t:Test) WHERE t.datetime <= datetime($dtFilter) seems to behave differently for [UTC] datetime property vs Z datetime property. Seems like a rounding bug.

Expected behaviour

neo4j.time.DateTime().to_native() and datetime.datetime() query parameter should store datetime property in same format. See https://github.com/neo4j/neo4j-python-driver/issues/610

MATCH (t:Test) WHERE t.datetime <= datetime($dtFilter) should behave the same for [UTC] datetime property and Z datetime property.

Reproduced code:

import asyncio
from datetime import datetime, timezone

import neo4j

async def create_test_nodes_tx(tx):
    datetimes = [
        datetime.fromisoformat("2022-12-13T00:00:00.000000+00:00"),
        datetime.fromisoformat("2022-12-13T01:00:00.000000+00:00"),
        datetime.fromisoformat("2022-12-13T02:00:00.000000+00:00"),
        datetime.fromisoformat("2022-12-13T03:00:00.000000+00:00")
    ]

    for dt in datetimes:
        result = await tx.run("RETURN datetime($dt) AS datetime", dt=dt)
        record = await result.single(strict=True)

        dt_neo4j_native = record["datetime"].to_native()
        await tx.run("CREATE (t:Test) SET t.datetime = $dt", dt=dt_neo4j_native)
        # Property saved as datetime: "2022-12-13T00:00:00[UTC]"

        await tx.run("CREATE (t:Test) SET t.datetime = $dt", dt=dt)
        # Property saved as datetime: "2022-12-13T00:00:00Z"

async def get_test_nodes_filtered_tx(tx):
    dt_filters = [
        datetime.fromisoformat("2022-12-13T03:00:00.000000+00:00"), # <Record count=7>
        neo4j.time.DateTime(2022, 12, 13, 3, 0, 0, 0, tzinfo=timezone.utc).to_native(), # <Record count=7>
        datetime.fromisoformat("2022-12-13T03:00:00.000001+00:00"), # <Record count=8>
        neo4j.time.DateTime(2022, 12, 13, 3, 0, 0, 1000, tzinfo=timezone.utc).to_native(), # <Record count=8>
    ]
    query = """
        MATCH (t:Test)
        WHERE t.datetime <= datetime($dtFilter)
        RETURN count(*) AS count
    """
    for dt_filter in dt_filters:
        result = await tx.run(query, dtFilter=dt_filter)
        record = await result.single(strict=True)
        print(record)
        # Expecting count=8, "2022-12-13T03:00:00[UTC]" is not matched in some cases

async def main():
    uri = "neo4j://localhost:7687"
    driver = neo4j.AsyncGraphDatabase.driver(uri, auth=("neo4j", "pass"),)

    async with driver.session() as session:
        await session.execute_write(create_test_nodes_tx)
        await session.execute_read(get_test_nodes_filtered_tx)

asyncio.run(main())

My Environment

Python Version: 3.10.8 Driver Version: 5.1.0 Server Version and Edition: 4.4.6 Community

Possibly related: https://github.com/neo4j/neo4j-python-driver/issues/306 https://github.com/neo4j/neo4j-python-driver/pull/748

havardthom commented 1 year ago

Just found https://github.com/neo4j/neo4j-python-driver/issues/610 which clarifies the different datetime formats. Though the MATCH rounding issue still seems like a bug

robsdedude commented 1 year ago

Hello and thanks for reaching out.

I was puzzled like you as to what's going on here. After asking colleagues I was pointed to this piece of Cypher documentation. Which states

  • Comparison of temporal values:
    • [...]
    • Instant values that occur at the same point in time — but that have a different time zone — are not considered equal, and must therefore be ordered in some predictable way. Cypher prescribes that, after the primary order of point in time, instant values be ordered by effective time zone offset, from west (negative offset from UTC) to east (positive offset from UTC). This has the effect that times that represent the same point in time will be ordered with the time with the earliest local time first. If two instant values represent the same point in time, and have the same time zone offset, but a different named time zone (this is possible for DateTime only, since Time only has an offset), these values are not considered equal, and ordered by the time zone identifier, alphabetically, as its third ordering component. If the type, point in time, offset, and time zone name are all equal, then the values are equal, and any difference in order is impossible to observe.

Now that was surprising to me as well, but that's what Cypher does. So the driver can't do anything about it.

RETURN [
    datetime('2022-12-13T00:00:00.000000+00:00') < datetime('2022-12-13T00:00:00.000000[UTC]'),
    datetime('2022-12-13T00:00:00.000000+00:00') = datetime('2022-12-13T00:00:00.000000[UTC]'),
    datetime('2022-12-13T00:00:00.000000+00:00') > datetime('2022-12-13T00:00:00.000000[UTC]')
]

will return [true, false, false]. No rounding issue just comparison of temporal types in different time zones being defined in what I'd call an unintuitive way.

You could try to play around with .epochMillis in Cypher for your comparison, but that is limited to millisecond precision. Or you could use duration.between(a, b).

I'll close this as this is not a driver issue (other than the limitation of not being able to tell UTC apart from +00:00). Please feel free to keep commenting if you have further questions.

havardthom commented 1 year ago

I see, that is very unexpected, so one should be careful have consistent timezone types in the database. I solved my original problem by adding astimezone to the returned datetime from to_native() to consistently store datetimes with Z timezone. e.g. neo4j.time.DateTime().to_native().astimezone(datetime.timezone.utc)

Thank you for clarifying