spring-projects / spring-data-neo4j

Provide support to increase developer productivity in Java when using Neo4j. Uses familiar Spring concepts such as a template classes for core API usage and lightweight repository style data access.
http://spring.io/projects/spring-data-neo4j
Apache License 2.0
828 stars 620 forks source link

Changing to elementId from Id causes a more than 6x increase in query traffic #2957

Open gliwka opened 1 week ago

gliwka commented 1 week ago

We've updated our Spring Data Neo4j version from 7.0.0 to the latest version 7.3.4. After the update we've noticed network issues, connection read timeouts and a CPU-saturated AuraDB Enterprise instance on regular repository.findById() query on an entity with many nested relationships.

Further investigation showed, that we suddenly had more than 6x outgoing traffic to our Neo4j instance. A few megabytes per second suddenly turned into 60-80MB/s. That's outgoing traffic (queries) - not the data we fetch, which is much lower.

Enabling logging on the driver level revealed, that the sent queries are indeed multiples larger. The issue seems to be with the queries SDN generates once it wants to fetch a graph with a structure possibly containing cycles, thus SDN falling back on the cascading / N+1 query generation for each level as documented here.

In our case those queries often contain thousands of IDs. Those IDs before (7.0.0) were 64-Bit Longs, occupying only ~8 Bytes - i.e. "2023663". With the introduction of the elementId (i.e. "4:2da1ca6d-a09f-41be-8dff-230d14780c2b:2023663"), which is a string with ~45 bytes on average, the queries are significantly larger since the hundreds of ids are now the large part of the payload. (no. of bytes is just a naive approximation, the bolt protocol level might handle those differently, but the same principle applies).

The bad thing is that all ids for all nodes in our database start with the prefix "4:2da1ca6d-a09f-41be-8dff-230d14780c2b" - so this information is highly redundant in our query.

We could furthermore verify this by changing the Dialect from "Dialect.NEO4J_5" to "Dialect.NEO4J_4" to trigger the logic here switching back to the Id function:

https://github.com/spring-projects/spring-data-neo4j/blob/fef71a289fd00aad94e9febbbe4aaabf3abe2f45/src/main/java/org/springframework/data/neo4j/core/mapping/SpringDataCypherDsl.java#L39-L43

This has the desired effect, our bandwidth usage is now more than 6x lower and the database and network connection is happy again.

Here's one of the more problematic queries: https://gist.github.com/gliwka/96415da55e69429f9c4a5033c54ab124

I understand, that id() is deprecated, however the elementId() in combination with the N+1 query model for potentially circular data models (ours certainly is an DAG, but that's another issue - https://github.com/spring-projects/spring-data-neo4j/issues/2840) and the resulting transfer of hundreds or thousands of elementsIds, that are now multiple times larger, is a bad combination, that does not bring any additional value to us.

For the meantime we've downgraded to the Neo4j 4 dialect and are investigating the use of projections to fetch the data in a different way to be able to upgrade back to Neo4j v5, however we're thankful for any other pointers.

michael-simons commented 1 week ago

Hello. We will answer here in more detail soon, but right now just a couple of things

Please be aware that your log will be swamped now with depreciation warnings. You might wanna adapt the level for those notifications https://docs.spring.io/spring-data/neo4j/reference/appendix/logging.html

gliwka commented 1 week ago

Looking forward to it, let me know if any more details would be helpful 😄

We're already using external generated UUIDs on our entities as identifiers.

The id/elementId discussion is just part of the generated query by SDN for those cascading queries whenever we try to load the entity using the finder methods. They're not part of our business logic or entities - there we use exclusively UUIDs. However using our UUIDs for the same purposes would pose the same challenge.

Staying on the older dialect is not a long term solution, just one that allows us to update our Spring Boot Version and also get the fix for cascading updates=false on the Relationships to solve another problem we have with SDN + Optimistic Locking (currently we have a lot of projections in our codebase for the write path to stop cascading updates).

We're trying to migrate to projections for the read path and/or custom queries get those generated queries under control, if no other solution should become available to switch the dialect back to 5 as soon as possible, however we would appreciate another solution.