andreloeffelmann opened this issue 1 week ago
To add an idea (I don't know whether it's possible): wouldn't it be good, on each poll, to query only for those CDC events in the database for which subscriptions are currently active? This would massively reduce the amount of transferred data.
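A sketch of that idea follows. It assumes the server keeps a count of active subscriptions per node label and drops change events that touch no subscribed label. Neo4j's `db.cdc.query` procedure accepts selectors that can narrow results (see the CDC documentation for their exact shape), so a real implementation could push this filter into the database query itself. This in-memory version only shows the bookkeeping. `ActiveSubscriptionRegistry`, `filterForActiveSubscriptions` and the simplified `ChangeEvent` type are hypothetical names and are not part of `@neo4j/graphql`. Filtering by label only helps when the bulk-imported labels have no active subscribers.

```typescript
// Hypothetical, simplified change event; the real CDC event shape differs.
interface ChangeEvent {
  elementId: string;
  labels: string[];
}

// Counts active GraphQL subscriptions per node label (hypothetical helper).
class ActiveSubscriptionRegistry {
  private counts = new Map<string, number>();

  // Registers one subscription and returns an idempotent unsubscribe function.
  subscribe(label: string): () => void {
    this.counts.set(label, (this.counts.get(label) ?? 0) + 1);
    let active = true;
    return () => {
      if (!active) return;
      active = false;
      const remaining = (this.counts.get(label) ?? 1) - 1;
      if (remaining > 0) this.counts.set(label, remaining);
      else this.counts.delete(label);
    };
  }

  activeLabels(): Set<string> {
    return new Set(this.counts.keys());
  }
}

// Keeps only events that touch at least one label someone is subscribed to.
// With no active subscriptions, nothing needs to be processed at all.
function filterForActiveSubscriptions(
  events: ChangeEvent[],
  registry: ActiveSubscriptionRegistry,
): ChangeEvent[] {
  const active = registry.activeLabels();
  if (active.size === 0) return [];
  return events.filter((e) => e.labels.some((label) => active.has(label)));
}

// Example: a bulk import touches Person nodes, but clients only watch Movie.
const registry = new ActiveSubscriptionRegistry();
const unsubscribe = registry.subscribe("Movie");
const batch: ChangeEvent[] = [
  { elementId: "m1", labels: ["Movie"] },
  { elementId: "p1", labels: ["Person"] },
  { elementId: "p2", labels: ["Person"] },
];
const visible = filterForActiveSubscriptions(batch, registry); // only "m1"
unsubscribe();
const afterUnsubscribe = filterForActiveSubscriptions(batch, registry); // []
```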
> But: we have scheduled tasks that import data into our Aura instance every day. The amount of data varies, but it is not uncommon that millions of nodes are (re-)inserted into the database.
I'm currently in a meeting with the team discussing this, so I will reply in full later. But I wanted to check one thing: are these millions of daily import events related to nodes that are also defined in your GraphQL type definitions, or to node types that are unrelated to your GraphQL API? Thanks for the info and the quick replies!
They are related to the API and are defined in the GraphQL type definitions.
Okay! Thank you very much for raising this issue and for your continued commitment to improving this library with really well-written issues; we do appreciate it!
We have discussed this and have come up with the following plan:
> Do you see a way to re-enable something like `Neo4jGraphQLSubscriptionsDefaultEngine` again in API v7?
On this one, I'm afraid this is a hard no! This engine required a massive amount of crazy Cypher, which we also couldn't implement in all cases: sometimes events would happen in GraphQL but not be raised, because they were so difficult to capture in Cypher. The CDC approach is many times more reliable and makes it a lot easier to work on Cypher generation in the library.
> By the way: enabling CDC in our situation feels really bad, since it massively blows up the transaction log with CDC events no one needs.
On this one, I would strongly recommend looking into the transaction log retention settings if you haven't already. I was chatting about this earlier with one of the kernel engineers, who gave me this info:
> For something like GraphQL, it probably depends more on what they expect the average user session to be. If they're using the subscriptions for live updates, then retention is less of an issue, as the client should be pushing out updates pretty much as soon as they occur. The longer retention is more for long-running polling apps (like ETL pipelines), where, if they go down, they can be restarted and expect to pick up from their last update. For GraphQL, I imagine a reset is handled more by re-running some initial queries and then getting deltas again.
I hope all of the above is generally good news and helpful for you!
Describe the bug

Since API v6, `Neo4jGraphQLSubscriptionsDefaultEngine` has been deprecated, so we replaced it with the only engine currently available and enabled CDC with `mode=DIFF` on our Aura instance. This works fine as long as there are no massive changes in the database.

But we have scheduled tasks that import data into our Aura instance every day. The amount of data varies, but it is not uncommon for millions of nodes to be (re-)inserted into the database. This causes a LOT of CDC events, which propagate back to our Neo4j GraphQL server, which then dies and shuts down. This even happens on my local machine, where plenty of CPU and RAM is available, so those do not seem to be the bottleneck. I tried different values for `pollTime`, ranging from 100 ms to 5000 ms, but this seems to have no effect: the server dies either way.

The thing is: we do not need the CDC events from the database. We only need subscriptions for change events that happen within the Neo4j GraphQL server, which is exactly what `Neo4jGraphQLSubscriptionsDefaultEngine` was doing. Since that was dropped, the only working solution for us right now is to disable subscriptions entirely. On the other hand, we have some applications that rely on the subscription functionality, and these apps no longer work.
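The observation that changing `pollTime` made no difference is what you would expect if each poll drains everything that changed since the previous poll. Under that assumption the interval only changes how the same volume is split into batches, not how much arrives. The toy model below illustrates this. `batchesPerPoll` is a hypothetical function, the numbers are illustrative rather than measurements from this issue, and it assumes an even import rate. It does not model the library's actual CDC engine.

```typescript
// Hypothetical model: an import emits `totalEvents` changes evenly over
// `importMs` milliseconds, and every poll fetches all changes since the
// previous poll. Returns the number of events delivered by each poll.
function batchesPerPoll(totalEvents: number, importMs: number, pollTimeMs: number): number[] {
  const batches: number[] = [];
  let delivered = 0;
  for (let t = pollTimeMs; delivered < totalEvents; t += pollTimeMs) {
    // Events emitted up to time t (integer arithmetic, capped at the total).
    const emitted = Math.min(totalEvents, Math.floor((totalEvents * t) / importMs));
    batches.push(emitted - delivered);
    delivered = emitted;
  }
  return batches;
}

const sum = (xs: number[]): number => xs.reduce((a, b) => a + b, 0);

// One million changes over ten minutes, polled every 100 ms vs every 5000 ms.
const fast = batchesPerPoll(1_000_000, 600_000, 100); // 6000 polls of 166 or 167 events
const slow = batchesPerPoll(1_000_000, 600_000, 5_000); // 120 polls of 8333 or 8334 events
// Both deliver exactly 1,000,000 events in total.
```

A shorter interval gives smaller, more frequent batches, and a longer one gives fewer, larger batches. Neither reduces the total work during an import.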
We definitely do NOT want to stay on v5, since we aim to be up to date with the API at all times.
So, what are our options? Do you see a way to re-enable something like `Neo4jGraphQLSubscriptionsDefaultEngine` in API v7? Can we somehow mimic the behaviour of this engine ourselves and pass it to `Neo4jGraphQL`?

By the way: enabling CDC in our situation feels really bad, since it massively blows up the transaction log with CDC events no one needs.
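On the question of mimicking the old engine, the core idea is an in-process publish/subscribe channel that the server feeds from its own mutations, so externally imported data never enters it. The sketch below is a hypothetical, self-contained illustration. `InProcessSubscriptionsEngine`, `MutationEvent` and their methods are invented names, not part of `@neo4j/graphql`. It is also not confirmed that `Neo4jGraphQL` in v6/v7 accepts a custom engine of this kind.

```typescript
// Hypothetical event shape, NOT the @neo4j/graphql subscription event type.
interface MutationEvent {
  typename: string;
  event: "create" | "update" | "delete";
  properties: Record<string, unknown>;
}

type Listener = (event: MutationEvent) => void;

// Minimal in-process pub/sub keyed by GraphQL type name. Only events that
// this server publishes (e.g. after its own mutations commit) reach
// subscribers; writes made by external import jobs never enter the channel.
class InProcessSubscriptionsEngine {
  private listeners = new Map<string, Set<Listener>>();

  publish(event: MutationEvent): void {
    const current = this.listeners.get(event.typename);
    if (!current) return;
    // Copy before iterating so listeners may unsubscribe during delivery.
    for (const listener of Array.from(current)) listener(event);
  }

  subscribe(typename: string, listener: Listener): () => void {
    const set = this.listeners.get(typename) ?? new Set<Listener>();
    this.listeners.set(typename, set);
    set.add(listener);
    return () => {
      set.delete(listener);
    };
  }
}

// Example usage.
const engine = new InProcessSubscriptionsEngine();
const received: MutationEvent[] = [];
const unsubscribe = engine.subscribe("Movie", (e) => received.push(e));
// Hypothetically called from the server's own mutation path after a commit.
engine.publish({ typename: "Movie", event: "create", properties: { title: "Alien" } });
engine.publish({ typename: "Person", event: "create", properties: { name: "Ripley" } });
unsubscribe();
engine.publish({ typename: "Movie", event: "delete", properties: { title: "Alien" } });
// received now holds only the Movie "create" event.
```

A purely in-process channel like this only sees writes made through the same server process, so several server instances would need a shared broker. Publishing an event reliably for every kind of write is also the hard part, which is the reliability problem the maintainers describe for the old engine.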