neo4j-contrib / neo4j-streams

Neo4j Kafka Connector
https://neo4j.com/docs/kafka
Apache License 2.0

Neo4j Desktop queries return only after source events published #329

Closed ivangreene closed 3 years ago

ivangreene commented 4 years ago

Expected Behavior (Mandatory)

The 'completed after' time shown in the Neo4j Desktop browser should closely reflect the actual amount of time from pressing 'Return' to when the message appeared.

Actual Behavior (Mandatory)

Queries report taking between 1 ms and 32 ms; however, the browser did not return until several seconds later (only after I could see the message in the topic). Queries return almost immediately after disabling streams.source.enabled.

How to Reproduce the Problem

Steps (Mandatory)

  1. Configure (see below config)
  2. Use a tool (such as kt) to tail the topic: kt consume -topic node_actions -offsets all=newest-1:
  3. Run a simple statement, e.g. CREATE (n:RandoLabel {name: 'foo'});
  4. Observe that the query does not return until the Kafka message appears in the topic, and the 'completed after' time is significantly shorter than the actual time it took to return

Video:

Video demonstrating this behavior: https://streamable.com/wkuy2o

Specifications (Mandatory)

Relevant configuration:

streams.source.enabled=true
streams.sink.enabled=false

streams.source.topic.nodes.node_actions=*
streams.source.topic.relationships.relationship_actions=*
streams.source.schema.polling.interval=1000000

kafka.zookeeper.connect=localhost:2181
kafka.bootstrap.servers=localhost:9092

Versions

moxious commented 4 years ago

This seems to have several interrelated factors at play. Calling streams.publish is not a guarantee that the message will be sent out on Kafka instantaneously. Neo4j-streams uses a Kafka client, which is subject to whatever defaults the client has (and any of your kafka.* settings). When you publish a message, what actually happens is that it is added to a buffer to be published later.

That buffer accumulates up to a certain size before sending, and always sends by some timeout. Check the Kafka docs for the producer configuration settings; these are also referenced in the neo4j-streams manual.

As for how much time Neo4j Browser shows a query took to execute, that would need to be taken up with the browser team; I'm not sure how it's measured, but I see what you're talking about.

You mentioned on Slack that this is a blocker to your adoption, but what exactly is the blocker? The incorrect amount of time shown by Desktop? A point I'd recommend following up on is the producer configuration options for Kafka clients, and then tuning those to your use case. You can, for example, lower the timeout or the buffer size to deliver messages more quickly, but there's a tradeoff with throughput. If you send many thousands of messages per second, you're better off with slightly longer timeouts and bigger buffers to make more efficient use of the network. If you send one message every few seconds, you may end up waiting until the timeout threshold is hit before the buffer flushes to the network. I don't think this is actually neo4j-streams; it's just how Kafka clients work.
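A minimal sketch of the kind of tuning described above, assuming neo4j-streams passes kafka.-prefixed settings from neo4j.conf through to the producer. linger.ms and batch.size are standard Kafka producer settings; the values shown are illustrative, not recommendations:

```properties
# Illustrative values only; tune to your workload.
# linger.ms: how long the producer waits before flushing a partial batch.
kafka.linger.ms=5
# batch.size: maximum batch size in bytes before a send is triggered.
kafka.batch.size=16384
```

Lower values favor per-message latency; higher values favor throughput by batching more messages per network round trip.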

ivangreene commented 4 years ago

@moxious re: being a blocker, I will need to do some further testing to see if this actually impacts the amount of time a query takes to execute, or whether it is simply (somehow) a visual delay of the Desktop browser.

The strangest thing, and what makes me feel the Desktop return time and the Kafka plugin are related, is that this immediately stops being visible when I disable the Kafka plugin. As soon as it is disabled, the Desktop return time matches the time it reports. But when Kafka is enabled, the query seems to wait for the submission to Kafka before returning the result in Desktop. Note that the delay between executing the query and sending to Kafka is not a problem for me; that makes a lot of sense and may just need some tweaking. The problem is that it seems to hold up the return of the query.

jexp commented 4 years ago

Didn't we also have an async mode for the procedures? The procs are synchronous by default.

Ah there was a guy who changed the async behavior :)

https://github.com/neo4j-contrib/neo4j-streams/pull/161/files

So we should re-add that.

moxious commented 4 years ago

Ah, that's an excellent point: this was changed to be synchronous on purpose. The idea was that with an async publish, the message could fail at the Kafka client layer and never go out. In that case it would "fail silently" and there would be no way for the user to know; you also couldn't chain Cypher queries if it were totally async.

I think this is probably blocking on the network send, and it's not that the query itself is taking a long time to execute. The question is whether making it async is worth the downsides we previously had and worked around to get to the sync behavior.
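An illustrative sketch of the sync-vs-async tradeoff described above, using only java.util.concurrent (this is not neo4j-streams or Kafka client code; the class and method names are hypothetical):

```java
import java.util.concurrent.*;

public class SyncVsAsync {
    static final ExecutorService POOL = Executors.newSingleThreadExecutor();

    // Simulates a network publish that takes ~50 ms to be acknowledged.
    static Future<String> send() {
        return POOL.submit(() -> { Thread.sleep(50); return "ack"; });
    }

    // Synchronous style: the caller (the Cypher transaction, by analogy)
    // blocks until the ack arrives, so failures surface to the caller,
    // but the wait is added to the query's wall-clock time.
    static String publishSync() throws Exception {
        return send().get();
    }

    // Asynchronous style: fire and forget. The caller returns at once,
    // but an exception inside send() would never reach the caller
    // ("fail silently").
    static void publishAsync() {
        send();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(publishSync());   // waits ~50 ms for "ack"
        publishAsync();                      // returns immediately
        System.out.println("returned immediately");
        POOL.shutdown();
        POOL.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```

The blocking `get()` is what makes a failure visible in the same call stack as the publish, which is the behavior the plugin deliberately chose.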

ivangreene commented 4 years ago

I hadn't really gotten that far with our concept of it yet, but rolling the transaction back if a Kafka write fails would actually help a lot (or at least a configurable option for that behavior, if not the default). The question is what to do if some Kafka writes succeed and others fail; in that case we could transmit the start/rollback/commit of the transaction and ensure that a commit isn't implied in the absence of one.

moxious commented 4 years ago

Related ticket, which we're going to try to prioritize next week: https://github.com/neo4j-contrib/neo4j-streams/issues/349

conker84 commented 4 years ago

@ivangreene can you please try with the latest release?

ivangreene commented 3 years ago

@conker84 I'm back in this area now and can confirm that the behavior is now as expected: queries return in the expected time without waiting on submission.