vaticle / typedb-driver-python

TypeDB Driver for Python
https://typedb.com
Apache License 2.0

When streaming, assume network latency is >=1ms #259

Closed: alexjpwalker closed this PR 2 years ago

alexjpwalker commented 2 years ago

What is the goal of this PR?

When streaming (e.g. match queries), we now assume the network latency is >=1ms, and have the server compensate accordingly. This sharply improves performance (up to 3x) when fetching answers from a server hosted at localhost.

What are the changes implemented in this PR?

When streaming from localhost with a per-answer performance profiler attached, the answers visibly arrive in "batches" of size 50: the first 50 answers arrive in, say, 0.005s, then there is a 0.005s gap, then another 50 answers arrive, then another 0.005s gap, and so on. This indicates a bug in the implementation of prefetch size, and sure enough, we've tracked it down. It manifests when connecting to localhost, and occurs due to the following logical flaw.
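For context, this is the kind of measurement that exposes the batching; a minimal sketch, assuming `answers` is any iterator of query answers (the function name and batch size parameter are illustrative, not part of the driver):

```python
import time

def profile_answer_gaps(answers, batch_size=50):
    """Print the gap between consecutive answers and flag suspiciously large ones,
    which show up at batch boundaries (every `batch_size` answers)."""
    previous = time.perf_counter()
    for i, _answer in enumerate(answers, start=1):
        now = time.perf_counter()
        gap_ms = (now - previous) * 1000
        previous = now
        marker = "  <-- batch boundary?" if i > 1 and i % batch_size == 1 else ""
        print(f"answer {i:5d}: +{gap_ms:.3f} ms{marker}")
```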

Answers are streamed in batches of size N from the server (where N = prefetch_size, default 50), to prevent the server from doing unnecessary work in case the client does not end up consuming all answers. Once the client has seen N answers, it sends a "CONTINUE" request to the server to continue streaming.
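In rough Python terms, the client-side loop looks like the sketch below (names such as `receive_answer` and `send_continue` are illustrative, not the driver's actual internals):

```python
def stream_answers(stream, prefetch_size=50):
    """Illustrative client-side loop: the server sends up to `prefetch_size` answers,
    then pauses; the client must send CONTINUE to receive the next batch."""
    received_in_batch = 0
    while True:
        answer = stream.receive_answer()        # blocks until the next answer arrives
        if answer is None:                      # server signalled end of stream
            return
        yield answer
        received_in_batch += 1
        if received_in_batch == prefetch_size:  # batch exhausted: ask the server for more
            stream.send_continue()
            received_in_batch = 0
```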

However, streaming should actually continue while the Nth answer is in flight to the client and while the server is waiting to receive the CONTINUE request. If it stops, we end up with "wasted time" in which the server is waiting and sending nothing. Thus, the server must predict, as best it can, when the client's next CONTINUE will arrive; that prediction is typically equal to the network latency.
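Conceptually, the server-side compensation amounts to the following sketch (written in Python purely for illustration; the names and structure are hypothetical, and the real server logic lives in TypeDB itself):

```python
import time

def stream_batch(answer_iter, send, prefetch_size, estimated_latency_secs):
    """Illustrative server-side batching: send `prefetch_size` answers, then keep
    streaming for roughly one network round trip, since that is how long the
    client's CONTINUE request will be in flight anyway."""
    sent = 0
    deadline = None
    for answer in answer_iter:
        send(answer)
        sent += 1
        if sent == prefetch_size and deadline is None:
            # The client is only now seeing the Nth answer and sending CONTINUE;
            # rather than idling, keep sending until that request could plausibly arrive.
            deadline = time.monotonic() + estimated_latency_secs
        if deadline is not None and time.monotonic() >= deadline:
            break  # now genuinely wait for the CONTINUE request (not shown)
```

With `estimated_latency_secs` equal to the real round-trip time, the link is never idle; with it equal to 0 (as measured on localhost), the server stops dead after every batch, which is exactly the gap the profiler shows.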

However, when connecting to localhost, the measured network latency is 0, yet it is physically impossible for the client to respond to the server at the exact moment the server sends the Nth answer. localhost is therefore an edge case that was previously unhandled.

To mitigate the problem, we now coerce the measured network latency to be at least 1ms.
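The mitigation itself is just a floor on the measured value; conceptually (the names below are illustrative):

```python
MIN_ASSUMED_LATENCY_MILLIS = 1

def effective_latency_millis(measured_latency_millis: float) -> float:
    """Never report a latency below 1ms to the server: even on localhost the client
    cannot respond to the Nth answer instantaneously."""
    return max(measured_latency_millis, MIN_ASSUMED_LATENCY_MILLIS)
```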