Closed thomaschristopherking closed 1 year ago
@thomaschristopherking I've managed to reproduce the example locally, and it contains some interesting findings!
I took your reproducible sample and made a small addition, adding a per-answer performance profiler:
```diff
  start = time.time()
  matches = tx.query().match(match_name)
+ time_diffs = []
  for m in matches:
      name = m.map().get('fn').get_value()
+     time_diffs.append(time.time() - start)
  diff = time.time() - start
+ print(' '.join(str(d) for d in time_diffs))
  print(f'method_execution_time_seconds: {diff}')
```
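As an aside, the batch pattern can be surfaced mechanically from the collected `time_diffs` by looking for unusually large inter-answer gaps. A minimal sketch (the 2 ms threshold is an arbitrary assumption on my part, not a measured value):

```python
def batch_boundaries(time_diffs, gap_threshold=0.002):
    """Return the indices at which a new batch of answers began.

    `time_diffs` holds each answer's arrival time relative to the start
    of the query; a gap larger than `gap_threshold` seconds between
    consecutive answers is treated as a batch boundary.
    """
    return [i for i in range(1, len(time_diffs))
            if time_diffs[i] - time_diffs[i - 1] > gap_threshold]
```

With the default prefetch size of 50, you would expect boundaries at indices 50, 100, 150, and so on.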
What this reveals is that the answers come in "batches" of size 50: the first 50 answers arrive in, say, 0.005s, then there is a 0.005s gap, then another 50 answers arrive, then another 0.005s gap, and so on. This indicates a bug in the implementation of prefetch size - and sure enough, I've tracked it down. It manifests when connecting to localhost, and occurs due to the following logical flaw.
Answers are streamed in batches of size N from the server (where N = prefetch_size, default 50), to prevent the server doing unnecessary work in case the client does not end up consuming all answers. Once the client sees N answers, it should send a "CONTINUE" request to the server to continue streaming.
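A client-side view of this protocol can be sketched in plain Python (`send_continue` here is a hypothetical stand-in for the real CONTINUE RPC, not the actual client API):

```python
def consume_all(first_batch, send_continue):
    """Drain a batched answer stream.

    `first_batch` is the initial list of up to N answers; `send_continue`
    is a stand-in callable that asks the server to resume streaming and
    returns the next batch (an empty list means the stream is exhausted).
    """
    answers = list(first_batch)
    batch = first_batch
    while batch:                 # a non-empty batch may be followed by more
        batch = send_continue()  # request the next batch from the server
        answers.extend(batch)
    return answers
```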
However, while the Nth answer is being sent to the client, and while the server waits to receive the CONTINUE request, streaming should actually continue. If it doesn't, we end up with "wasted time" in which the server waits and sends nothing. Thus, the server must predict, as best it can, when the client's next CONTINUE will arrive; that delay is typically equal to the network latency.
However, when connecting to localhost, the measured network latency is 0 - yet it is physically impossible for the client to respond at the exact moment the server sends the Nth answer. localhost is an edge case that is currently unhandled.
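A back-of-the-envelope model of the wasted time (all numbers below are illustrative assumptions chosen to mirror the ~0.005s batches observed above, not measurements):

```python
def total_query_time(num_answers, prefetch_size=50,
                     per_batch_secs=0.005, continue_gap_secs=0.005,
                     pipelined=False):
    """Rough model of answer-streaming time.

    When `pipelined`, the server correctly predicts the CONTINUE and keeps
    streaming, so no idle gaps occur; otherwise it stalls for
    `continue_gap_secs` after every batch except the last.
    """
    batches = -(-num_answers // prefetch_size)  # ceiling division
    idle = 0 if pipelined else (batches - 1) * continue_gap_secs
    return batches * per_batch_secs + idle
```

Under these assumptions, a 500-answer stream takes nearly twice as long when stalled as when pipelined, which is consistent with the "over 2x" speed-up observed after the fix.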
To mitigate the problem, we should coerce the measured network latency to be at least 1ms.
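The mitigation is essentially a one-line clamp (a sketch; the constant name is mine, and the real fix lives in the server's latency-measurement code):

```python
MIN_NETWORK_LATENCY_SECS = 0.001  # never assume the client can reply in < 1 ms

def effective_latency(measured_secs):
    # On localhost the measured round-trip is ~0, which would make the
    # server stop streaming and wait for CONTINUE; clamping to 1 ms keeps
    # it streaming ahead of the client's next request.
    return max(measured_secs, MIN_NETWORK_LATENCY_SECS)
```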
(Interestingly, this flaw affects all clients, not just Client Python, so the Python-specific slowness must have a different cause; but with this change, we already speed every client up by over 2x when connecting to localhost, which is a massive boost!)
Hi Alex, thanks for looking into this, it makes sense. If it's just for localhost then I guess it's not such a big deal (although I think our network latency is low in the cloud as well). But why does this not occur in Studio?
Why indeed - the issue affecting client-python must be a different issue. So I'm still going to keep this issue open as it remains unresolved; however, I'll push out the latency measurement change to all clients - a valuable boost for anyone developing on their local machine (in any language).
For what it's worth, just in case it helps with narrowing down the Python speed issues, I've noticed that it takes significantly more time to run a single complex query (e.g., starting with 5 Attributes, find all Entities that have those attributes (maybe 3 matches?), then find all Relations that have those Entities as Players (maybe 5 Relation matches?)) than it does to simply dump everything in the database and post-process all the results in Python.
If I've got a batch of 800 or so queries (resetting read tx's after every 25 queries run) and I'm watching the log of each query being run in real-time, I've noticed that when it hangs on getting a result, it always seems to hang on the more complex queries rather than the short ones.
My point is, to a certain extent, I think the complexities of a specific search (or maybe a complex search thrown into a batch of searches?) contributes to the slow-down much more significantly than just the act of retrieving all the data.
Absolutely @TheDr1ver - the impact of a client-server connection's inefficiency will be minimal when the query itself is complex to execute on the server. match $x isa first_name is one of the simplest queries possible in TypeDB.
Having said that, if your query is significantly slower than dumping the whole DB and post-processing the results in Python, that may indicate a query planning issue and may be worth raising in https://github.com/vaticle/typedb. For instance, if retrieving all people with a filter like match $p isa person, has age $age; $age < 40; were to perform significantly worse than filtering by age in Python as follows:

```python
[(cm.get("p"), cm.get("a"))
 for cm in tx.query().match("$p isa person, has age $a;")
 if cm.get("a").get_value() < 40]
```
then that would certainly indicate a bug in the query planner.
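To make the comparison concrete without a running TypeDB server, the client-side variant can be exercised against minimal stand-in objects (FakeAttribute and FakeConceptMap are hypothetical shims that mirror only the .get()/.get_value() calls used above; they are not the real client classes):

```python
class FakeAttribute:
    # Shim for an attribute concept: implements only get_value().
    def __init__(self, value):
        self._value = value

    def get_value(self):
        return self._value


class FakeConceptMap:
    # Shim for a ConceptMap answer: implements only get(var).
    def __init__(self, bindings):
        self._bindings = bindings

    def get(self, var):
        return self._bindings[var]


def people_under_40(answers):
    # The Python-side filter from the comparison: fetch every
    # (person, age) pair, then discard ages >= 40 locally.
    return [(cm.get("p"), cm.get("a")) for cm in answers
            if cm.get("a").get_value() < 40]
```

If the server-side `$age < 40` filter runs much slower than this post-filter over the full result set, the query planner is doing something wrong.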
@thomaschristopherking, @TheDr1ver : I believe we've found a conclusive answer to the inefficiency of Client Python (versus Clients Java and Node.js):
https://grpc.io/docs/guides/performance/#python
Streaming RPCs create extra threads for receiving and possibly sending the messages, which makes streaming RPCs much slower than unary RPCs in gRPC Python, unlike the other languages supported by gRPC.
This implies that the behaviour we are seeing (where Python is significantly slower at retrieving query answers than Studio, which uses Client Java) is expected in gRPC Python.
The gRPC team propose one potential fix: refactoring the code to use asyncio. However, this is a significant undertaking, and with the development of typedb-client-rust due to be completed in the fairly near future, we will be rewriting Client Python as a thin Python wrapper over an underlying Rust library - which should run optimally and resolve this issue.
In the meantime, if optimal performance is essential for a product, we should recommend using Client Java, Node.js or Julia, via a separate microservice that connects to the user's Python program if need be.
Thank you @alexjpwalker, that makes sense - another Python annoyance. I'm guessing this issue could be closed (or linked to the Rust client implementation?). I think we can wait for the Rust client.
Absolutely - this issue will now form part of the TypeDB Client Rust Rewrite. Thanks again for flagging this issue up!
Closing this issue as it (should) largely dissolve when we rewrite the Python client to call the Rust client through FFI.
Description
When comparing the retrieval of a single attribute using the Python client against Studio, Python is around five times slower.
Environment
Reproducible Steps
Steps to create the smallest reproducible scenario:
```typeql
match $fn isa first_name; get $fn;
```
Expected Output
A similar execution time for both Python and Studio queries.
Actual Output
Python time: ~5.5s
Studio time: ~1.4s to execute, and ~1.972s to output results