tuor713 / trino-kdb

Trino plugin for kdb+
Apache License 2.0

change underlying kdb driver to pykx #10

Open zeddit opened 1 year ago

zeddit commented 1 year ago

the javakdb project seems to be unmaintained. However, a project named pykx is on the rise, and in my tests it is far faster than javakdb, because kdb sets up a new data gateway to serve pykx requests. (I don't know if my test is correct.)

I found it is possible to invoke a Python runtime from Java, and I wonder whether using pykx could lead to better performance in latency and scale.

thanks.
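
As a point of reference, one way to bridge from Java would be to launch a small Python helper process that owns the pykx connection and speaks line-delimited JSON over stdin/stdout, driven from Java via `ProcessBuilder`. Below is a minimal sketch of the Python side; the pykx connection details in the docstring are assumptions, and the query execution is stubbed out so the framing logic runs without a kdb instance:

```python
import json
import sys

def run_query(q_expr: str) -> list:
    """Execute a q expression and return rows.

    In a real bridge this would go through pykx, roughly:
        import pykx as kx
        conn = kx.QConnection('localhost', 5000)  # host/port are assumptions
        return conn(q_expr).py()
    Stubbed here so the protocol is testable without kdb.
    """
    return [{"query": q_expr, "rows": 0}]

def serve(infile=sys.stdin, outfile=sys.stdout):
    # One JSON request per line in, one JSON response per line out.
    for line in infile:
        req = json.loads(line)
        try:
            resp = {"ok": True, "result": run_query(req["query"])}
        except Exception as exc:  # surface errors to the Java side
            resp = {"ok": False, "error": str(exc)}
        outfile.write(json.dumps(resp) + "\n")
        outfile.flush()

if __name__ == "__main__":
    serve()
```

Whether this beats raw javakdb sockets would still need measuring; the subprocess hop adds its own serialization cost.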

tuor713 commented 1 year ago

That sounds like a potentially quite messy integration, calling out to Python (I presume on different OSes it might require different Python installations, etc.). I also wonder why performance would be so different: as far as I can see, the Java code does pretty raw socket operations, which I would not expect to be too bad.

What kind of data are you pulling where performance is such a challenge?

zeddit commented 1 year ago

sorry for the late reply. I am debugging the performance of javakdb, and some results are listed below; they mainly relate to querying a small dataset. The actual data-related time (query time, analysis time) is less than 1 second, yet the total elapsed time is quite unacceptable. The dataset is only several MB. So am I right to conclude that the performance issue is not a problem with javakdb, but with Trino itself? I wonder if the network or Trino CPU/memory is causing the problem. As a comparison, pykx has no such delays and is very fast. Could you recommend any other metrics or debugging tools to get more useful insight? Thanks.

(screenshots: 2023-08-14 19:44:13 and 19:55:07)
tuor713 commented 1 year ago

That looks extremely slow and completely out of line with what I have seen otherwise. Given that analysis & planning time are low, no stats collection is being run.

Can you share how many rows and what column types are in the table? Potentially the paging is creating too many small queries, which would cause an issue.

CPU should be negligible: you can see in the query's CPU seconds stats whether Trino uses any substantial CPU. I would also think I/O is unlikely, unless Trino and KDB are in vastly different networks with very low bandwidth in between.
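
To put the paging concern above in numbers: if a connector fetches results in fixed-size pages, total latency grows with the number of round trips, and small pages on a high-latency link dominate everything else. A rough back-of-envelope, where the page size, round-trip time, and transfer rate are all made up for illustration:

```python
def paged_fetch_time(total_rows, page_size, rtt_s, rows_per_s):
    """Estimated wall time: one round trip per page plus raw transfer time."""
    pages = -(-total_rows // page_size)  # ceiling division
    return pages * rtt_s + total_rows / rows_per_s

# 10k rows, 50 ms round trip, 1M rows/s transfer rate
one_page = paged_fetch_time(10_000, page_size=10_000, rtt_s=0.05, rows_per_s=1_000_000)
tiny_pages = paged_fetch_time(10_000, page_size=100, rtt_s=0.05, rows_per_s=1_000_000)
print(one_page)    # 1 page:    ~0.06 s
print(tiny_pages)  # 100 pages: ~5.01 s
```

The same tiny dataset can take a hundred times longer purely from per-page overhead, which is the shape of problem the comment is asking about.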

zeddit commented 1 year ago

thanks for your reply. Some more logs are pasted below. There are only about 10k rows, and most columns are of float type. The date field is surprisingly not a datetime type; I am using trino-sqlalchemy and it doesn't handle the type correctly. I am not sure whether this type causes the performance issue. I'm also not sure I migrated javakdb correctly.

(screenshots: 2023-08-17 14:39:14, 14:39:22, 14:39:33, 14:39:46, 14:40:11)
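
For scale: 10k rows of mostly 8-byte floats is a very small payload, so minutes of elapsed time cannot be explained by raw transfer. A quick estimate (the column count here is a guess, since it isn't stated in the thread):

```python
def estimate_payload_mb(rows, float_cols, bytes_per_float=8):
    """Rough uncompressed payload size for a table of float columns, in MB."""
    return rows * float_cols * bytes_per_float / 1_000_000

size_mb = estimate_payload_mb(rows=10_000, float_cols=20)  # 20 columns assumed
print(size_mb)  # 1.6 MB; even at 10 MB/s this should transfer in well under a second
```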
tuor713 commented 1 year ago

So based on all that, I would think either a) there is a horrible performance bug in javakdb 2.0, or b) the underlying KDB instance is periodically blocked and slow to respond. In the latest example even analysis time is 7 s, suggesting some sort of hang in the connection. Is there any underlying virtualization in front of the target KDB, such as DNS load balancers?

zeddit commented 1 year ago

I will make a comparison between javakdb 1.0 and 2.0 and report my findings here.

zeddit commented 1 year ago

I have tested, and the results remain the same with javakdb 1.0. I wonder how Trino processes the data: does it push the queries down to kdb, so that kdb returns only the matching part of the table back to Trino? Or does Trino retrieve all the data in the table, compute the subset of rows itself, and return that?

I also tested reading from kdb directly into a Python environment with pykx: it needs only 3 seconds to finish. With Trino, the same read needs about 2 minutes, which is surprisingly slow.
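
On the pushdown question above, the two strategies differ mainly in how many bytes cross the wire from kdb to Trino. A toy model (the selectivity and row width are illustrative, not measured):

```python
def bytes_transferred(total_rows, row_bytes, selectivity, pushdown):
    """Bytes shipped from kdb to Trino for a filtering query."""
    rows_shipped = total_rows * selectivity if pushdown else total_rows
    return int(rows_shipped * row_bytes)

# 1M rows, 160 bytes each, filter keeps 1% of rows
with_pushdown = bytes_transferred(1_000_000, 160, 0.01, pushdown=True)
without_pushdown = bytes_transferred(1_000_000, 160, 0.01, pushdown=False)
print(with_pushdown)     # 1_600_000 bytes (~1.6 MB)
print(without_pushdown)  # 160_000_000 bytes (~160 MB)
```

If the connector does not push the filter down, kdb ships the whole table and Trino filters locally, which would make the Trino path far slower than a direct pykx query that runs the filter inside kdb.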

zeddit commented 1 year ago

there is no LB that could break connections for kdb; I have only one kdb instance. I noticed a query task whose Physical Input Bytes/s could reach 60 MB/s while its Input Bytes/s only got to 10 MB/s; the latter number is pretty much the bottleneck. I am not sure what Input Bytes/s means and what limits it.
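
Whatever the exact semantics of the two counters (Physical Input Bytes/s is typically the raw bytes read from the source, while Input Bytes/s reflects data after decoding into Trino's internal pages; treat that reading as an assumption), the slower of the two rates bounds the query. The implied wall times for an illustrative 1 GB dataset:

```python
def transfer_time_s(dataset_mb, rate_mb_s):
    """Wall time to move a dataset at a sustained rate."""
    return dataset_mb / rate_mb_s

dataset_mb = 1_000  # 1 GB, illustrative
print(transfer_time_s(dataset_mb, 60))  # ~16.7 s at the physical-input rate
print(transfer_time_s(dataset_mb, 10))  # 100 s at the input rate, i.e. minutes-scale
```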

zeddit commented 1 year ago

I have conducted some more experiments to find the bottleneck, and they show the bottleneck is on the trino-client side.

```sql
-- Test worker-local performance limit:
CREATE TABLE memory.default.sf100_lineitem AS SELECT * FROM tpch.sf100.lineitem;
-- reports about 1 GB/s Input Bytes

-- Test the worker-to-client bottleneck:
SELECT * FROM memory.default.sf1_lineitem;
-- only about 12-18 MB/s, far below my expectation; why is it so slow?

-- Test my backend DB to the worker:
CREATE TABLE memory.default.test AS SELECT * FROM kdb.default.test;
-- about 90 MB/s, which I think is slow, but for now the bottleneck is the client side
```

I wonder whether the 90 MB/s limit really is the javakdb speed, because when I use pykx directly I can reach about 200 MB/s against the same kdb instance. Many thanks.
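
A hedged way to pin down the driver's share would be to time the same full-table read at each layer with one identical harness, so javakdb-via-Trino, pykx, and the Trino client are measured the same way. The reader callable below is a placeholder standing in for any of those fetch paths:

```python
import time

def measure_throughput(read_fn, payload_bytes):
    """Time read_fn() and return MB/s for a known payload size."""
    start = time.perf_counter()
    read_fn()
    elapsed = time.perf_counter() - start
    return payload_bytes / 1_000_000 / elapsed

# Placeholder reader standing in for e.g. a pykx or Trino full-table scan.
def fake_read():
    time.sleep(0.01)

rate = measure_throughput(fake_read, payload_bytes=1_000_000)
# at most ~100 MB/s here, since the sleep sets a 10 ms floor on elapsed time
```

Running this with the real readers and the real table size would separate a genuine javakdb ceiling from overhead elsewhere in the Trino pipeline.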