tensorbase / tensorbase

TensorBase is a new big data warehouse built with modern efforts.
https://tensorbase.io/
Apache License 2.0

SQL Select on string field without limit clause hangs query #86

Status: Open. chrisfw opened this issue 3 years ago.

chrisfw commented 3 years ago

A SELECT on a string field returns quickly when a LIMIT is set, but the same query without the LIMIT hangs.

TensorBase :) select Region from sales limit 33

SELECT Region
FROM sales
LIMIT 33

Query id: 871f1999-4423-452c-bb86-553ec865abe8

┌─Region────────────────────────────┐
│                                   │
│                                   │
│                                   │
│                                   │
│                                   │
│                                   │
│                                   │
│                                   │
│                                   │
│                                   │
│                                   │
│                                   │
│                                   │
│                                   │
│                                   │
│                                   │
│                                   │
│                                   │
│                                   │
│                                   │
│                                   │
│                                   │
│                                   │
│                                   │
│                                   │
│                                   │
│                                   │
│                                   │
│                                   │
│ Australia and Oceania             │
│ Asia                              │
│ Sub-Saharan Africa                │
│ Central America and the Caribbean │
└───────────────────────────────────┘

33 rows in set. Elapsed: 0.003 sec.

TensorBase :) select Region from sales

SELECT Region
FROM sales

Query id: 910bf12c-db86-4933-a376-4705097a6dbf

^Z
[3]+  Stopped                 clickhouse-client --host 192.168.145.130 --port 9528
root@ubuntu:~/tensorbase/client# kill %1
jinmingjian commented 3 years ago

@chrisfw could you share a reproducible case or dataset? I recently fixed some bugs in result conversion (the fixes should be in the latest release's binary).

chrisfw commented 3 years ago

@jinmingjian, I updated from the git repo, ran the server using cargo run --bin server -- -c /root/tensorbase/conf/base.conf, and still reproduced the issue. I have attached a zip file containing the table schema and dataset for reproducing the problem. Please let me know if you need any additional information.

issue86.zip

jinmingjian commented 3 years ago

@chrisfw thanks, I'll take a look.

jinmingjian commented 3 years ago

@chrisfw confirmed. In this case TB (TensorBase) returns the correct result, so this looks like a CH (ClickHouse) client bug, or at least partly one. I propose a workaround:

  1. First, run a relatively large (but not too large) query, such as: select Region from sales limit 100000

  2. Then run the query without a limit: select Region from sales

jinmingjian commented 3 years ago

@chrisfw if you want to understand more about what TB is doing, you can enable debug log output by starting the server like this:

enable_dbg_log=1 cargo run --bin server -- -c path_to_your/base.conf

chrisfw commented 3 years ago

@jinmingjian Thanks for confirming. I did try the suggested workaround, but had no luck: the query times out.

Exception on client: Code: 209. DB::NetException: Timeout exceeded while reading from socket (192.168.145.130:9528): while receiving packet from 192.168.145.130:9528

I am only testing for you and don't actually need this functionality, so the workaround isn't essential for me. Thanks for the debug logging instructions; they are helpful. As you mentioned, I can see that TB appears to complete the query and return the results to the CH client.

jinmingjian commented 3 years ago

@chrisfw many thanks for the feedback over the weekend. It works with my CH client:

[Screenshot: issue_86]

The client still only shows the first 10000 rows.

I guess this is caused by returning too large a result at once. However, Arrow's RecordBatch does not provide split-like functionality, so splitting the result is expensive. I will leave this bug open for a while and see whether we can find a better solution.
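The splitting idea discussed above can be illustrated with a minimal sketch. It is written in Python for brevity and uses neither Arrow nor TensorBase code: the helper name split_into_blocks and the 10,000-row block size are hypothetical, chosen only to show one large result column being sent as several bounded blocks instead of one oversized block.

```python
from typing import Iterator, List, Sequence


def split_into_blocks(column: Sequence[str], max_rows: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most max_rows rows each.

    Hypothetical helper: it only illustrates sending one large result
    as several smaller blocks; it is not TensorBase or Arrow code.
    """
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    # Step through the column in fixed strides; the last block may be shorter.
    for start in range(0, len(column), max_rows):
        yield list(column[start:start + max_rows])


# Usage: a 25,000-row column split into blocks of at most 10,000 rows.
rows = [f"region-{i}" for i in range(25_000)]
blocks = list(split_into_blocks(rows, 10_000))
print([len(b) for b in blocks])  # [10000, 10000, 5000]
```

Copying each slice, as this sketch does, costs time and memory proportional to the result size. A columnar implementation would ideally hand out views (an offset and a length over the same buffers) instead of copies, which is presumably the kind of split support the comment above notes was missing.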

chrisfw commented 3 years ago

@jinmingjian, no problem. Sounds good. I will continue exploratory TB testing and report any issues I find. Going forward, please let me know if there are any specific areas you would like me to target for testing.