opendistro-for-elasticsearch / sql

🔍 Open Distro SQL Plugin
https://opendistro.github.io/for-elasticsearch/features/SQL%20Support.html
Apache License 2.0
621 stars 186 forks source link

JDBC Driver returns only 200 rows #1013

Closed kenjori closed 3 years ago

kenjori commented 3 years ago

I'm trying to use the JDBC Driver 1.12.0.0 with DBeaver and Oracle Data Integrator.

In both tools, a select statement only returns 200 rows with no apparent way to get all the matching records. DBeaver auto pagination also fails. I found a similar issue #244 but did not understand how to solve it when fetch_size cannot be supplied with every request.

anirudha commented 3 years ago

200 is a default limit. Use a limit clause to increase returned row count

kenjori commented 3 years ago

limit clause works around Dbeaver problem. Thank you.

Is there any way to set this on connection level when integrating with BI tools? Adding limit clause to all generated complex queries may be tricky

anirudha commented 3 years ago

We expect client drivers to use some specific logic when integrating with tools. Eg. Tableau, powerBI etc.

Some of them use cursors for deep paging..

We could also explore adding a setting to remove the default limit, but that would impact performance for bad queries.

What do you think? Can you post a help section on connecting to db weaver, or oracle tools. This will help others. Thanks

kenjori commented 3 years ago

The tools I use don't have their own drivers for ES integration and I need to rely on this one.

Adding a connection level setting to remove the limit would help a great deal. Bad queries are the owners' responsibility but imposing a hardcoded default limit is punishing for everyone.

I can create a help section on connecting to mentioned tools as soon as I can get them work as I intend to.

Thank you,

penghuo commented 3 years ago

Please also try opendistro.query.size_limit. You can change the value to any value not greater than the max result window value in index level (10000 by default)

kenjori commented 3 years ago

As far as I understand, a statement level parameter is needed to workaround the default limit. Statements are built and executed by the tool itself and I have very little to no influence there.

I need a connection level parameter that I can set within the JDBC connection URL

penghuo commented 3 years ago

As far as I understand, a statement level parameter is needed to workaround the default limit. Statements are built and executed by the tool itself and I have very little to no influence there.

I need a connection level parameter that I can set within the JDBC connection URL

The opendistro.query.size_limit = 10000, will make the query return max 10000 results by default. The setting is a global setting and you don't need to config for each query.

Could you help us lean in which use case the tools required 10000+ data? One use cases in my mind could be "dump the index".

kenjori commented 3 years ago

The question is, can I use opendistro.query.size_limit in the JDBC connection URL? I tried anyway but without success.

10K+ data is mostly needed for ETL flows where you need to move data around.

chloe-zh commented 3 years ago

Hi @kenjori , the settings can be update through the ES host, take es host https://elasticsearch:9200 as an example:

curl -XPUT https://elasticsearch:9200/_opendistro/_sql/settings -H 'Content-Type: application/json' -d '
{
  "transient": {
    "opendistro.query.size_limit": 10000
  }
}'

Reference: https://github.com/opendistro-for-elasticsearch/sql/blob/develop/docs/user/admin/settings.rst#opendistro-query-size-limit

dai-chen commented 3 years ago

Closing. Feel free to reopen if anything else. Thanks!

CuriousQA commented 3 years ago

I have the similar problem for the # of records exceeding 10000. In my case, the Elastic search holds the details of the transaction timestamp and the service that a customer makes. As there are many customers and the data is available from 2017, the transaction numbers are higher. So can you please suggest a way to query all the records in that index. Basically on a high level to get the total number of transactions made by a particular customer in a given time period or overall.

I'm working this in PowerBI, I can only see the number of transactions made are around 200 only(default number). Kindly provide us your valuable suggestion.

sanketrs commented 3 years ago

Hi, I resolved this issue with following command that you will have to execute on Dev Tools of Kibana

1. This will update the maximum number of records that your can retrieve from your elastic index. Here for example, I have set up limit to 2,50,000

PUT /your_index_name/_settings
{
  "index": {
    "max_result_window":"250000"
  }
}

2. This will update the number of records you can retrieve with a SELECT sql query on an elastic index Here for example, I have set up limit to 1,40,000

PUT _cluster/settings
{
  "transient" : {
    "opendistro.query.size_limit" : 140000
  }
}