opensearch-project / sql

Query your data using familiar SQL or intuitive Piped Processing Language (PPL)
https://opensearch.org/docs/latest/search-plugins/sql/index/
Apache License 2.0

[BUG] Unknown index when querying data with fetch_size #947

Open deathjoin opened 2 years ago

deathjoin commented 2 years ago

What is the bug? When querying data with fetch_size, the API responds with an Unknown index error.

How can one reproduce the bug? Steps to reproduce the behavior (a scripted version is sketched after the error below):

  1. Create test index
    PUT test.data-1
  2. Add some data
    POST _bulk
    { "create": { "_index": "test.data-1", "_id": "1" } }
    { "type": "first", "message": "Hello" }
    { "create": { "_index": "test.data-1", "_id": "2" } }
    { "type": "first", "message": "Hello" }
    { "create": { "_index": "test.data-1", "_id": "3" } }
    { "type": "second", "message": "World" }
  3. Run query
    POST /_plugins/_sql
    {
    "query": "SELECT * FROM `test.*`",
    "fetch_size": 100
    }
  4. Face the error
    {
    "error": {
    "reason": "Error occurred in OpenSearch engine: Unknown index [`test.*`]",
    "details": "org.opensearch.sql.legacy.rewriter.matchtoterm.VerificationException: Unknown index [`test.*`]\nFor more details, please send request for Json format to see the raw response from OpenSearch engine.",
    "type": "VerificationException"
    },
    "status": 400
    }
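
For convenience, the steps above can be scripted end to end. Below is a minimal sketch using Python and requests; the endpoint, admin credentials, and disabled TLS verification are assumptions for a local demo cluster:

import json

import requests

BASE = "https://localhost:9200"   # assumption: local demo cluster
AUTH = ("admin", "admin")         # assumption: default security-plugin credentials
VERIFY = False                    # assumption: self-signed demo certificate

# 1. Create the test index
requests.put(f"{BASE}/test.data-1", auth=AUTH, verify=VERIFY)

# 2. Bulk-index the three sample documents (NDJSON body, newline-terminated)
lines = [
    {"create": {"_index": "test.data-1", "_id": "1"}}, {"type": "first", "message": "Hello"},
    {"create": {"_index": "test.data-1", "_id": "2"}}, {"type": "first", "message": "Hello"},
    {"create": {"_index": "test.data-1", "_id": "3"}}, {"type": "second", "message": "World"},
]
bulk_body = "\n".join(json.dumps(line) for line in lines) + "\n"
requests.post(f"{BASE}/_bulk", data=bulk_body,
              headers={"Content-Type": "application/x-ndjson"},
              auth=AUTH, verify=VERIFY)

# 3. Run the SQL query with fetch_size -- on 2.3.0 this returns the Unknown index error
resp = requests.post(f"{BASE}/_plugins/_sql",
                     json={"query": "SELECT * FROM `test.*`", "fetch_size": 100},
                     auth=AUTH, verify=VERIFY)
print(resp.status_code, resp.text)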

What is the expected behavior? Data is returned:

{
  "schema": [
    {
      "name": "message",
      "type": "text"
    },
    {
      "name": "type",
      "type": "text"
    }
  ],
  "datarows": [
    [
      "Hello",
      "first"
    ],
    [
      "Hello",
      "first"
    ],
    [
      "World",
      "second"
    ]
  ],
  "total": 3,
  "size": 3,
  "status": 200
}

What is your host/environment?

Do you have any additional context?

The query was working on OpenSearch 1.3.0, and we just updated to 2.3.0.

The only way I found to make the query work is to run it without fetch_size and other fields like filter, but it is not an option for us 😢

POST /_plugins/_sql
{
  "query": "SELECT * FROM `test.*`"
}

Error from the docker logs:

[2022-10-21T14:35:12,552][INFO ][o.o.s.l.p.RestSqlAction  ] [opensearch-0] [99a577f7-2e35-49bd-b998-d7562becc848] Incoming request /_plugins/_sql?pretty=true: ( SELECT * FROM table )
[2022-10-21T14:35:12,553][ERROR][o.o.s.l.p.RestSqlAction  ] [opensearch-0] 99a577f7-2e35-49bd-b998-d7562becc848 Client side error during query execution
org.opensearch.sql.legacy.rewriter.matchtoterm.VerificationException: Unknown index [`test.*`]
    at org.opensearch.sql.legacy.rewriter.matchtoterm.TermFieldRewriter.checkMappingCompatibility(TermFieldRewriter.java:242) ~[legacy-2.3.0.0.jar:?]
    at org.opensearch.sql.legacy.rewriter.matchtoterm.TermFieldRewriter.visit(TermFieldRewriter.java:71) ~[legacy-2.3.0.0.jar:?]
    at com.alibaba.druid.sql.dialect.mysql.ast.statement.MySqlSelectQueryBlock.accept0(MySqlSelectQueryBlock.java:255) ~[druid-1.0.15.jar:1.0.15]
    at com.alibaba.druid.sql.dialect.mysql.ast.statement.MySqlSelectQueryBlock.accept0(MySqlSelectQueryBlock.java:246) ~[druid-1.0.15.jar:1.0.15]
    at com.alibaba.druid.sql.ast.SQLObjectImpl.accept(SQLObjectImpl.java:40) ~[druid-1.0.15.jar:1.0.15]
    at com.alibaba.druid.sql.ast.SQLObjectImpl.acceptChild(SQLObjectImpl.java:62) ~[druid-1.0.15.jar:1.0.15]
    at com.alibaba.druid.sql.ast.statement.SQLSelect.accept0(SQLSelect.java:85) ~[druid-1.0.15.jar:1.0.15]
    at com.alibaba.druid.sql.ast.SQLObjectImpl.accept(SQLObjectImpl.java:40) ~[druid-1.0.15.jar:1.0.15]
    at com.alibaba.druid.sql.ast.SQLObjectImpl.acceptChild(SQLObjectImpl.java:62) ~[druid-1.0.15.jar:1.0.15]
    at com.alibaba.druid.sql.ast.expr.SQLQueryExpr.accept0(SQLQueryExpr.java:55) ~[druid-1.0.15.jar:1.0.15]
    at com.alibaba.druid.sql.ast.SQLObjectImpl.accept(SQLObjectImpl.java:40) ~[druid-1.0.15.jar:1.0.15]
    at org.opensearch.sql.legacy.query.OpenSearchActionFactory.create(OpenSearchActionFactory.java:114) ~[legacy-2.3.0.0.jar:?]
    at org.opensearch.sql.legacy.plugin.SearchDao.explain(SearchDao.java:52) ~[legacy-2.3.0.0.jar:?]
    at org.opensearch.sql.legacy.plugin.RestSqlAction.explainRequest(RestSqlAction.java:208) [legacy-2.3.0.0.jar:?]
    at org.opensearch.sql.legacy.plugin.RestSqlAction.lambda$prepareRequest$1(RestSqlAction.java:164) [legacy-2.3.0.0.jar:?]
    at org.opensearch.sql.opensearch.executor.Scheduler.lambda$withCurrentContext$0(Scheduler.java:30) [opensearch-2.3.0.0.jar:?]
    at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) [opensearch-2.3.0.jar:2.3.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.lang.Thread.run(Thread.java:833) [?:?]

Cluster settings are almost untouched:

{
  "persistent" : {
    "cluster" : {
      "max_shards_per_node" : "3000"
    },
    "plugins" : {
      "index_state_management" : {
        "metadata_migration" : {
          "status" : "1"
        },
        "template_migration" : {
          "control" : "-1"
        }
      }
    }
  },
  "transient" : { }
}

Cluster health:

{
  "cluster_name" : "test",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "discovered_master" : true,
  "discovered_cluster_manager" : true,
  "active_primary_shards" : 1995,
  "active_shards" : 2002,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 53,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 97.42092457420924
}

And the shards:

  {
    "index" : "test.data-1",
    "shard" : "0",
    "prirep" : "p",
    "state" : "STARTED",
    "docs" : "3",
    "store" : "4.6kb",
    "ip" : "10.10.0.2",
    "node" : "opensearch-1"
  },
  {
    "index" : "test.data-1",
    "shard" : "0",
    "prirep" : "r",
    "state" : "STARTED",
    "docs" : "3",
    "store" : "4.6kb",
    "ip" : "10.10.0.3",
    "node" : "opensearch-0"
  },
deathjoin commented 1 year ago

Hey, can we get any updates on this? With 2.4.0 the bug still persists.

Extra: it seems like the * within the index name breaks the query:

  1. Fine
    POST /_plugins/_sql
    {
    "query": "SELECT * FROM test.data-1",
    "fetch_size": 10
    }
  2. Error
    POST /_plugins/_sql
    {
    "query": "SELECT * FROM test.*",
    "fetch_size": 10
    }

It's kind of a pain for us to use SQL with the latest updates because we split our indices by dates 😢

dai-chen commented 1 year ago

@deathjoin Sorry for the inconvenience. Just want to confirm: does your use case require pagination (enabled by `fetch_size`)? We're considering migrating pagination support to our engine V2. Could you elaborate on your use case a little bit? Thanks!

deathjoin commented 1 year ago

Sure. Yes, pagination is required.

We use OpenSearch to store and continuously analyse many events from our product. Data is extracted from OpenSearch using Python scripts via the SQL API and then processed with Pandas and other tools. A query can extract events for different periods of time, like 1 day or 2 weeks, which sometimes results in 100k or more documents per request, so we need pagination to get them all.

Typical index names are events-2022.11.20. We manage our indices using ISM policies. So to get data for a time period we use either filter.range.timefield.lte/gte or WHERE timefield<"time" inside the query. Usually the query looks like SELECT fields FROM events-* WHERE timefield<"{date_time}" AND.... It's the same script running every week, so we don't want to spell out full index names like SELECT fields FROM events-2022.11.10,events-2022.11.11,events-2022.12.... WHERE ..., which seems weird.

But queries like SELECT fields FROM events-* WHERE timefield<"{date_time}" AND... don't work now.
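
For illustration, such an extraction loop might look like the following sketch, assuming the plugin's cursor-based pagination (the first request sends fetch_size, follow-up requests send the returned cursor back to /_plugins/_sql); the host, credentials, and cutoff date are placeholders:

import requests

BASE = "https://localhost:9200"   # placeholder: cluster endpoint
AUTH = ("admin", "admin")         # placeholder: credentials
VERIFY = False                    # placeholder: self-signed certificate

def fetch_all(query, fetch_size=1000):
    """Yield all datarows for an SQL query by following the pagination cursor."""
    body = requests.post(f"{BASE}/_plugins/_sql",
                         json={"query": query, "fetch_size": fetch_size},
                         auth=AUTH, verify=VERIFY).json()
    yield from body.get("datarows", [])
    # Keep requesting pages while the response carries a cursor
    while body.get("cursor"):
        body = requests.post(f"{BASE}/_plugins/_sql",
                             json={"cursor": body["cursor"]},
                             auth=AUTH, verify=VERIFY).json()
        yield from body.get("datarows", [])

date_time = "2022-11-20"  # placeholder cutoff used by the weekly script
# The wildcard index pattern below is exactly what now fails with Unknown index
rows = list(fetch_all(f'SELECT * FROM events-* WHERE timefield < "{date_time}"'))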

deathjoin commented 1 year ago

> We're considering migrating pagination support to our engine V2

@dai-chen Do you have any plans for this feature yet? Any dates? Although it seems more like a bug to me than a missing feature, because it worked well on the 1.x.x versions :)

penghuo commented 1 year ago
  1. We are in the design phase of pagination. Tracked by #656.
  2. If the use case is to pull all the matched events, one workaround is to use LIMIT with a large number, e.g. SELECT * FROM test.* LIMIT 100000 (see the request sketch after this list). The PR https://github.com/opensearch-project/sql/pull/716 internally uses scroll to pull all the matched docs. It is supported since OpenSearch 2.3.
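
For illustration, the LIMIT workaround amounts to a request like this (the limit value is just an upper bound larger than the expected result set):

POST /_plugins/_sql
{
  "query": "SELECT * FROM test.* LIMIT 100000"
}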
deathjoin commented 1 year ago
  1. Nice, thanks for the link. Subscribed.
  2. Yes, that is the use case. It's sad that LIMIT 100000 only works for queries without aggregation, but it will help us for sure. Thank you for the workaround!
rrlamichhane commented 1 year ago

My issue seems related:

https://stackoverflow.com/questions/75622817/how-to-make-fetch-size-work-in-opensearch-or-elasticsearch-sql-plug-in

rrlamichhane commented 1 year ago

I was able to get around this, at least for my purposes, by increasing opendistro.query.size_limit:

https://github.com/opendistro-for-elasticsearch/sql/blob/develop/docs/user/admin/settings.rst#opendistroquerysize_limit
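
For reference, that setting can be raised through the cluster settings API; a minimal sketch (the value is illustrative, and on current OpenSearch releases the equivalent key is plugins.query.size_limit):

PUT _cluster/settings
{
  "transient": {
    "opendistro.query.size_limit": 20000
  }
}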

rrlamichhane commented 1 year ago

@deathjoin What is the limit for queries with aggregation? My queries are returning a maximum of 1000 rows even though my opendistro.query.size_limit and opendistro.sql.cursor.fetch_size are set to 20000.

Yury-Fridlyand commented 1 year ago

Please track implementation progress in #1759.