opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

Allow DQL search or publish parsing library to generate query from DQL #9498

Open Galardolind opened 1 year ago

Galardolind commented 1 year ago

Is your feature request related to a problem? Please describe.

Searching with the OpenSearch API requires reading a lot of documentation and hunting for samples, which often ends in copy-pasting a query without really understanding why it is formatted that way. The documentation exists, but it is split across dozens of pages because of how complex queries can be. As a result, simple queries are unnecessarily complex to write.

OpenSearch already has DQL, which would greatly simplify these requests.

Example of a simple DQL query, followed by the complex JSON it produces:

system.last_assessment>"2022-08-21T23:15:13.570Z" and version>2 and system.status:P2
{
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "bool": {
            "filter": [
              {
                "bool": {
                  "should": [
                    {
                      "range": {
                        "system.last_assessment": {
                          "gt": "2022-08-21T23:15:13.570Z",
                          "time_zone": "Australia/Sydney"
                        }
                      }
                    }
                  ],
                  "minimum_should_match": 1
                }
              },
              {
                "bool": {
                  "filter": [
                    {
                      "bool": {
                        "should": [{ "range": { "version": { "gt": 2 } } }],
                        "minimum_should_match": 1
                      }
                    },
                    {
                      "bool": {
                        "should": [{ "match": { "system.status": "P2" } }],
                        "minimum_should_match": 1
                      }
                    }
                  ]
                }
              }
            ]
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}
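To illustrate the kind of translation being asked for, here is a minimal Python sketch that turns already-parsed DQL clauses into a query DSL body. The function names and the `RANGE_OPS` table are hypothetical and are not part of OpenSearch or OpenSearch Dashboards. The sketch emits a flatter `bool`/`filter` body than the one above, which should match the same documents apart from the `time_zone` setting that Dashboards adds.

```python
import json

# Hypothetical mapping from DQL comparison operators to range-query keys.
RANGE_OPS = {">": "gt", ">=": "gte", "<": "lt", "<=": "lte"}


def clause_to_dsl(field, op, value):
    """Translate one (field, operator, value) clause into a query DSL leaf."""
    if op == ":":
        return {"match": {field: value}}
    if op in RANGE_OPS:
        return {"range": {field: {RANGE_OPS[op]: value}}}
    raise ValueError(f"unsupported operator: {op!r}")


def and_clauses_to_dsl(clauses):
    """Combine clauses joined by `and` into a single bool/filter query."""
    return {"query": {"bool": {"filter": [clause_to_dsl(*c) for c in clauses]}}}


body = and_clauses_to_dsl([
    ("system.last_assessment", ">", "2022-08-21T23:15:13.570Z"),
    ("version", ">", 2),
    ("system.status", ":", "P2"),
])
print(json.dumps(body, indent=2))
```

The clause list is supplied by hand here; producing it from the DQL string is the job of the parser that this issue asks to expose.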

Describe the solution you'd like

Export the existing DQL parser as an npm library so that DQL can be integrated into any frontend application.

Or

Add a new type of full-text search query, query_dsl, that takes a DQL (or similar) query and does the parsing and search in the backend. This would simplify future integrations with OpenSearch, since DQL takes far less effort to learn than the current JSON query DSL.

Example:

GET movies/_search
{
  "query": {
    "query_dsl": {
      "query": "actors:*Gadot and release_date>\"2023-01-01T00:00:00.000Z\" and imdb_score>5"
    }
  }
}

Describe alternatives you've considered

query_string covers part of it but does not allow any comparators.

Additional context

msfroh commented 1 year ago

Just wanted to call out that query_string does have syntax for range queries, though it's a little clunkier.

Your example query would be:

GET movies/_search
{
  "query": {
    "query_string": {
      "query": "actors:*Gadot AND release_date:{\"2023-01-01T00:00:00.000Z\" TO *} AND imdb_score:{5 TO *}"
    }
  }
}
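As a rough sketch of how an application could generate that query_string range syntax from comparison operators, the Python below maps `>`, `>=`, `<` and `<=` onto the exclusive `{a TO b}` and inclusive `[a TO b]` range forms. The helper names are made up for illustration, and string values are simply wrapped in double quotes without any escaping, which a real implementation would need to handle.

```python
import json


def range_term(field, op, value):
    """Render one comparison as a query_string range expression."""
    # Quote string values such as dates, which contain ':' characters.
    v = f'"{value}"' if isinstance(value, str) else str(value)
    if op == ">":
        return f"{field}:{{{v} TO *}}"   # exclusive lower bound
    if op == ">=":
        return f"{field}:[{v} TO *]"     # inclusive lower bound
    if op == "<":
        return f"{field}:{{* TO {v}}}"   # exclusive upper bound
    if op == "<=":
        return f"{field}:[* TO {v}]"     # inclusive upper bound
    raise ValueError(f"unsupported operator: {op!r}")


def query_string_body(terms):
    """Join pre-rendered terms with AND into a query_string request body."""
    return {"query": {"query_string": {"query": " AND ".join(terms)}}}


body = query_string_body([
    "actors:*Gadot",
    range_term("release_date", ">", "2023-01-01T00:00:00.000Z"),
    range_term("imdb_score", ">", 5),
])
print(json.dumps(body, indent=2))
```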

In general, though, a DQL query builder would be pretty nice.

msfroh commented 1 year ago

In order to add this to OpenSearch itself, we would need a way to support the .peg grammar in Java in order to convert to OpenSearch QueryBuilder objects.

@austintlee suggested maybe separating the formal grammar from the Peggy implementation.

Basically:

formal grammar -> Peggy grammar (with suggestions) -> JavaScript parser

but also

formal grammar -> (some Java-friendly parser generator grammar, like ANTLR) -> Java parser
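To make the "one grammar, several parsers" idea concrete, here is a small Python sketch of a hand-written parser for a made-up subset of DQL. The grammar in the comment is an assumption for illustration only: it is not the actual DQL grammar, and the parser is written by hand rather than generated by Peggy or ANTLR. A second implementation in another language would be expected to accept the same strings and produce the same clauses.

```python
import re

# Assumed toy grammar (not the real DQL grammar):
#   query  := clause ("and" clause)*
#   clause := FIELD OP VALUE
#   OP     := ">=" | "<=" | ">" | "<" | ":"
TOKEN = re.compile(r'''
    \s*(?:
        (?P<op>>=|<=|>|<|:)
      | "(?P<quoted>[^"]*)"
      | (?P<word>[^\s:<>"]+)
    )''', re.VERBOSE)


def tokenize(text):
    """Split the input into (kind, text) tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected input at position {pos}")
        kind = m.lastgroup
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    return tokens


def parse(text):
    """Parse the toy grammar into a list of (field, op, value) clauses."""
    tokens = tokenize(text)
    clauses, i = [], 0
    while True:
        if i + 3 > len(tokens):
            raise SyntaxError("expected FIELD OP VALUE")
        (fk, field), (ok, op), (vk, value) = tokens[i:i + 3]
        if fk != "word" or ok != "op" or vk not in ("word", "quoted"):
            raise SyntaxError(f"malformed clause near token {i}")
        clauses.append((field, op, value))
        i += 3
        if i == len(tokens):
            return clauses
        kind, word = tokens[i]
        if kind != "word" or word.lower() != "and":
            raise SyntaxError(f"expected 'and' near token {i}")
        i += 1


print(parse('actors:*Gadot and release_date>"2023-01-01T00:00:00.000Z" and imdb_score>5'))
```

All values come back as strings in this sketch; deciding whether `5` is a number or a keyword would need the index mapping, which is one reason a backend implementation is attractive.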

@Galardolind - would you be interested/willing to bring this issue up for discussion at OpenSearch Dashboards office hours? See https://www.meetup.com/opensearch/events/294620421/ for details.

Galardolind commented 1 year ago

> Just wanted to call out that query_string does have syntax for range queries, though it's a little clunkier.

It would be great to have that in the query_string documentation. I also agree that DQL would simplify things much further, to the point where almost nothing needs to be learned, since it is really close to natural language.

> @Galardolind - would you be interested/willing to bring this issue up for discussion at OpenSearch Dashboards office hours? See https://www.meetup.com/opensearch/events/294620421/ for details.

Signed up, thanks for the suggestion 👍

ashwin-pc commented 1 year ago

A summary from the Dashboards office hours about this topic:

The reason for this request is that DQL is a simple language to search with, so it's desirable to have it accessible directly through an API rather than only through OSD's UI. There are a few ways to do this:

  1. Include it as part of the core API for OS so that both OSD and other applications can use it.
  2. Add a new OS plugin that provides the DQL API so that other applications can use it. The downside is that DQL is part of the minimal distribution of OSD, so unless the plugin can be guaranteed in the minimal distribution of OS too, we can't switch over to using the new API.
  3. Move DQL to a new npm module. The downside is that it's another package to maintain and keep up to date separately.
  4. Use PPL instead. This already has an API that does not need OSD to work.
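For option 4, a PPL query can be sent to the backend directly. The Python sketch below only builds the HTTP request and does not send it. It assumes the SQL/PPL plugin is installed and exposes `POST /_plugins/_ppl`, and that a `movies` index with an `imdb_score` field exists. The base URL and the `build_ppl_request` helper are placeholders for illustration.

```python
import json
import urllib.request


def build_ppl_request(base_url, ppl_query):
    """Build (but do not send) a POST request for the PPL endpoint."""
    payload = json.dumps({"query": ppl_query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_plugins/_ppl",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local cluster address and example index; adjust for a real setup.
req = build_ppl_request("http://localhost:9200", "source=movies | where imdb_score > 5")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would execute it against a running cluster, subject to whatever authentication that cluster requires.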

Personally, I think the easiest solution is to see whether PPL satisfies the requirements and, if not, to identify the gaps. I also like the idea of moving DQL to core, since it makes the backend easier to access through a simplified language that is already supported in Dashboards.

Would love to hear opinions from some of the maintainers of this repo about the solutions here. @nknize @dblock @andrross and even folks working on PPL @anirudha

msfroh commented 1 year ago

In the Search Relevance community meeting, @lukas-vlcek made a really good point that Peggy (what DQL uses for its grammar) excels at offering contextual hints on errors and autocomplete suggestions.

If that part, plus the part that converts from DQL into query DSL, were made available to application developers to use in their frontend (essentially option 3 from @ashwin-pc's comment), it could provide a good "power-user" experience.

Adding DQL support to the OpenSearch backend (either as a core feature, a module, or a plugin) might be nice, but we probably wouldn't want to remove support from OSD, because of those nice Peggy features. At that point, we would need to maintain both the Peggy JS implementation in OSD and some Java implementation in the backend, and make sure their grammars are kept in sync (hence @austintlee's suggestion above of deriving both from some canonical grammar).

tkaur-ds commented 3 months ago

Hi, are there any plans to implement this in the near future?