opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[Joins] Join Query DSL #15450

Open harshavamsi opened 3 weeks ago

harshavamsi commented 3 weeks ago

Is your feature request related to a problem? Please describe

Following up on #15185, we want to introduce the join DSL format that will be used to construct a join query. It will reuse the existing QueryBuilders in OpenSearch to parse the left and right queries, and we will add new logic to SearchSourceBuilder to support the new join field in the query DSL.
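To make that split concrete, here is a minimal Python sketch of how a request body carrying a join field could be separated into a left search, a right search, and the join settings. The helper name split_join_request and the shape of its return value are hypothetical and only for illustration; the real parsing would live in SearchSourceBuilder and the proposed JoinBuilder, whose internals this issue does not specify. The field names (join, right_query, index, condition) follow the DSL proposed in this issue.

```python
import copy


def split_join_request(body):
    """Split a search body with a "join" field into left/right searches and join settings.

    Hypothetical helper: it illustrates the shape of the proposed DSL only.
    """
    body = copy.deepcopy(body)  # leave the caller's request untouched
    join = body.pop("join", None)
    if join is None:
        # No join requested: the body is an ordinary search.
        return {"left": body, "right": None, "join": None}

    right = join.pop("right_query")  # raises KeyError if the right side is missing
    condition = join["condition"]
    for key in ("left_field", "right_field"):
        if key not in condition:
            raise ValueError(f"join.condition is missing {key!r}")

    right_index = right.pop("index")
    return {
        "left": body,  # what remains is a regular search body
        "right": {"index": right_index, "body": right},
        "join": join,  # type, algorithm, condition, fields, aggs
    }


request = {
    "query": {"match": {"message": "error"}},
    "fields": ["instance_id", "status_code"],
    "join": {
        "right_query": {
            "index": "instance_details",
            "query": {"match_all": {}},
            "fields": ["instance_id", "region"],
        },
        "type": "inner",
        "condition": {
            "left_field": "instance_id",
            "right_field": "instance_id",
            "comparator": "=",
        },
    },
}
parts = split_join_request(request)
```

After the split, parts["left"] and parts["right"]["body"] are ordinary search bodies, which is what would let the existing query parsing be reused for both sides.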

Describe the solution you'd like

The join field will be parsed by a new JoinBuilder in OpenSearch that will take in the following:

Full query DSL

{  
  "query": {  
    "bool": {  
      "filter": [  
        {  
          "range": {  
            "@timestamp": {  
              "gte": "now-1h"  
            }  
          }  
        },  
        {  
          "match": {  
            "message": "error"  
          }  
        }  
      ]  
    }  
  },  
  "fields": ["instance_id", "status_code"],  
  "join": {  
    "right_query": {
      "index": "instance_details",
      "query": {
        "range": {
          "created_at": {
            "gte": "now-1y"
          }
        }
      },
      "fields": ["instance_id", "region"]
    },
    "type": "inner",
    "algorithm": "hash_join", // optional
    "condition": {
      "left_field": "instance_id",
      "right_field": "instance_id",
      "comparator": "="
    },
    "fields": ["region", "status_code"],  
    "aggs": {  
      "by_region": {  
        "terms": {  
          "field": "region"  
        },  
        "aggs": {  
          "by_status_code": {  
            "terms": {  
              "field": "status_code"  
            },  
            "aggs": {  
              "status_code_count": {  
                "value_count": {  
                  "field": "status_code"  
                }  
              }  
            }  
          }  
        }  
      }  
    }  
  }  
}
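For readers less familiar with the terms, here is a rough Python sketch of what "type": "inner" with "algorithm": "hash_join" and an "=" comparator would mean for the example above. It is one plausible reading of the DSL, not OpenSearch's implementation: the function inner_hash_join and the sample rows are made up, and it assumes that the join-level fields list projects the joined rows and that the join-level aggs run over those joined rows.

```python
from collections import Counter, defaultdict


def inner_hash_join(left_rows, right_rows, left_field, right_field, fields):
    """Inner equi-join: build a hash table on the right side, probe it with the left."""
    table = defaultdict(list)
    for row in right_rows:
        if right_field in row:
            table[row[right_field]].append(row)

    joined = []
    for left in left_rows:
        # Rows without a match on the right are dropped (inner join).
        for right in table.get(left.get(left_field), []):
            merged = {**right, **left}  # on a name clash the left value wins (an assumption)
            joined.append({f: merged[f] for f in fields if f in merged})
    return joined


# Made-up hits standing in for the results of the left and right queries.
left_hits = [
    {"instance_id": "i-1", "status_code": 500},
    {"instance_id": "i-1", "status_code": 503},
    {"instance_id": "i-2", "status_code": 500},
    {"instance_id": "i-9", "status_code": 404},  # no matching instance on the right
]
right_hits = [
    {"instance_id": "i-1", "region": "us-east-1"},
    {"instance_id": "i-2", "region": "us-west-2"},
]

rows = inner_hash_join(
    left_hits, right_hits, "instance_id", "instance_id", ["region", "status_code"]
)

# Rough analogue of the nested terms aggregations: count per (region, status_code).
counts = Counter((r["region"], r["status_code"]) for r in rows)
```

The hash table is built on the right side here only because that side is the smaller one in the sample data; which side to build on would be a decision for the actual implementation.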

Related component

Search:Query Capabilities

Describe alternatives you've considered

No response

Additional context

No response

smacrakis commented 3 weeks ago

Small comment: why do we speak of the left and right queries? In SQL, the left and right objects are normally called "tables". The result sets to join may be defined by table names or by subqueries. The usual equivalent of "table" in OpenSearch is "index", but that term is so overloaded that it's best avoided. Wouldn't it be clearest for people who are familiar with SQL to use the standard SQL terminology, namely tables?