opensearch-project / sql

Query your data using familiar SQL or intuitive Piped Processing Language (PPL)
https://opensearch.org/docs/latest/search-plugins/sql/index/
Apache License 2.0
116 stars 134 forks source link

[FEATURE]Add `took` metadata information to PPL query response #2929

Open YANG-DB opened 1 month ago

YANG-DB commented 1 month ago

Is your feature request related to a problem? As part of the OpenSearch standard query DSL a took metadata field which counts the amount of time the query executed is returned in the response:

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 333,
    "successful": 333,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": 1,
    "hits": [
      {
        ....

A similar response is needed for PPL queries

What solution would you like? We would like to add this metadata information from the existing DSL response to the original PPL query and combine it with the PPL response as part of the returned metadata

Do you have any additional context?

salyh commented 3 weeks ago

@YANG-DB Should the new took field only be part of a response in jdbc format or also somehow part of a response in csv or raw format (and if so do we want to add it as a new column)?

YANG-DB commented 3 weeks ago

Hi I think we can add a new metadata field for this purpose - similar to how it's returning today in ppl OpenSearch response

salyh commented 3 weeks ago

I see the following approach to implement it:

Add the took info org.opensearch.sql.executor.ExecutionEngine.QueryResponse

The actual took value from the opensearch response is available in org.opensearch.sql.opensearch.response.OpenSearchResponse#OpenSearchResponse(org.opensearch.action.search.SearchResponse, org.opensearch.sql.opensearch.data.value.OpenSearchExprValueFactory, java.util.List<java.lang.String>) and we can extract it like

  /** Constructor of OpenSearchResponse. */
  public OpenSearchResponse(
      SearchResponse searchResponse,
      OpenSearchExprValueFactory exprValueFactory,
      List<String> includes) {
    this.hits = searchResponse.getHits();
    this.aggregations = searchResponse.getAggregations();
    this.exprValueFactory = exprValueFactory;
    this.includes = includes;
    this.took = searchResponse.getTook().getMillis();  //  <<<<------
  }

Then further expose it from org.opensearch.sql.opensearch.storage.scan.OpenSearchIndexScan like

  private void fetchNextBatch() {
    OpenSearchResponse response = client.search(request);
    took = response.getTook(); //  <<<<------
    if (!response.isEmpty()) {
      iterator = response.iterator();
    }
  }

Then extract it in org.opensearch.sql.opensearch.executor.OpenSearchExecutionEngine.execute()

            Long took = null;

            if (physicalPlan instanceof ProjectOperator) {
              PhysicalPlan input = ((ProjectOperator) physicalPlan).getInput();

              if (input instanceof TookAware) {
                took = ((TookAware) input).getTook();
              }
            }

            QueryResponse response =
                new QueryResponse(
                    physicalPlan.schema(), result, planSerializer.convertToCursor(plan), took);
            listener.onResponse(response);

Then use the took value from QueryResponse in org.opensearch.sql.protocol.response.format.SimpleJsonResponseFormatter

Two open questions remain so far:

1) took is only available for opensearch datasource (not for prometheus and not for spark) so there is this nasty instanceof thingy to make the distinction. Is there a better way to do it? 2) Is the general approach sufficient or are there easier ways to implement it?

See https://github.com/eliatra/sql/commit/d02920289cf58222134968a668da2a53254323e4 for a working prototype

If the prototype looks sufficient I am happy to open a PR

YANG-DB commented 1 week ago

@salyh thanks for the research 1) IMO we need to think of a more generic way to add metadata fields to the response - can you suggest something here ? 2) lets wait with the concrete PR until we can generalize the PPL response metadata - IMO you can also check how spark response handles such metadata as an example...

Thanks