opensearch-project / sql

Query your data using familiar SQL or intuitive Piped Processing Language (PPL)
https://opensearch.org/docs/latest/search-plugins/sql/index/
Apache License 2.0
120 stars 139 forks source link

[FEATURE] Add transport interface for SQL query API #1327

Open dai-chen opened 1 year ago

dai-chen commented 1 year ago

Overview

Background

Requirements

Design: Query Interface

This section discuss the following design options for user query interface.

Screenshot 2023-02-07 at 2 23 43 PM

Low Level Option: Native Transport API

First option is to just follow the convention and add transport API classes to OpenSearch common-utils module. This is also what current PPL transport API looks like.

TransportPPLQueryRequest request =
  new TransportPPLQueryRequest(
    query="source = index | where age > 30",
    isExplain=false,
    format="default");

// Same for DQL, DDL, DML
transportService.sendRequest(
  clusterService.state(),
  PPLQueryAction.NAME,
  request,
  new TransportResponseHandler<TransportPPLQueryResponse>() {
    public void handleResponse(TransportPPLQueryResponse response) {
      String results = response.getResult();

      // Parse JSON results and populate to specific data model class
      JSONObject json = ...
    }
  }
);

Mid Level Option: JDBC API over Transport

One option on higher level is to leverage JDBC API:

// Open connection
try (Connection conn = DriverManager.getConnection(
 "jdbc:opensearch://http://localhost:9200", "user1", "pass")) {

  // Query metadata
  DatabaseMetaData databaseMetaData = connection.getMetaData();
  try(ResultSet resultSet = databaseMetaData.getTables(null, null, null, new String[]{"TABLE"})){ 
    while(resultSet.next()) { 
      String tableName = resultSet.getString("TABLE_NAME");
      ...
    }
  }

  // DQL
  Statement stmt = conn.createStatement();
  List<ADResult> adResults = new ArrayList<>();
  ResultSet rs = stmt.executeQuery("SELECT ... FROM ...");
  while (rs.hasNext()) {
    ADResult result = new AdResult();
    result.setId(rs.getInt("ad_id");
    ...
    adResults.add(result);
  }

  // DDL & DML
  stmt.executeUpdate("CREATE TABLE test ...");
  stmt.executeUpdate("DELETE FROM test WHERE ...");
}

Mid Level Option: Transport API with Data Modeling

Alternatively, we can provide thin wrapper on top of transport API and take care of de-serialization work by some data model class.

// Same for DQL, DDL, DML
transportService.sendRequest(
  clusterService.state(),
  PPLQueryAction.NAME,
  request,
  new TransportResponseHandler<TransportPPLQueryResponse>() {
    public void handleResponse(TransportPPLQueryResponse response) {

      // Result is already mapped to SQL/PPL data model 
      QueryResult result = response.getResult();
      Schema schema = result.getSchema();
      for (Row row : result.getRows()) {
        ...
      }
    }
  }
);

High Level Option: Query DSL over Transport

Similarly as JOOQ, LINQ and other query DSL library, we can provide higher level query API than aforementioned options.

QueryDSL.init(transportService, clusterService ...)

// DQL
QueryDSL
   .sql()
   .from("index")
   .where("match(address, 'Seattle')")
   .select("name, age")
   .execute(format="csv")

QueryDSL
   .ppl()
   .search("index")
   .sort("name, age")
   .explain()

// DDL
QueryDSL
    .table("test").
    .column("age", "INT")
    .create();

// DML
QueryDSL
    .from("index")
    .where("age > 30")
    .delete();

Design: Permission Control on SQL Transport Action

As new transport action added, new action permission is required correspondingly. Here is an example from @penghuo that shows what's required from user side after SQL transport added:

PUT _plugins/_security/api/roles/sql_role
{
  "cluster_permissions": [
    **"cluster:admin/opensearch/sql"**
  ],
  "index_permissions": [{
    "index_patterns": [
      "sql-test-index"
    ],
    "allowed_actions": [
      "indices:data/read/search*",
      "indices:admin/mappings/get"
    ]
  }]
}

PUT _plugins/_security/api/rolesmapping/ppl_role
{
  "backend_roles" : [],
  "hosts" : [],
  "users" : ["test_user"]
}

Alternatively, user can create permission group for the new action permission as below:

PUT _plugins/_security/api/actiongroups/sql_access
{
  "allowed_actions": [
    "cluster:admin/opensearch/sql"
  ]
}

Questions: What' the impact on existing user who upgrade to latest version? Is it possible to avoid this and make user unaware?

Yury-Fridlyand commented 1 year ago

An idea for PPL use in Mid Level Option: JDBC API over Transport:

var properties = new Properties();
properties.setProperty("user", "looser");
properties.setProperty("password", "123");
properties.setProperty("lang", "PPL"); // <- The trick happens here
// Open connection
try (Connection conn = DriverManager.getConnection(
 "jdbc:opensearch://http://localhost:9200", properties)) {

  // Query metadata
  Statement stmt = conn.createStatement();

  // Query data
  List<ADResult> adResults = new ArrayList<>();
  ResultSet rs = stmt.executeQuery("search source=<index> | ... <rest PPL query>");
  while (rs.hasNext()) {
    ADResult result = new AdResult();
    result.setId(rs.getInt("ad_id");
    ...
    adResults.add(result);
  }
}
lezzago commented 1 year ago

Coming from the Alerting plugin perspective, I believe the option, High Level Option: Query DSL over Transport, seems the best for other plugins to utilize the SQL and PPL query support. This allows plugin to not require in depth knowledge of SQL and PPL and lets the experts (the SQL plugin) deal with the lower level implementations.

The JDBC driver option and the low level option don't seem good coming from a consuming plugin perspective since they require a fair amount of duplicated setup work as well as detailed knowledge on how to use SQL queries. This logic should be ideally abstracted away from this query work and let plugins focus on their plugin features. Also by abstracting it to a high level design, the SQL plugin can ensure the right practices are enforced when doing SQL/PPL queries.