Open akuzin1 opened 1 year ago
@akuzin1 Thanks for the feature request! This recall an old request previously: https://github.com/opendistro-for-elasticsearch/sql/issues/1151. @acarbonetto is adding OpenSearch meta field support to our SQL engine. I'm thinking can we support this as part of meta field work.
Sounds good, that would be great to see. Is there an estimated timeline for when this feature would be released? Thank you
@acarbonetto @Yury-Fridlyand @MaxKsyunz ^^
This will be a dependency for https://github.com/opensearch-project/sql/issues/1441
The API on OpenSearch side is _routing
which we should consider using instead of _shard
(or both!)
Hi! Saw that the label was changed to "Priority-High" which is great to see, so wanted to check in and see if there is an estimated date for delivering of this feature? Thank you.
Hi! Saw that the label was changed to "Priority-High" which is great to see, so wanted to check in and see if there is an estimated date for delivering of this feature? Thank you.
Target release is 2.10 at the moment. Hoping to get this in a little sooner.
Proposal for setting routing
field on the search request. There's two ways forward that I'd like to propose. But first a little contexts:
The objective of the partition/routing shard is to include the routing ID in the SearchRequest builder.
Example:
new SearchRequest()
.indices(indexName.getIndexNames())
.routing(routingId.getIndexNames())
.scroll(scrollTimeout)
.source(sourceBuilder);
Getting the routing id(s) from the initial query into pushdown can take one of two obvious routes.
Syntax addition, includes a new json parameter only available in the V2 engine.
{
"query": "select _id, _index, _routing, int0 from calcs_routing limit 5",
"routing": "FURNITURE"
}
This includes an opensearch-similar API to target individual or lists of shards. The string will accept a comma-separated list of shard ID targets.
To get the request into OpenSearchRequest as part of pushdown requires that we create an AbstractPlan, LogicalPlan and PhysicalPlan operator like the Paginate and LogicalPaginate operators. This would allow us to pushdown a routingId string into the OpenSearchIndexScanQueryBuilder during execution. We would create Partition operators in a similar manner as the Pagination operators without much business logic. Reference to how paginate works: https://github.com/opensearch-project/sql/blob/main/docs/dev/Pagination-v2.md#unresolved-query-plan
Considerations: we may consider combining Paginate and Partition into a single set of operators, and call them something like PushDownConfiguration
and push all the parameters at once. This would allow us to scale by configuration values without needing to add more operators.
FROM table PARTITION (key, key, ...)
syntaxAs defined in https://dev.mysql.com/doc/refman/8.0/en/partitioning-selection.html, MySQL (and other SQL engines) allows a query to target a specific partition by id. This also allows for multiple ids in the PARTITION function.
{
"query": "select _id, _index, _routing, int0 from calcs_routing PARTITON(\"FUNITURE, OFFICE SPACE\") limit 5"
}
Update the parser syntax to allow for the PARTITION function in the FROM clause.
Ultimately, to get the routing ID(s) into the SearchRequest, we need to add the PARTITON keys to to IndexScan (which extends Table). Then in the IndexScanBuilder, send the routing ID(s) to the SearchRequest.
To accomplish this, we could push the partition keys into the Table (OpenSearchIndexScan) as partition keys during the analysis phase. On pushdown, the partition keys would be part of the IndexScan and could semi-easily push down to the OpenSearchRequest as part of the Index.
OpenSearchIndex already has the concept of an Index, and Routing Keys would have the same architecture:
Will there be a new projected release target for this feature, @acarbonetto? I see it was targeted for 2.11 and 2.12 is being released soon. Can this make 2.13, perhaps?
Is your feature request related to a problem?
In MySQL one can retrieve partition information about a table, which can later be used to target specific partitions during query execution.
The following is an example of a query that can be used to retrieve partition information of a specific table.
Next is an example of a query targeting a specific partition in a table.
SELECT * FROM table PARTITION (partitionName);
When applying this to opensearch, partitions could be treated as the equivalent of shards for our use case.
What solution would you like? It would be great to be able to treat shards in Opensearch as the equivalent to MySQL Partitions and be able to query individual shards.
What alternatives have you considered? We've considered generating splits based on hashing a key or tuple of keys and then modulo that against some fixed number of splits that we want to generate.
However, it doesn't seem like there is a hashing function like that available.
Therefore, the above mentioned solution would be a great behavior to add for all sql users, to more closely mimic the behavior and syntax of MySQL.