Open DumboJetEngine opened 11 months ago
Correcting myself on this statement :
I guess a similar feature must already exist in the SQL plugin for OS, since that problem would also be present there and had to be solved somehow
Turns out, the SQL plugin has the same (or similar) issues (check bullet #4): https://opensearch.org/docs/latest/search-plugins/sql/sql/complex/
Quoting the page :
In a WHERE statement, don’t combine trees that contain multiple indexes.
So, using the SQL plugin does not seem to be a viable solution at all.
And on a second review, using C# expressions does not seem like a good idea (too many limitations). Probably, preprocessing the query before sending it to O.S. would be better. And ask for it like that :
var response = openSearchClient.Search<Document>(s => s
.Index(documentsIndexName)
.Query(q => q
.Bool(bq => bq
.Filter(rq =>
{
return ( rq.Term(d => d.t1.First().Key, 1) || rq.Term(d => d.t2.First().Value, 7) ) &&
( rq.Term(d => d.t1.First().Key2, 5) || rq.Term(d => d.t2.First().Value, 9) || rq.Term(d => d.id, 7) );
})
)
)
.RelationalMode(true)
);
But I am not 100% sure how one could exclude specific nested queries from being "relationalized". Perhaps with an extra parameter that could be specified on each nested query.
I have started experimenting on an implementation for this. I am not sure what will be the end result, but it is interesting for me already...
I have a working implementation of this, that also adds some extra tiny features. It appears to work for the queries I tested it on, but I will need to test it some more.
One thing that I had to do was to add a way of cloning IQuery
/QueryBase
instances, and I have created an extension method that handles each type of query individually, but I would guess that the best thing would be to make IQuery
expose a .Clone()
method. This, however, will make the code diverge more from the original ElasticSearch code (but maybe that is OK?) and it will also not keep the changes contained in a small portion of the code, but they will be all over the place.
If you want me to add the .Clone()
method, let me know.
I am planning to add some tests as well, but I have various issues with Visual Studio not running the tests.
There were also some "challenges" with preserving the scoring behavior with Filter
vs Must
, but I think this is of smaller importance when you try to port SQL queries to O.S. .
This is how it works now:
var queryChangeLogStringBuilder = new StringBuilder(4 * 1024);
var queryResponse = openSearchClient.Search<Book>(s => s
.Index(documentsIndexName)
.Query(q => q
.Bool(bq => bq
.Filter(rq =>
{
var query = (
(rq.Term(d => d.Tags.First().Name, "cats") || rq.Term(d => d.Price, 11))
&&
rq.Term(d => d.Tags.First().Name, "dogs")
)
.Name("cats-and-dogs")
.Strict(true)
.TransformToRelationalEquivalent<Book>(queryChangeLogStringBuilder);
return query;
})
)
)
);
[OpenSearchType(IdProperty = "Id")]
public class Book
{
public int Id { get; set; }
public string? Title { get; set; }
public int Pages { get; set; }
public decimal Price { get; set; }
[Nested]
public Tag[] Tags { get; set; } = new Tag[0];
[Nested]
public Meta[] Meta { get; set; } = new Meta[0];
}
public class Tag
{
public string? Name { get; set; }
public int Weight { get; set; }
}
public class Meta
{
public string? Name { get; set; }
public string? Value { get; set; }
}
@Xtansia Is it OK if I send, in one commit and one PR, the implementation for this issue ( #446 ), #380, and #375 ?
@DumboJetEngine for ease of review and change tracking purpose, it would best that they're separate PRs please
@Xtansia OK. Done. If you can review and merge the two small PRs I have just sent, so that I can resolve any conflict for the 3rd, it would be great.
Is your feature request related to a problem?
While trying to translate some SQL relational queries into OpenSearch queries using nested documents and the
OpenSearch.Client
, I got into MANY problems due to the fact that I had to gather all nested fields under a single nested query. I wrote a question here, but I am not sure if is clear enough.Essentially, when the SQL query is complex and involves many JOINed tables, the nested queries in OpenSearch become too complex to write by hand.
For example, say you have this SQL query:
To translate that into an OpenSearch query, after you create the nested documents for
t1
&t2
, you have to gather allt1
filters under a single nested query, all thet2
filters under another nested query, and all non-nested field filters outside the nested queries.By doing some manual boolean transformations one could shape the query like that:
This gets very easily out of hand for more complex queries, and even more for dynamic queries where you cannot statically analyze and simplify the queries beforehand.
What solution would you like?
A way to preprocess queries, in order to gather all nested fields under a single nested query would be great, since this cannot be done manually most of the times. In the example above, it would be great if the mentioned query "simplification" and gathering of all:
t1
filters under a single nested queryt2
filters under another single nested queryd
filters outside all nested queries ...could be performed by the library. Also, when using property attributes/annotations for the nested object, the library could infer by itself when to use a nested query and when not, without having the programmer specify anything extra. I guess it is a complex feature, but at least I hope I have explained well what the problem is. If I am missing something (because I am quite new to OpenSearch), let me know.The way I am imagining the implementation is something like this, which uses C# Expression trees in the
.Relational()
method:I guess a similar feature must already exist in the SQL plugin for OS, since that problem would also be present there and had to be solved somehow, although the SQL plugin has some restrictions from what I can recall.
What alternatives have you considered?
inner_hits
from each nested query in my query to eliminate some returned results : it was not feasible since I had many nested queries and nested documents queried in a single convoluted query.OpenSearch.Client
: this is not possible since my queries are built dynamically, I have more than 2 nested objects, etc.