Open normanj-bitquill opened 3 weeks ago
This appears to be due to how the OpenSearch engine calculates the min and max for arrays values.
POST test3/_search
{
"size": 0,
"aggs": {
"y_min": {
"min": {
"field": "y"
}
}
}
}
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 5,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"y_min": {
"value": 1
}
}
}
POST test3/_search
{
"size": 0,
"aggs": {
"y_max": {
"max": {
"field": "y"
}
}
}
}
{
"took": 7,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 5,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"y_max": {
"value": 5
}
}
}
For reference, this is how PostgreSQL behaves:
SELECT * FROM with_array;
x | y
---+-------
1 | {1,2}
2 | {3,4}
3 | {1,5}
4 | {1,2}
(4 rows)
SELECT MIN(y) FROM with_array;
min
-------
{1,2}
(1 row)
SELECT MAX(y) FROM with_array;
max
-------
{3,4}
(1 row)
Thanks for rasing issue. OpenSearch/Lucene does not support ARRAY data type. Lucene allows adding multiple values for the same field name. for example. when calcualte min/max of field y, OpenSeach read all the value of y, and calcuate metrics, the result is min(y)=1, max(y)=5.
field,value
y,1
y,2
y,3
y,4
y,1
y,5
y,1
y,2
y,2
y,3
Related issue in opensearch-core. https://github.com/opensearch-project/OpenSearch/issues/16420
One solution is if array_field is used in aggregation, we should do post-processing, instead of rewrite as DSL aggregation query.
@penghuo I'm not sure the SQL plugin can easily know that the field is an array field. By default, the mapping will only contain information about the element types.
GET my_index/mapping
{
"properties": {
"id": {
"type": "long"
},
"array_field": {
"type": "long"
}
}
}
If the user adds something to the mapping, we could know that it is an array. https://trino.io/docs/current/connector/elasticsearch.html#array-types
Is there anything similar currently in OpenSearch?
@normanj-bitquill agree, https://github.com/opensearch-project/OpenSearch/issues/16420 discuss similar approach.
What is the bug? If MAX or MIN are applied to a field with array values, then the MAX or MIN of any element in any of the array values is returned.
Consider an index with the following data:
and this query:
or this query:
For MAX, the expectd result is
[3, 4]
and for MIN, the expected result is[1, 2]
. The comparison should be performed element by element and preserve the entire array value.How can one reproduce the bug? Steps to reproduce the behavior:
What is the expected behavior? Use array comparisons to compare the array values and return the MAX or MIN array.
What is your host/environment?
Do you have any screenshots? N/A
Do you have any additional context? Issue #1300 had a change recently merged in that allows array values to be used in query evaluation and in the result set.