opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[BUG] The doc_values true not working like expected #5770

Open alexander-schranz opened 1 year ago

alexander-schranz commented 1 year ago

Describe the bug

I'm currently working on a search abstraction written in PHP and stumbled over a problem.

I have several fields set to index: false because they should not be indexed and have no relevance for search queries. Some of these fields should still be filterable.

When I try to filter with a range query, OpenSearch returns:

1) Schranz\Search\SEAL\Adapter\Opensearch\Tests\OpensearchConnectionTest::testGreaterThanCondition OpenSearch\Common\Exceptions\BadRequest400Exception: {"error":{"root_cause":[{"type":"query_shard_exception","reason":"failed to create query: Cannot search on field [rating] since it is not indexed.","index":"test_complex","index_uuid":"uAiYjfldRgyNusMyla0bUg"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"test_complex","node":"7B60B2FRTx-VXkljtM9Q9w","reason":{"type":"query_shard_exception","reason":"failed to create query: Cannot search on field [rating] since it is not indexed.","index":"test_complex","index_uuid":"uAiYjfldRgyNusMyla0bUg","caused_by":{"type":"illegal_argument_exception","reason":"Cannot search on field [rating] since it is not indexed."}}}]},"status":400}

The same schema works as expected under Elasticsearch, so this error is unexpected. In both cases doc_values is set explicitly: every field that should be filterable has doc_values set to true. This is not even required for Elasticsearch, where doc_values: true is the default. I read in the OpenSearch docs that doc_values: false is the default there, but even setting it to true does not make a non-indexed field filterable as expected.

To Reproduce

Steps to reproduce the behavior:

1. Start OpenSearch 2.4.1
2. Create Schema
```json
{
    "uuid": { "type": "keyword", "index": false, "doc_values": true },
    "title": { "type": "text", "index": true },
    "header": {
        "type": "object",
        "properties": {
            "image": {
                "type": "object",
                "properties": {
                    "media": { "type": "integer", "index": false, "doc_values": false }
                }
            },
            "video": {
                "type": "object",
                "properties": {
                    "media": { "type": "text", "index": false }
                }
            }
        }
    },
    "article": { "type": "text", "index": true },
    "blocks": {
        "type": "object",
        "properties": {
            "text": {
                "type": "object",
                "properties": {
                    "title": { "type": "text", "index": true },
                    "description": { "type": "text", "index": true },
                    "media": { "type": "integer", "index": false, "doc_values": false }
                }
            },
            "embed": {
                "type": "object",
                "properties": {
                    "title": { "type": "text", "index": true },
                    "media": { "type": "text", "index": false }
                }
            }
        }
    },
    "footer": {
        "type": "object",
        "properties": {
            "title": { "type": "text", "index": true }
        }
    },
    "created": { "type": "date", "index": true, "doc_values": true },
    "commentsCount": { "type": "integer", "index": false, "doc_values": true },
    "rating": { "type": "float", "index": false, "doc_values": true },
    "comments": {
        "type": "object",
        "properties": {
            "email": { "type": "text", "index": false },
            "text": { "type": "text", "index": true }
        }
    },
    "tags": {
        "type": "text",
        "index": true,
        "fields": {
            "raw": { "type": "keyword" }
        }
    },
    "categoryIds": { "type": "integer", "index": false, "doc_values": true }
}
```
3. Index documents
```json
[
    {
        "uuid": "23b30f01-d8fd-4dca-b36a-4710e360a965",
        "title": "New Blog",
        "header": { "type": "image", "media": 1 },
        "article": "<article><h2>New Subtitle<\/h2><p>A html field with some content<\/p><\/article>",
        "blocks": [
            { "type": "text", "title": "Titel", "description": "<p>Description<\/p>", "media": [ 3, 4 ] },
            { "type": "text", "title": "Titel 2", "description": "<p>Description 2<\/p>" },
            { "type": "embed", "title": "Video", "media": "https:\/\/www.youtube.com\/watch?v=iYM2zFP3Zn0" }
        ],
        "footer": { "title": "New Footer" },
        "created": "2022-01-24T12:00:00+01:00",
        "commentsCount": 2,
        "rating": 3.5,
        "comments": [
            { "email": "admin.nonesearchablefield@localhost", "text": "Awesome blog!" },
            { "email": "example.nonesearchablefield@localhost", "text": "Like this blog!" }
        ],
        "tags": [ "Tech", "UI" ],
        "categoryIds": [ 1, 2 ]
    },
    {
        "uuid": "79848403-c1a1-4420-bcc2-06ed537e0d4d",
        "title": "Other Blog",
        "header": { "type": "video", "media": "https:\/\/www.youtube.com\/watch?v=iYM2zFP3Zn0" },
        "article": "<article><h2>Other Subtitle<\/h2><p>A html field with some content<\/p><\/article>",
        "footer": { "title": "Other Footer" },
        "created": "2022-12-26T12:00:00+01:00",
        "commentsCount": 0,
        "rating": 2.5,
        "comments": [],
        "tags": [ "UI", "UX" ],
        "categoryIds": [ 2, 3 ]
    }
]
```

4. Query rating field
```json
{
    "query": {
        "bool": {
            "must": [
                { "range": { "rating": { "gt": 2.5 } } }
            ]
        }
    }
}
```

Expected behavior

It should be possible to filter non-indexed fields when they have doc_values: true.
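To make the affected fields concrete, here is a minimal Python sketch that walks a trimmed copy of the mapping from step 2 and lists every field that explicitly combines index: false with doc_values: true. The helper name `doc_values_only_fields` and the trimmed `MAPPING` constant are illustrative only and are not part of the reproducer.

```python
# Trimmed copy of the mapping from step 2 (only some fields are kept).
MAPPING = {
    "uuid": {"type": "keyword", "index": False, "doc_values": True},
    "title": {"type": "text", "index": True},
    "header": {
        "type": "object",
        "properties": {
            "image": {
                "type": "object",
                "properties": {
                    "media": {"type": "integer", "index": False, "doc_values": False}
                },
            }
        },
    },
    "created": {"type": "date", "index": True, "doc_values": True},
    "commentsCount": {"type": "integer", "index": False, "doc_values": True},
    "rating": {"type": "float", "index": False, "doc_values": True},
    "categoryIds": {"type": "integer", "index": False, "doc_values": True},
}


def doc_values_only_fields(properties, prefix=""):
    """Return dotted paths of fields with explicit index: false and doc_values: true."""
    found = []
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if "properties" in spec:
            # Object field: descend into its sub-fields.
            found.extend(doc_values_only_fields(spec["properties"], path + "."))
        elif spec.get("index") is False and spec.get("doc_values") is True:
            found.append(path)
    return found


print(doc_values_only_fields(MAPPING))
# ['uuid', 'commentsCount', 'rating', 'categoryIds']
```

These are the fields for which the range query in step 4 is expected to work. The sketch only looks at explicitly set values, so it does not depend on which doc_values default applies.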

Also, doc_values: false seems to be the default, but doc_values: true is never returned in the schema mapping itself; or the docs are wrong and doc_values: false is not the default.

Based on the docs, OpenSearch has a different doc_values default value:

The expectation is that a field with doc_values: true is filterable via range queries (gte and similar), as it is in Elasticsearch.

Plugins

No plugins:

Screenshots

[Screenshot: Bildschirmfoto 2023-01-09 um 22 59 01]

Host/Environment (please complete the following information):

docker-compose on macOS

Additional context

PHP Reproducer:

git clone git@github.com:alexander-schranz/schranz-search.git
cd schranz-search

git checkout origin/feature/reproducer-opensearch
cd packages/seal-opensearch-adapter

docker compose up

composer install

composer test -- --filter="GreaterThanCondition"

For reference, packages/seal-elasticsearch-adapter contains a working Elasticsearch example that runs the same tests against the same schema.

dblock commented 1 year ago

@alexander-schranz We forked at 7.10, are those changes introduced since then in ES? Let's compare 7.10 documentation and make anything new a feature request? Appreciate your help!

alexander-schranz commented 1 year ago

@dblock if you are referring to the changed default value: the doc_values default is true in Elasticsearch 7.10, and also in 5.0 and 2.4.

Can you confirm whether OpenSearch made any changes to doc_values? To me it even looks like this config value now does nothing in OpenSearch. I also checked the commit history and could not find anything that explains why doc_values was changed to false or even removed in OpenSearch. Maybe the documentation is simply wrong, doc_values is still true by default, and there is just a bug with it.

I have not tried OpenSearch 1 yet. Should I check that as well?

dblock commented 1 year ago

I can't find any changes to doc_values. Can you confirm that the same schema works in Elasticsearch 7.10.2 (and works/doesn't work in OpenSearch 1.0)?

alexander-schranz commented 1 year ago

I checked, and it also fails on 7.10.2 and 7.17. I'm now trying to find out what changed in Elasticsearch itself that made fields with only doc_values: true but index: false filterable, as documented.

So my basic use case is that you should not be able to search for a specific term but should still be able to filter by it. I have not found a solution yet 🤔

> I can't find any changes to doc_values

Can you then confirm that the default documented as false at https://opensearch.org/docs/2.0/opensearch/supported-field-types/numeric/ is wrong, and that doc_values actually defaults to true, as in Elasticsearch 7.10.2?

alexander-schranz commented 1 year ago

But yes, from a bug point of view I think we can close it. Still, if somebody has a hint on how to achieve non-searchable but filterable fields in OpenSearch, let me know.
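One possible direction, offered only as an unverified sketch: a script query reads a field's doc values rather than its index, so it may be able to filter on a field such as rating that has index: false and doc_values: true. The Python helper below (the name `doc_values_range_filter` is hypothetical) only builds the request body; it was not run against the OpenSearch 2.4.1 setup from this issue, and script queries are evaluated per document, so they are typically slower than a range query on an indexed field.

```python
import json


def doc_values_range_filter(field, operator, value):
    """Build a search body that compares a field's doc value in a Painless script.

    Assumption: `field` is a plain field name without quotes, since it is
    interpolated directly into the script source.
    """
    if operator not in {">", ">=", "<", "<="}:
        raise ValueError(f"unsupported operator: {operator}")
    source = (
        f"doc['{field}'].size() > 0 && "
        f"doc['{field}'].value {operator} params.bound"
    )
    return {
        "query": {
            "bool": {
                "filter": [
                    {
                        "script": {
                            "script": {
                                "lang": "painless",
                                "source": source,
                                "params": {"bound": value},
                            }
                        }
                    }
                ]
            }
        }
    }


# Counterpart of the range query from step 4 (rating > 2.5).
print(json.dumps(doc_values_range_filter("rating", ">", 2.5), indent=2))
```

For multi-valued fields such as categoryIds, `.value` only inspects a single value per document, so a script for those fields would need to loop over all values instead.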

alexander-schranz commented 1 year ago

I also found the Elasticsearch change, which landed in 8.1. The related issue is https://github.com/elastic/elasticsearch/issues/52728, which links the corresponding PR.
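The outcome of this thread can be summarized in a small Python sketch. It is a simplified model of the behavior reported above, not the actual implementation: the function name and the `doc_values_only_search` flag are made up for illustration, and the real rules also depend on the field type.

```python
def range_query_accepted(index, doc_values, doc_values_only_search):
    """Simplified model: can a range query run against a field with these settings?

    doc_values_only_search=False models the behavior seen in this thread on
    OpenSearch 2.4.1 and Elasticsearch 7.10.2 / 7.17; True models the change
    the thread attributes to Elasticsearch 8.1.
    """
    if index:
        # Indexed fields are searchable regardless of doc_values.
        return True
    # Non-indexed fields need doc values and an engine that can query them.
    return bool(doc_values and doc_values_only_search)


# The rating field from the issue: index: false, doc_values: true.
print(range_query_accepted(False, True, doc_values_only_search=False))  # False
print(range_query_accepted(False, True, doc_values_only_search=True))   # True
```

Under this model, the "Cannot search on field [rating] since it is not indexed." error is the expected result on engines without doc-values-only search, independent of the doc_values default.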