Open jicohen opened 3 days ago
Relevant PR: https://github.com/opensearch-project/OpenSearch/pull/13720 CC: @rishabhmaurya @msfroh
@rishabhmaurya -- maybe we could define an implicit index setting that uses the NoOpDerivedFieldResolver
unless at least one derived field is in the mapping? (That is, we set a Boolean flag at the index level that flips to true
if/when a derived field is added to the mapping. Alternatively, if the iteration can only iterate over derived fields, it should be a no-op if no derived fields exist.)
What do you think?
@rishabh6788 Do we have something in OSB to capture/monitor this?
@jicohen thanks for reporting. @msfroh sounds like a plan
@rishabh6788 Do we have something in OSB to capture/monitor this?
At this point I don't think there is any workload that supports this use-case.
Describe the bug
When upgrading to 2.15.0 we experienced a significant search query performance regression in our test suite. After comparing profiling output from the same test in 2.14.0, it turned out to be the result of the new 'derived fields' feature. Specifically, it appears that for every field referenced in a query it iterates over every field defined in the index mappings. Until a few days ago each iteration performed an expensive string comparison.
https://github.com/opensearch-project/OpenSearch/blob/61dbcd0795c9bfe9b81e5762175414bc38bbcadf/server/src/main/java/org/opensearch/index/mapper/DefaultDerivedFieldResolver.java#L69
Our use case is a little unusual in that we have a large number (~10,000) of fields defined in our mappings and we issue large queries (~150 clauses). This issue added over a second of overhead to each query we performed (~7 ms per clause) in our testing. This slowdown was present despite not using the derived field type feature at all - each referenced field in the query would cause an iteration through the entire set of mapped fields. This issue was partially addressed on the main branch a few days ago as mentioned above - by short circuiting the expensive string comparison operation if the field mapping is not a derived field type. However, the iteration itself is still present. It should be possible to precompute the set of derived fields outside of the context of a particular query (e.g. add a method MapperService.derivedFieldTypes()) and iterate over only those and not every mapped field in the index.
As a workaround, for now we have disabled the 'derived fields' feature which replaces the DefaultDerivedFieldResolver with the NoOpDerivedFieldResolver.
Related component
Search:Performance
To Reproduce
Expected behavior
I would expect search query performance for the scenario described above (large number of defined field mappings and large number of query clauses in the search query) to perform comparably in the newer releases of OpenSearch as it did in 2.14.0.
Additional Details
Plugins