Open dzane17 opened 1 week ago
Probably introduced in https://github.com/opensearch-project/OpenSearch/pull/8386
Similar issue https://github.com/opensearch-project/OpenSearch/issues/13343
The current stat can be decremented in two places. On phase end (success):
https://github.com/opensearch-project/OpenSearch/blob/e68838819710d7040cf2b591590285f1b86f0da0/server/src/main/java/org/opensearch/action/search/SearchRequestStats.java#L82
and on phase failure: https://github.com/opensearch-project/OpenSearch/blob/e68838819710d7040cf2b591590285f1b86f0da0/server/src/main/java/org/opensearch/action/search/SearchRequestStats.java#L89
I looked through the callers and didn't find any case where both might be called for the same phase.
However, I did notice that the current stat is stored in an unsynchronized map:
https://github.com/opensearch-project/OpenSearch/blob/e68838819710d7040cf2b591590285f1b86f0da0/server/src/main/java/org/opensearch/index/search/stats/SearchStats.java#L120
And the key is just the phase name, which is one of a few common strings: https://github.com/opensearch-project/OpenSearch/blob/e68838819710d7040cf2b591590285f1b86f0da0/server/src/main/java/org/opensearch/action/search/SearchPhaseName.java#L19-L25
But digging into the increments and decrements, they use atomic operations, so that shouldn't be an issue...
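To make the suspected failure mode concrete, here is a minimal sketch of a per-phase "current" counter that is decremented from both a success path and a failure path. It is not the OpenSearch code: the class name, method names, and phase strings are hypothetical, and it assumes the map is populated once and only read afterwards, so the atomic counters carry all the concurrency. It only shows that if both callbacks ever fired for the same phase, the counter would end up negative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the per-phase "current" counters; not the OpenSearch class.
class PhaseCurrentSketch {
    // Assumption: filled once in the constructor and never mutated afterwards,
    // so the plain HashMap is only ever read concurrently in this sketch.
    private final Map<String, AtomicLong> current = new HashMap<>();

    PhaseCurrentSketch(String... phaseNames) {
        for (String name : phaseNames) {
            current.put(name, new AtomicLong());
        }
    }

    void onPhaseStart(String phase) {
        current.get(phase).incrementAndGet();
    }

    // Success path: one decrement.
    void onPhaseEnd(String phase) {
        current.get(phase).decrementAndGet();
    }

    // Failure path: also one decrement.
    void onPhaseFailure(String phase) {
        current.get(phase).decrementAndGet();
    }

    long getCurrent(String phase) {
        return current.get(phase).get();
    }

    public static void main(String[] args) {
        PhaseCurrentSketch stats = new PhaseCurrentSketch("query", "fetch");
        stats.onPhaseStart("query");
        stats.onPhaseEnd("query");
        stats.onPhaseFailure("query"); // second decrement for the same phase
        System.out.println(stats.getCurrent("query")); // prints -1
    }
}
```

Each individual operation here is atomic, so atomicity alone does not prevent a negative value. What matters is whether every increment is matched by exactly one decrement.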
Describe the bug
We have identified a bug where search statistics are being incorrectly reported as negative, which is causing failures when accessing the nodes/stats API. This issue is specifically related to the handling of negative values in the SearchStats class. The exception occurs when the API attempts to serialize the statistics, causing a java.lang.IllegalStateException due to the use of VLong encoding, which does not support negative values.
Example Response from nodes/stats API:
Exception in elasticsearch.log:
Root Cause
The issue occurs because the SearchStats class uses StreamOutput.writeVLong(), which does not support negative long values. The stack trace shows that when a negative value is encountered, it triggers the exception in elasticsearch.log shown above.
The problematic line in the code is:
Source: SearchStats.java#L89
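For readers unfamiliar with variable-length encoding, the sketch below shows why a negative value cannot pass through a VLong-style writer. It is a self-contained illustration and not the StreamOutput implementation: the class name, the byte-array return type, and the exception message are assumptions made for the example. Each output byte carries 7 bits of the value plus a continuation bit, and the writer rejects negatives up front.

```java
import java.io.ByteArrayOutputStream;

// Illustrative VLong-style encoder; names and message text are hypothetical.
class VLongSketch {
    static byte[] writeVLong(long value) {
        // Negative values are rejected before any bytes are written.
        if (value < 0) {
            throw new IllegalStateException("Negative longs unsupported: [" + value + "]");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Emit 7 bits at a time, low bits first; the high bit marks "more bytes follow".
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7FL) | 0x80L));
            value >>>= 7;
        }
        out.write((int) value); // final byte has the continuation bit clear
        return out.toByteArray();
    }
}
```

With this sketch, `VLongSketch.writeVLong(0L)` yields one byte, `VLongSketch.writeVLong(128L)` yields two, and `VLongSketch.writeVLong(-1L)` throws `IllegalStateException`. That is the same class of failure the stats serialization hits when a counter has gone negative.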
Related component
Search
To Reproduce
This bug does not occur consistently across all domains or search queries. We are still investigating the specific conditions or types of searches that trigger this issue.
Expected behavior
Search stats should not be negative.