opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[BUG] TaskResourceTrackingService consuming more CPU than expected #16635

Open andrross opened 1 week ago

andrross commented 1 week ago

Describe the bug

I profiled an OpenSearch server running the term query operation from the big5 OpenSearch Benchmark (OSB) workload. This query is very fast, so the intent was to find optimizable overhead unrelated to the actual work of searching indexes. The surprising finding is that TaskResourceTrackingService takes a little over 7% of the total CPU cycles. A big chunk of that work is simply marshaling the TaskResourceInfo object to and from a JSON string (in getTaskResourceUsageFromThreadContext() and writeTaskResourceUsage()).

Related component

Search:Performance

To Reproduce

This overhead will happen on any search, though for more expensive searches it may be less noticeable as the CPU will be dominated by other work.

Expected behavior

Assuming that TaskResourceInfo is serialized only for machine-to-machine communication, it should use an efficient binary serialization to avoid the XContent/Jackson/JSON overhead in the hot path on searches.
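
To illustrate the kind of change being suggested, here is a minimal sketch of a fixed-layout binary round trip using only the JDK. Everything in it is an assumption for illustration: the `Usage` record and its fields are a hypothetical stand-in rather than the real TaskResourceInfo, and `encode`/`decode` are not existing OpenSearch methods. It also assumes the carrier can hold raw bytes; if it only accepts strings, the bytes would need an additional text encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class BinaryUsageSketch {

    // Hypothetical payload; field names are illustrative, not the actual TaskResourceInfo schema.
    record Usage(long taskId, long parentTaskId, int nodeOrdinal, long cpuTimeNanos, long memoryBytes) {}

    // Writes the fields in a fixed order with no field names, quoting, or number-to-text conversion.
    static byte[] encode(Usage u) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(36);
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(u.taskId());
            out.writeLong(u.parentTaskId());
            out.writeInt(u.nodeOrdinal());
            out.writeLong(u.cpuTimeNanos());
            out.writeLong(u.memoryBytes());
        }
        return bytes.toByteArray();
    }

    // Reads the fields back in the same order they were written.
    static Usage decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            long taskId = in.readLong();
            long parentTaskId = in.readLong();
            int nodeOrdinal = in.readInt();
            long cpuTimeNanos = in.readLong();
            long memoryBytes = in.readLong();
            return new Usage(taskId, parentTaskId, nodeOrdinal, cpuTimeNanos, memoryBytes);
        }
    }

    public static void main(String[] args) throws IOException {
        Usage original = new Usage(42L, 7L, 3, 1_250_000L, 8_192L);
        byte[] wire = encode(original);
        Usage copy = decode(wire);
        System.out.println(wire.length + " bytes, round trip equal: " + original.equals(copy));
    }
}
```

The four longs and one int occupy 36 bytes, and decoding involves no parsing of text. The trade-off is that a fixed layout has no self-description, so any change to the fields would need explicit versioning.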

Additional Details

Here is a zoomed-in snippet of a profile showing getTaskResourceUsageFromThreadContext():

[image]

ansjcy commented 1 week ago

This code path was introduced in OpenSearch as a way to capture "query-level" resource usage. I'll look into a more efficient way to send the usage data.