Open Pigueiras2 opened 2 weeks ago
Search Meetup Triage: @jainankitk / @sgup432 Do you have some context on this?
@Pigueiras2 Did you also try specifying a total bucket count (lower than the defaults)?
Did you also try specifying a total bucket count (lower than the defaults)?
Do you mean changing search.max_buckets? I tried setting it to 10k, but I didn’t notice any difference. According to this comment, that limit might not be reached because it is only taken into account in the reduce phase. If the aggregation is small enough for OpenSearch to compute, I do see an error saying my query failed because it hit the maximum number of buckets. I also found this issue, which made me think there is a breaker to protect against such queries, but I haven’t seen it triggered in my cluster.
@Pigueiras / @Pigueiras2 Can you capture and provide a couple of histograms of the heap? Ideally, Search Backpressure should have caught it, unless there is an allocation being made elsewhere.
@kkhatua
I sent the query to my cluster at Sat Aug 31 11:45:57 PM CEST 2024; one of the nodes crashed at 23:48:55,614 and the other at 23:49:00,778.
This is what the heap reported by _nodes/stats looked like (in addition to the reported CPU and backpressure stats). The last panel also shows the memory consumed by the search tasks (as reported by the tasks API).
Logs of one of the datanodes before crashing (I see zero entries about "o.o.s.b.SearchBackpressureService" in the cluster logs):
…
[2024-08-31T23:48:50,959][DEBUG][o.o.n.r.t.AverageMemoryUsageTracker] [osbbackup101-monit-backup1_data2] Recording memory usage: 99%
[2024-08-31T23:48:50,959][DEBUG][o.o.i.b.HierarchyCircuitBreakerService] [osbbackup101-monit-backup1_data2] [parent] Data too large, data for [indices:admin/seq_no/retention_lease_background_sync[r]] would be [33031019058/30.7gb], which is larger than the limit of [31621696716/29.4gb], real usage: [33031018456/30.7gb], new bytes reserved: [602/602b], usages [request=90146464/85.9mb, fielddata=103263/100.8kb, in_flight_requests=31500/30.7kb]
[2024-08-31T23:48:50,959][DEBUG][o.o.i.b.HierarchyCircuitBreakerService] [osbbackup101-monit-backup1_data2] [parent] Data too large, data for [cluster:monitor/nodes/info[n]] would be [33031028310/30.7gb], which is larger than the limit of [31621696716/29.4gb], real usage: [33031018456/30.7gb], new bytes reserved: [9854/9.6kb], usages [request=90146464/85.9mb, fielddata=103263/100.8kb, in_flight_requests=31500/30.7kb]
[2024-08-31T23:48:50,960][DEBUG][o.o.i.b.HierarchyCircuitBreakerService] [osbbackup101-monit-backup1_data2] [parent] Data too large, data for [indices:admin/seq_no/retention_lease_background_sync[r]] would be [33031019082/30.7gb], which is larger than the limit of [31621696716/29.4gb], real usage: [33031018456/30.7gb], new bytes reserved: [626/626b], usages [request=90146464/85.9mb, fielddata=103263/100.8kb, in_flight_requests=31500/30.7kb]
[2024-08-31T23:48:50,959][DEBUG][o.o.n.r.t.AverageCpuUsageTracker] [osbbackup101-monit-backup1_data2] Recording cpu usage: 38%
[2024-08-31T23:48:50,959][DEBUG][o.o.i.b.HierarchyCircuitBreakerService] [osbbackup101-monit-backup1_data2] [parent] Data too large, data for [cluster:monitor/nodes/stats[n]] would be [33031025000/30.7gb], which is larger than the limit of [31621696716/29.4gb], real usage: [33031018456/30.7gb], new bytes reserved: [6544/6.3kb], usages [request=90146464/85.9mb, fielddata=103263/100.8kb, in_flight_requests=31500/30.7kb]
[2024-08-31T23:48:50,959][DEBUG][o.o.i.b.HierarchyCircuitBreakerService] [osbbackup101-monit-backup1_data2] [parent] Data too large, data for [cluster:monitor/nodes/info[n]] would be [33031028310/30.7gb], which is larger than the limit of [31621696716/29.4gb], real usage: [33031018456/30.7gb], new bytes reserved: [9854/9.6kb], usages [request=90146464/85.9mb, fielddata=103263/100.8kb, in_flight_requests=31500/30.7kb]
[2024-08-31T23:48:50,959][WARN ][o.o.m.j.JvmGcMonitorService] [osbbackup101-monit-backup1_data2] [gc][667] overhead, spent [891ms] collecting in the last [1.1s]
[2024-08-31T23:48:50,959][DEBUG][o.o.i.b.HierarchyCircuitBreakerService] [osbbackup101-monit-backup1_data2] [parent] Data too large, data for [indices:admin/seq_no/retention_lease_background_sync[r]] would be [33031019042/30.7gb], which is larger than the limit of [31621696716/29.4gb], real usage: [33031018456/30.7gb], new bytes reserved: [586/586b], usages [request=90146464/85.9mb, fielddata=103263/100.8kb, in_flight_requests=31500/30.7kb]
[2024-08-31T23:48:50,959][DEBUG][o.o.i.b.HierarchyCircuitBreakerService] [osbbackup101-monit-backup1_data2] [parent] Data too large, data for [cluster:monitor/tasks/lists[n]] would be [33031018534/30.7gb], which is larger than the limit of [31621696716/29.4gb], real usage: [33031018456/30.7gb], new bytes reserved: [78/78b], usages [request=90146464/85.9mb, fielddata=103263/100.8kb, in_flight_requests=31500/30.7kb]
[2024-08-31T23:48:50,959][DEBUG][o.o.i.b.HierarchyCircuitBreakerService] [osbbackup101-monit-backup1_data2] [parent] Data too large, data for [indices:admin/seq_no/retention_lease_background_sync[r]] would be [33031019082/30.7gb], which is larger than the limit of [31621696716/29.4gb], real usage: [33031018456/30.7gb], new bytes reserved: [626/626b], usages [request=90146464/85.9mb, fielddata=103263/100.8kb, in_flight_requests=31500/30.7kb]
[2024-08-31T23:48:50,960][DEBUG][o.o.t.TransportService ] [osbbackup101-monit-backup1_data2] Action: internal:coordination/fault_detection/leader_check
[2024-08-31T23:48:50,961][DEBUG][o.o.i.b.HierarchyCircuitBreakerService] [osbbackup101-monit-backup1_data2] [parent] Data too large, data for [cluster:monitor/nodes/stats[n]] would be [33031028832/30.7gb], which is larger than the limit of [31621696716/29.4gb], real usage: [33031018456/30.7gb], new bytes reserved: [10376/10.1kb], usages [request=90146464/85.9mb, fielddata=103263/100.8kb, in_flight_requests=12546/12.2kb]
[2024-08-31T23:48:50,961][DEBUG][o.o.t.TaskManager ] [osbbackup101-monit-backup1_data2] Refreshing resource stats for Task: 6169
[2024-08-31T23:48:50,962][DEBUG][o.o.c.t.r.ResourceUsageInfo] [osbbackup101-monit-backup1_data2] updated resource usage info [resource_stats=[memory_in_bytes], old_end_value=30732949552, new_end_value=30924956520]
[2024-08-31T23:48:50,962][DEBUG][o.o.c.t.r.ResourceUsageInfo] [osbbackup101-monit-backup1_data2] updated resource usage info [resource_stats=[cpu_time_in_nanos], old_end_value=166516903330, new_end_value=166730650043]
[2024-08-31T23:48:50,962][DEBUG][o.o.s.b.t.HeapUsageTracker] [osbbackup101-monit-backup1_data2] heap usage not dominated by search requests [0/4992899481]
[2024-08-31T23:48:50,970][DEBUG][o.o.i.b.HierarchyCircuitBreakerService] [osbbackup101-monit-backup1_data2] [parent] Data too large, data for [indices:monitor/stats[n]] would be [33031061228/30.7gb], which is larger than the limit of [31621696716/29.4gb], real usage: [33031018456/30.7gb], new bytes reserved: [42772/41.7kb], usages [request=90146464/85.9mb, fielddata=103263/100.8kb, in_flight_requests=44942/43.8kb]
[2024-08-31T23:48:51,056][DEBUG][o.o.i.b.HierarchyCircuitBreakerService] [osbbackup101-monit-backup1_data2] [parent] Data too large, data for [indices:admin/seq_no/retention_lease_background_sync[r]] would be [33081350690/30.8gb], which is larger than the limit of [31621696716/29.4gb], real usage: [33081350104/30.8gb], new bytes reserved: [586/586b], usages [request=90146464/85.9mb, fielddata=103263/100.8kb, in_flight_requests=2756/2.6kb]
[2024-08-31T23:48:51,141][DEBUG][o.o.i.b.HierarchyCircuitBreakerService] [osbbackup101-monit-backup1_data2] [parent] Data too large, data for [indices:admin/seq_no/retention_lease_background_sync[r]] would be [33165236810/30.8gb], which is larger than the limit of [31621696716/29.4gb], real usage: [33165236184/30.8gb], new bytes reserved: [626/626b], usages [request=90146464/85.9mb, fielddata=103263/100.8kb, in_flight_requests=2796/2.7kb]
[2024-08-31T23:48:51,953][DEBUG][o.o.n.r.t.AverageMemoryUsageTracker] [osbbackup101-monit-backup1_data2] Recording memory usage: 99%
[2024-08-31T23:48:51,954][DEBUG][o.o.n.r.t.AverageCpuUsageTracker] [osbbackup101-monit-backup1_data2] Recording cpu usage: 44%
[2024-08-31T23:48:51,954][DEBUG][o.o.i.b.HierarchyCircuitBreakerService] [osbbackup101-monit-backup1_data2] [parent] Data too large, data for [indices:admin/seq_no/retention_lease_background_sync[r]] would be [33197990626/30.9gb], which is larger than the limit of [31621696716/29.4gb], real usage: [33197990000/30.9gb], new bytes reserved: [626/626b], usages [request=90146464/85.9mb, fielddata=103263/100.8kb, in_flight_requests=15560/15.1kb]
[2024-08-31T23:48:51,954][DEBUG][o.o.i.b.HierarchyCircuitBreakerService] [osbbackup101-monit-backup1_data2] [parent] Data too large, data for [indices:admin/seq_no/retention_lease_background_sync[r]] would be [33197990598/30.9gb], which is larger than the limit of [31621696716/29.4gb], real usage: [33197990000/30.9gb], new bytes reserved: [598/598b], usages [request=90146464/85.9mb, fielddata=103263/100.8kb, in_flight_requests=15560/15.1kb]
[2024-08-31T23:48:51,954][DEBUG][o.o.i.b.HierarchyCircuitBreakerService] [osbbackup101-monit-backup1_data2] [parent] Data too large, data for [indices:admin/seq_no/retention_lease_background_sync[r]] would be [33197990590/30.9gb], which is larger than the limit of [31621696716/29.4gb], real usage: [33197990000/30.9gb], new bytes reserved: [590/590b], usages [request=90146464/85.9mb, fielddata=103263/100.8kb, in_flight_requests=15560/15.1kb]
[2024-08-31T23:48:51,954][DEBUG][o.o.i.b.HierarchyCircuitBreakerService] [osbbackup101-monit-backup1_data2] [parent] Data too large, data for [indices:admin/seq_no/retention_lease_background_sync[r]] would be [33197990602/30.9gb], which is larger than the limit of [31621696716/29.4gb], real usage: [33197990000/30.9gb], new bytes reserved: [602/602b], usages [request=90146464/85.9mb, fielddata=103263/100.8kb, in_flight_requests=15560/15.1kb]
[2024-08-31T23:48:51,954][DEBUG][o.o.i.b.HierarchyCircuitBreakerService] [osbbackup101-monit-backup1_data2] [parent] Data too large, data for [indices:admin/seq_no/retention_lease_background_sync[r]] would be [33197990598/30.9gb], which is larger than the limit of [31621696716/29.4gb], real usage: [33197990000/30.9gb], new bytes reserved: [598/598b], usages [request=90146464/85.9mb, fielddata=103263/100.8kb, in_flight_requests=15560/15.1kb]
[2024-08-31T23:48:51,954][DEBUG][o.o.i.b.HierarchyCircuitBreakerService] [osbbackup101-monit-backup1_data2] [parent] Data too large, data for [cluster:monitor/nodes/stats[n]] would be [33198000376/30.9gb], which is larger than the limit of [31621696716/29.4gb], real usage: [33197990000/30.9gb], new bytes reserved: [10376/10.1kb], usages [request=90146464/85.9mb, fielddata=103263/100.8kb, in_flight_requests=15560/15.1kb]
[2024-08-31T23:48:51,956][DEBUG][o.o.i.b.HierarchyCircuitBreakerService] [osbbackup101-monit-backup1_data2] [parent] Data too large, data for [indices:admin/seq_no/retention_lease_background_sync[r]] would be [33197990586/30.9gb], which is larger than the limit of [31621696716/29.4gb], real usage: [33197990000/30.9gb], new bytes reserved: [586/586b], usages [request=90146464/85.9mb, fielddata=103263/100.8kb, in_flight_requests=3382/3.3kb]
[2024-08-31T23:48:51,956][DEBUG][o.o.i.b.HierarchyCircuitBreakerService] [osbbackup101-monit-backup1_data2] [parent] Data too large, data for [indices:admin/seq_no/retention_lease_background_sync[r]] would be [33197990626/30.9gb], which is larger than the limit of [31621696716/29.4gb], real usage: [33197990000/30.9gb], new bytes reserved: [626/626b], usages [request=90146464/85.9mb, fielddata=103263/100.8kb, in_flight_requests=3382/3.3kb]
[2024-08-31T23:48:51,960][WARN ][o.o.m.j.JvmGcMonitorService] [osbbackup101-monit-backup1_data2] [gc][668] overhead, spent [803ms] collecting in the last [1s]
[2024-08-31T23:48:51,961][DEBUG][o.o.t.TransportService ] [osbbackup101-monit-backup1_data2] Action: internal:coordination/fault_detection/leader_check
[2024-08-31T23:48:51,963][DEBUG][o.o.t.TaskManager ] [osbbackup101-monit-backup1_data2] Refreshing resource stats for Task: 6169
[2024-08-31T23:48:51,963][DEBUG][o.o.c.t.r.ResourceUsageInfo] [osbbackup101-monit-backup1_data2] updated resource usage info [resource_stats=[memory_in_bytes], old_end_value=30924956520, new_end_value=31084986240]
[2024-08-31T23:48:51,963][DEBUG][o.o.c.t.r.ResourceUsageInfo] [osbbackup101-monit-backup1_data2] updated resource usage info [resource_stats=[cpu_time_in_nanos], old_end_value=166730650043, new_end_value=166928829066]
[2024-08-31T23:48:51,963][DEBUG][o.o.s.b.t.HeapUsageTracker] [osbbackup101-monit-backup1_data2] heap usage not dominated by search requests [0/4992899481]
[2024-08-31T23:48:51,967][DEBUG][o.o.i.b.HierarchyCircuitBreakerService] [osbbackup101-monit-backup1_data2] [parent] Data too large, data for [cluster:monitor/tasks/lists[n]] would be [33197990096/30.9gb], which is larger than the limit of [31621696716/29.4gb], real usage: [33197990000/30.9gb], new bytes reserved: [96/96b], usages [request=90146464/85.9mb, fielddata=103263/100.8kb, in_flight_requests=2266/2.2kb]
[2024-08-31T23:48:55,517][INFO ][o.o.i.b.HierarchyCircuitBreakerService] [osbbackup101-monit-backup1_data2] attempting to trigger G1GC due to high heap usage [31729041960]
[2024-08-31T23:48:55,517][DEBUG][o.o.n.r.t.AverageMemoryUsageTracker] [osbbackup101-monit-backup1_data2] Recording memory usage: 95%
[2024-08-31T23:48:55,609][DEBUG][o.o.n.r.t.AverageCpuUsageTracker] [osbbackup101-monit-backup1_data2] Recording cpu usage: 59%
[2024-08-31T23:48:55,609][WARN ][o.o.m.j.JvmGcMonitorService] [osbbackup101-monit-backup1_data2] [gc][669] overhead, spent [3.5s] collecting in the last [3.5s]
[2024-08-31T23:48:55,610][DEBUG][o.o.t.TransportService ] [osbbackup101-monit-backup1_data2] Action: internal:coordination/fault_detection/leader_check
[2024-08-31T23:48:55,610][INFO ][o.o.i.b.HierarchyCircuitBreakerService] [osbbackup101-monit-backup1_data2] GC did bring memory usage down, before [31729041960], after [832549984], allocations [1], duration [93]
[2024-08-31T23:48:55,612][DEBUG][o.o.t.TaskManager ] [osbbackup101-monit-backup1_data2] Task execution finished on thread. Task: 6169, Thread: 301
[2024-08-31T23:48:55,612][DEBUG][o.o.c.t.r.ResourceUsageInfo] [osbbackup101-monit-backup1_data2] updated resource usage info [resource_stats=[memory_in_bytes], old_end_value=31084986240, new_end_value=31093069072]
[2024-08-31T23:48:55,612][DEBUG][o.o.c.t.r.ResourceUsageInfo] [osbbackup101-monit-backup1_data2] updated resource usage info [resource_stats=[cpu_time_in_nanos], old_end_value=166928829066, new_end_value=166944765687]
[2024-08-31T23:48:55,614][ERROR][o.o.b.OpenSearchUncaughtExceptionHandler] [osbbackup101-monit-backup1_data2] fatal error in thread [opensearch[osbbackup101-monit-backup1_data2][search][T#3]], exiting
java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.ArrayList.<init>(ArrayList.java:156) ~[?:?]
at org.opensearch.search.aggregations.bucket.BucketsAggregator.buildAggregationsForVariableBuckets(BucketsAggregator.java:411) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.aggregations.bucket.histogram.DateHistogramAggregator.buildAggregations(DateHistogramAggregator.java:208) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.aggregations.bucket.BucketsAggregator.buildSubAggsForBuckets(BucketsAggregator.java:220) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.aggregations.bucket.BucketsAggregator.buildSubAggsForAllBuckets(BucketsAggregator.java:286) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.access$400(GlobalOrdinalsStringTermsAggregator.java:90) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$StandardTermsResults.buildSubAggs(GlobalOrdinalsStringTermsAggregator.java:900) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$StandardTermsResults.buildSubAggs(GlobalOrdinalsStringTermsAggregator.java:847) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$ResultStrategy.buildAggregations(GlobalOrdinalsStringTermsAggregator.java:762) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.buildAggregations(GlobalOrdinalsStringTermsAggregator.java:316) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.aggregations.bucket.BucketsAggregator.buildSubAggsForBuckets(BucketsAggregator.java:220) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.aggregations.bucket.BucketsAggregator.buildSubAggsForAllBuckets(BucketsAggregator.java:286) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.access$400(GlobalOrdinalsStringTermsAggregator.java:90) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$StandardTermsResults.buildSubAggs(GlobalOrdinalsStringTermsAggregator.java:900) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$StandardTermsResults.buildSubAggs(GlobalOrdinalsStringTermsAggregator.java:847) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$ResultStrategy.buildAggregations(GlobalOrdinalsStringTermsAggregator.java:762) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.buildAggregations(GlobalOrdinalsStringTermsAggregator.java:316) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.aggregations.bucket.BucketsAggregator.buildSubAggsForBuckets(BucketsAggregator.java:220) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.aggregations.bucket.BucketsAggregator.buildSubAggsForAllBuckets(BucketsAggregator.java:286) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.access$400(GlobalOrdinalsStringTermsAggregator.java:90) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$StandardTermsResults.buildSubAggs(GlobalOrdinalsStringTermsAggregator.java:900) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$StandardTermsResults.buildSubAggs(GlobalOrdinalsStringTermsAggregator.java:847) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$ResultStrategy.buildAggregations(GlobalOrdinalsStringTermsAggregator.java:762) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.buildAggregations(GlobalOrdinalsStringTermsAggregator.java:316) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.aggregations.Aggregator.buildTopLevel(Aggregator.java:205) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.aggregations.BucketCollectorProcessor.processPostCollection(BucketCollectorProcessor.java:78) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:286) ~[opensearch-2.15.0.jar:2.15.0]
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:552) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
at org.opensearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:355) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:462) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:450) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWith(QueryPhase.java:432) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:286) ~[opensearch-2.15.0.jar:2.15.0]
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:552) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
at org.opensearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:355) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:462) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:450) ~[opensearch-2.15.0.jar:2.15.0]
at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWith(QueryPhase.java:432) ~[opensearch-2.15.0.jar:2.15.0]
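A quick sanity check on the numbers in these log lines, offered as a sketch rather than a statement about the internals: the parent breaker limit of 31621696716 bytes and the 4992899481 in the HeapUsageTracker message are both consistent with a 31 GiB heap, if one assumes the default parent breaker limit of 95% and the total_heap_percent_threshold of 0.15 from the posted cluster settings. The heap size and the 95% figure are inferences, not values stated in the thread.

```python
# Assumption: the JVM heap is exactly 31 GiB (inferred, not stated in the thread).
GIB = 1024 ** 3
assumed_heap_bytes = 31 * GIB

# Values copied from the log lines above.
parent_breaker_limit = 31_621_696_716    # "limit of [31621696716/29.4gb]"
heap_tracker_threshold = 4_992_899_481   # second number in "[0/4992899481]"

# Assumption: the parent breaker limit is 95% of the heap, which is the
# default when real-memory accounting is enabled.
derived_limit = assumed_heap_bytes * 95 // 100

# total_heap_percent_threshold is 0.15 in the posted cluster settings.
derived_threshold = assumed_heap_bytes * 15 // 100

print(derived_limit == parent_breaker_limit)
print(derived_threshold == heap_tracker_threshold)
```

Both comparisons hold under those assumptions. That suggests the second number in the HeapUsageTracker message is the 15% total-heap threshold, while the first (the heap attributed to search tasks) is reported as 0 even though the neighbouring ResourceUsageInfo lines show memory_in_bytes of roughly 31 GB for task 6169.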
These are the current settings of the cluster (in case you see anything wrong with them):
GET /_cluster/settings
{
  "persistent": {
    "plugins": {
      "index_state_management": {
        "metadata_migration": {
          "status": "1"
        },
        "template_migration": {
          "control": "-1"
        }
      }
    },
    "search": {
      "default_search_timeout": "5s",
      "max_buckets": "10000",
      "cancel_after_time_interval": "5s"
    },
    "search_backpressure": {
      "mode": "enforced",
      "node_duress": {
        "cpu_threshold": "0.9",
        "heap_threshold": "0.75",
        "num_successive_breaches": "3"
      },
      "search_shard_task": {
        "elapsed_time_millis_threshold": "30000",
        "heap_variance": "2.0",
        "heap_percent_threshold": "0.10",
        "cancellation_burst": "10.0",
        "cpu_time_millis_threshold": "15000",
        "cancellation_ratio": "0.1",
        "cancellation_rate": "0.003",
        "total_heap_percent_threshold": "0.15",
        "heap_moving_average_window_size": "100"
      },
      "search_task": {
        "elapsed_time_millis_threshold": "45000",
        "heap_variance": "2.0",
        "heap_percent_threshold": "0.10",
        "cancellation_burst": "5.0",
        "cpu_time_millis_threshold": "30000",
        "cancellation_ratio": "0.1",
        "cancellation_rate": "0.003",
        "total_heap_percent_threshold": "0.15",
        "heap_moving_average_window_size": "100"
      }
    }
  },
  "transient": {
    "search": {
      "default_search_timeout": "3s",
      "max_buckets": "65535",
      "low_level_cancellation": "true",
      "cancel_after_time_interval": "3s"
    },
    "search_backpressure": {
      "mode": "enforced",
      "node_duress": {
        "heap_threshold": "0.7",
        "num_successive_breaches": "1"
      }
    }
  }
}
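Because the persistent and transient sections set some of the same keys, it can help to spell out which values are actually in effect. The sketch below assumes the usual precedence (transient overrides persistent) and works on a subset of the values posted above; deep_merge and effective_settings are hypothetical helper names, not part of any OpenSearch client.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def effective_settings(cluster_settings: dict) -> dict:
    """Hypothetical helper: transient values win over persistent ones."""
    return deep_merge(cluster_settings.get("persistent", {}),
                      cluster_settings.get("transient", {}))


# Subset of the settings posted above.
posted = {
    "persistent": {
        "search": {"default_search_timeout": "5s", "max_buckets": "10000",
                   "cancel_after_time_interval": "5s"},
        "search_backpressure": {
            "mode": "enforced",
            "node_duress": {"cpu_threshold": "0.9", "heap_threshold": "0.75",
                            "num_successive_breaches": "3"},
        },
    },
    "transient": {
        "search": {"default_search_timeout": "3s", "max_buckets": "65535",
                   "low_level_cancellation": "true",
                   "cancel_after_time_interval": "3s"},
        "search_backpressure": {
            "mode": "enforced",
            "node_duress": {"heap_threshold": "0.7",
                            "num_successive_breaches": "1"},
        },
    },
}

effective = effective_settings(posted)
print(effective["search"]["max_buckets"])
print(effective["search_backpressure"]["node_duress"])
```

Under that assumption the effective max_buckets is 65535 rather than 10000, and node duress is evaluated with heap_threshold 0.7, cpu_threshold 0.9 and num_successive_breaches 1.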
This is what _cat/tasks reported during the query execution (both data nodes crashed):
indices:data/read/search 7QMJAF3gTqmuFEXk4advYA:3878353 - transport 1725140757616 21:45:57 2.7m oscbackup101-monit-backup1_client5
indices:data/read/search[phase/query] krMR7qVATei7PSDxMVHm1Q:18970 7QMJAF3gTqmuFEXk4advYA:3878353 transport 1725140757631 21:45:57 2.7m osabackup101-monit-backup1_data1
indices:data/read/search[phase/query] QQ-NoPkXSgOhN2CRJcA_IQ:6169 7QMJAF3gTqmuFEXk4advYA:3878353 transport 1725140757639 21:45:57 2.7m osbbackup101-monit-backup1_data2
If you want me to add any other information or test anything else, let me know.
This is odd. There might be node stats for search backpressure that you can also share:
curl -X GET "localhost:9200/_nodes/stats/search_backpressure?pretty&human"
The output should look something like this for both search (coordinator) and shard tasks:
"search_backpressure" : {
  "search_task" : {
    "resource_tracker_stats" : {
      "elapsed_time_tracker" : {
        "cancellation_count" : 0,
        "current_max" : "0s",
        "current_max_millis" : 0,
        "current_avg" : "0s",
        "current_avg_millis" : 0
      },
      "heap_usage_tracker" : {
        "cancellation_count" : 0,
        "current_max" : "0b",
        "current_max_bytes" : 0,
        "current_avg" : "0b",
        "current_avg_bytes" : 0,
        "rolling_avg" : "728.8kb",
        "rolling_avg_bytes" : 746360
      },
      "cpu_usage_tracker" : {
        "cancellation_count" : 0,
        "current_max" : "0s",
        "current_max_millis" : 0,
        "current_avg" : "0s",
        "current_avg_millis" : 0
      }
    },
    "cancellation_stats" : {
      "cancellation_count" : 0,
      "cancelled_task_percentage" : 0.0,
      "cancellation_limit_reached_count" : 0,
      "current_cancellation_eligible_tasks_count" : 0
    }
  },
  "search_shard_task" : {
    ...
  }
}
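To compare nodes quickly, the counters of interest can be pulled out of that response. This is a minimal sketch that assumes the response has the usual nodes, then node id, then search_backpressure layout, and that search_shard_task mirrors the search_task structure shown above. summarize_backpressure is a hypothetical helper name, and the sample document is a trimmed placeholder, not output from this cluster.

```python
import json


def summarize_backpressure(stats: dict) -> dict:
    """Map node id -> {task type -> cancellation_count}.

    Expects a parsed _nodes/stats/search_backpressure response (layout assumed).
    """
    summary = {}
    for node_id, node in stats.get("nodes", {}).items():
        backpressure = node.get("search_backpressure", {})
        per_type = {}
        for task_type in ("search_task", "search_shard_task"):
            cancellation = backpressure.get(task_type, {}).get("cancellation_stats", {})
            per_type[task_type] = cancellation.get("cancellation_count", 0)
        summary[node_id] = per_type
    return summary


# Trimmed placeholder response; "node-1" is not a node id from the thread.
sample = json.loads("""
{"nodes": {"node-1": {"search_backpressure": {
    "search_task": {"cancellation_stats": {"cancellation_count": 0}},
    "search_shard_task": {"cancellation_stats": {"cancellation_count": 0}}
}}}}
""")

print(summarize_backpressure(sample))
```

A count that stays at zero on every node while a query is driving the heap up would indicate that backpressure never attempted a cancellation, as opposed to attempting one and being throttled.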
One possibility is that the task cancellation itself is self-throttling, and you will need to tinker with those values to avoid throttling. (Ref: https://opensearch.org/docs/2.15/tuning-your-cluster/availability-and-recovery/search-backpressure/ ) In the meantime, if an allocation is being made outside of the tasks, or is something the resource tracking framework cannot measure through the tasks, we might need to inspect some heap histogram dumps.
Could you capture and share the histogram dumps?
Multiple samples will reveal which objects are rapidly growing in count and hogging memory. The failed allocation at the time of the OOME most likely just reflects a heap that was already exhausted; it is not necessarily the cause.
Also, I'm assuming you are not running any Painless scripts.
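One way to compare successive samples is to diff the per-class byte totals. The sketch below assumes the histograms are in the text format produced by jmap -histo (rank, instance count, bytes, class name). parse_histo and growth are hypothetical helper names, and the two tiny samples are made-up illustrations rather than data from this cluster.

```python
import re

# One histogram row: "  12:  343770  24751440  some.ClassName (module)"
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_histo(text: str) -> dict:
    """Return {class name: bytes} for every line that looks like a histogram row."""
    result = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            result[match.group(3)] = int(match.group(2))
    return result


def growth(before: dict, after: dict) -> list:
    """Classes sorted by byte growth between two samples, largest first."""
    deltas = {name: size - before.get(name, 0) for name, size in after.items()}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


# Made-up samples for illustration only.
first = """
 1: 10 1000 [Ljava.lang.Object; (java.base@21.0.3)
 2: 50 2000 [B (java.base@21.0.3)
"""
second = """
 1: 40 9000 [Ljava.lang.Object; (java.base@21.0.3)
 2: 55 2100 [B (java.base@21.0.3)
"""

for name, delta in growth(parse_histo(first), parse_histo(second)):
    print(name, delta)
```

Sorting by growth rather than absolute size keeps large but stable structures (segment metadata, caches) from hiding the class that is actually ballooning.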
First of all, thanks a lot for taking the time to answer. It's really appreciated 😄
This is odd. There might be node stats for search backpressure that you can also share:
curl -X GET "localhost:9200/_nodes/stats/search_backpressure?pretty&human"
Yes, I’m plotting search_task.heap_usage.search(_shard)_task.current_avg_bytes here. I believe these are the relevant metrics in this case (if you want another metric, let me know).
One possibility is that the task cancellation itself is self-throttling, and you will need to tinker with those values to avoid throttling. (Ref: https://opensearch.org/docs/2.15/tuning-your-cluster/availability-and-recovery/search-backpressure/ )
Does the way search backpressure cancels a task differ from me calling _tasks/<id>/_cancel directly or using search.cancel_after_time_interval? I'm trying to cancel it a couple of seconds after sending it, with no effect. Can throttling really affect it so much that the cancellation is "ignored" for the ~3 minutes the request takes to drive a node into an OOME?
As for tuning the backpressure values, I've also tried:
...
  "transient": {
    "search": {
      ...
    },
    "search_backpressure": {
      "mode": "enforced",
      "node_duress": {
        "cpu_threshold": "0.1",
        "heap_threshold": "0.3",
        "num_successive_breaches": "1"
      }
    }
  }
And I don't even see the message about the search backpressure service trying to kill a task (the node should be under duress with those values for about 30/40 seconds and the task should be killed either for time or heap usage).
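That expectation can be checked against the posted numbers. This is a simplified sketch, not the actual SearchBackpressureService logic: it only compares the last figures reported for shard task 6169 with the search_shard_task thresholds from the settings earlier in the thread. It ignores the node-duress check, the heap_variance comparison and the cancellation rate limits, and it assumes a 31 GiB heap (inferred from the breaker limit in the logs, not stated anywhere).

```python
# Assumption: 31 GiB heap, inferred from the logs rather than stated.
ASSUMED_HEAP_BYTES = 31 * 1024 ** 3

# search_shard_task thresholds from the posted cluster settings.
thresholds = {
    "elapsed_time_millis": 30_000,                # elapsed_time_millis_threshold
    "cpu_time_millis": 15_000,                    # cpu_time_millis_threshold
    "heap_bytes": ASSUMED_HEAP_BYTES * 10 // 100, # heap_percent_threshold 0.10
}

# Last figures reported for shard task 6169 in the logs and _cat/tasks output.
task = {
    "elapsed_time_millis": 162_000,                   # "2.7m" running time
    "cpu_time_millis": 166_944_765_687 // 1_000_000,  # cpu_time_in_nanos
    "heap_bytes": 31_093_069_072,                     # memory_in_bytes
}

# True where the task's figure is at or above the configured threshold.
exceeded = {name: task[name] >= limit for name, limit in thresholds.items()}
print(exceeded)
```

On this reading the task is over all three thresholds, so the absence of any cancellation attempt points at something earlier in the decision path (for example how usage is attributed to tasks) rather than at the threshold values themselves.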
Could you capture and share the histogram dumps?
This one is right before crashing. Does it provide the information you were looking for?
num #instances #bytes class name (module)
-------------------------------------------------------
1: 113733 30414943824 [Ljava.lang.Object; (java.base@21.0.3)
2: 14525 1464952168 [Ljdk.internal.vm.FillerElement; (java.base@21.0.3)
3: 4072653 220573688 [B (java.base@21.0.3)
4: 469391 140360848 [J (java.base@21.0.3)
5: 1997093 79883720 org.opensearch.search.aggregations.metrics.InternalMax
6: 2674789 64194936 java.lang.String (java.base@21.0.3)
7: 1997093 63906976 org.opensearch.search.aggregations.bucket.BucketsAggregator$1
8: 1885023 60320736 java.util.HashMap$Node (java.base@21.0.3)
9: 1214525 48581000 java.util.TreeMap$Entry (java.base@21.0.3)
10: 1997094 47930256 org.opensearch.search.aggregations.InternalAggregations
11: 718325 28733000 org.opensearch.search.aggregations.bucket.histogram.InternalDateHistogram$Bucket
12: 343770 24751440 org.apache.lucene.index.FieldInfo
13: 383223 24526272 org.apache.lucene.store.MemorySegmentIndexInput$SingleSegmentImpl
14: 880327 21127848 org.apache.lucene.util.BytesRef
15: 134288 19757208 [Ljava.util.HashMap$Node; (java.base@21.0.3)
16: 216532 15590304 org.apache.lucene.codecs.lucene90.blocktree.FieldReader
17: 383223 15328920 jdk.internal.foreign.MappedMemorySegmentImpl (java.base@21.0.3)
18: 279573 13419504 java.util.HashMap (java.base@21.0.3)
19: 257387 12354576 java.util.TreeMap (java.base@21.0.3)
20: 109995 12319440 org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$SortedNumericEntry
21: 106744 11101376 org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$TermsDictEntry
22: 119676 10531488 org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$NumericEntry
23: 216535 10393680 org.apache.lucene.util.fst.FST$FSTMetadata
24: 321251 10280032 java.util.Collections$UnmodifiableMap (java.base@21.0.3)
25: 383223 9197352 [Ljava.lang.foreign.MemorySegment; (java.base@21.0.3)
26: 222402 8896080 org.apache.lucene.util.packed.DirectMonotonicReader$Meta
27: 2 8032936 [Lorg.opensearch.search.aggregations.InternalAggregation;
28: 1 7988392 [Lorg.opensearch.search.aggregations.InternalAggregations;
29: 112958 7229312 org.apache.lucene.util.bkd.BKDReader
30: 221655 7092960 java.util.concurrent.atomic.LongAdder (java.base@21.0.3)
31: 216532 6929024 org.apache.lucene.util.fst.OffHeapFSTStore
32: 20819 6242176 [I (java.base@21.0.3)
33: 222410 5575304 [F (java.base@21.0.3)
34: 216535 5196840 org.apache.lucene.util.fst.FST
35: 104606 5021088 org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$NormsEntry
36: 112958 4518320 org.apache.lucene.util.bkd.BKDConfig
37: 29772 3520144 java.lang.Class (java.base@21.0.3)
38: 138383 3321192 org.opensearch.common.util.concurrent.ReleasableLock
39: 69060 3314880 java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync (java.base@21.0.3)
40: 102032 3265024 java.util.concurrent.ConcurrentHashMap$Node (java.base@21.0.3)
41: 8742 2912880 [Lorg.apache.lucene.index.FieldInfo;
42: 26206 2725424 org.apache.lucene.index.SegmentCommitInfo
43: 32220 2595776 [Ljava.util.WeakHashMap$Entry; (java.base@21.0.3)
44: 106744 2561856 org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$SortedSetEntry
45: 106542 2557008 org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$SortedEntry
46: 68352 2187264 org.opensearch.common.cache.Cache$CacheSegment
47: 62112 1987584 org.apache.lucene.codecs.lucene90.Lucene90CompoundReader$FileEntry
48: 14350 1758416 [C (java.base@21.0.3)
49: 52263 1672416 java.util.concurrent.locks.ReentrantLock$NonfairSync (java.base@21.0.3)
50: 68829 1651896 java.util.concurrent.locks.ReentrantReadWriteLock (java.base@21.0.3)
51: 68352 1640448 org.opensearch.common.cache.Cache$CacheSegment$SegmentStats
52: 28412 1591072 org.apache.lucene.document.FieldType
53: 32170 1544160 java.util.WeakHashMap (java.base@21.0.3)
54: 86110 1377760 java.lang.Object (java.base@21.0.3)
55: 32209 1288360 java.util.LinkedHashMap$Entry (java.base@21.0.3)
56: 50491 1211784 java.util.ArrayList (java.base@21.0.3)
57: 70371 1125936 java.lang.ThreadLocal (java.base@21.0.3)
58: 69064 1105024 java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock (java.base@21.0.3)
59: 69064 1105024 java.util.concurrent.locks.ReentrantReadWriteLock$Sync$ThreadLocalHoldCounter (java.base@21.0.3)
60: 69064 1105024 java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock (java.base@21.0.3)
61: 32264 1032448 java.lang.ref.ReferenceQueue (java.base@21.0.3)
62: 2361 1002640 [Ljava.util.concurrent.ConcurrentHashMap$Node; (java.base@21.0.3)
63: 19197 921456 java.lang.invoke.MemberName (java.base@21.0.3)
64: 56745 907920 java.util.concurrent.atomic.AtomicInteger (java.base@21.0.3)
65: 27571 882272 org.apache.lucene.util.Version
66: 13251 848064 java.util.LinkedHashMap (java.base@21.0.3)
67: 13062 835968 org.apache.lucene.index.SegmentInfo
68: 52219 835504 java.util.concurrent.locks.ReentrantLock (java.base@21.0.3)
69: 32869 788856 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject (java.base@21.0.3)
70: 48434 774944 java.util.HashSet (java.base@21.0.3)
71: 10499 755928 io.netty.buffer.PoolSubpage
72: 18755 750200 java.util.WeakHashMap$Entry (java.base@21.0.3)
73: 29772 714528 java.util.Collections$UnmodifiableRandomAccessList (java.base@21.0.3)
74: 8000 704000 java.lang.reflect.Method (java.base@21.0.3)
75: 12553 702968 org.opensearch.search.aggregations.bucket.terms.StringTerms$Bucket
76: 20610 659520 org.opensearch.common.settings.Setting$Updater
77: 13355 641040 org.apache.lucene.index.LeafReaderContext
78: 38732 619712 java.util.HashMap$Values (java.base@21.0.3)
79: 25612 614688 org.apache.lucene.util.packed.DirectReader$DirectPackedReader8
80: 35411 566576 java.util.HashMap$KeySet (java.base@21.0.3)
81: 22697 544728 org.apache.lucene.util.packed.DirectReader$DirectPackedReader20
82: 33719 539504 java.util.Collections$UnmodifiableCollection (java.base@21.0.3)
83: 12386 495440 java.lang.invoke.MethodType (java.base@21.0.3)
84: 30159 482544 java.util.TreeMap$EntrySet (java.base@21.0.3)
85: 19851 476424 org.apache.lucene.util.FileDeleter$RefCount
86: 11404 456160 org.opensearch.index.analysis.NamedAnalyzer
87: 9468 454464 org.opensearch.painless.lookup.PainlessClass
88: 13998 447936 java.util.ImmutableCollections$Map1 (java.base@21.0.3)
89: 8779 444256 [Lorg.apache.lucene.util.LongValues;
90: 8770 420960 org.apache.lucene.util.packed.DirectMonotonicReader
91: 13104 419328 java.util.ImmutableCollections$MapN (java.base@21.0.3)
92: 10329 413160 java.io.FileDescriptor (java.base@21.0.3)
93: 12809 409888 java.lang.invoke.MethodType$ConcurrentWeakInternSet$WeakEntry (java.base@21.0.3)
94: 16546 397104 org.apache.lucene.index.FieldInfos$FieldDimensions
95: 16546 397104 org.apache.lucene.index.FieldInfos$FieldVectorProperties
96: 14231 393152 [Ljava.lang.Class; (java.base@21.0.3)
97: 16140 387360 org.apache.logging.log4j.message.ReusableMessageFactory
98: 4356 383328 org.apache.lucene.codecs.lucene90.compressing.FieldsIndexReader
99: 4356 383328 org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader
100: 11941 382112 java.lang.invoke.LambdaForm$Name (java.base@21.0.3)
101: 5186 373392 org.opensearch.index.mapper.TextFieldMapper
102: 2733 371688 org.opensearch.cluster.metadata.IndexMetadata
103: 15016 360384 java.util.Collections$SingletonList (java.base@21.0.3)
104: 5610 359040 java.util.concurrent.ConcurrentHashMap (java.base@21.0.3)
105: 4437 354960 org.apache.lucene.index.SegmentReader
106: 5487 351168 org.opensearch.cluster.routing.ShardRouting
107: 640 348160 io.netty.util.internal.shaded.org.jctools.queues.atomic.MpscAtomicArrayQueue
108: 7149 343152 sun.nio.ch.FileChannelImpl$DefaultUnmapper (java.base@21.0.3)
109: 5263 336832 org.opensearch.index.mapper.KeywordFieldMapper
110: 10452 334464 org.opensearch.index.mapper.TextSearchInfo
111: 1253 331312 [Z (java.base@21.0.3)
112: 13800 331200 java.util.Collections$SynchronizedSet (java.base@21.0.3)
113: 6894 316544 [Ljava.lang.String; (java.base@21.0.3)
114: 13180 316320 org.apache.lucene.util.CloseableThreadLocal
115: 13164 315936 java.lang.invoke.ResolvedMethodName (java.base@21.0.3)
116: 4356 313632 org.apache.lucene.index.SegmentCoreReaders
117: 12579 301896 org.apache.lucene.analysis.DelegatingAnalyzerWrapper$DelegatingReuseStrategy
118: 12539 300936 java.util.concurrent.atomic.AtomicLong (java.base@21.0.3)
119: 12525 300600 org.opensearch.common.Explicit
120: 9367 299744 org.opensearch.common.collect.CopyOnWriteHashMap$InnerNode
121: 17951 287216 org.opensearch.index.mapper.FieldMapper$MultiFields
122: 267 277680 [Lorg.opensearch.common.cache.Cache$CacheSegment;
123: 11256 270144 java.util.Arrays$ArrayList (java.base@21.0.3)
124: 4814 269584 org.opensearch.index.mapper.NumberFieldMapper
125: 11132 267168 java.util.Collections$SetFromMap (java.base@21.0.3)
126: 16621 265936 java.util.HashMap$EntrySet (java.base@21.0.3)
127: 10821 259704 org.apache.lucene.util.packed.DirectReader$DirectPackedReader16
128: 5263 252624 org.opensearch.index.mapper.KeywordFieldMapper$KeywordFieldType
129: 5186 248928 org.opensearch.index.mapper.TextFieldMapper$TextFieldType
130: 4356 243936 org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader$BlockState
131: 4356 243936 org.apache.lucene.index.ReadersAndUpdates
132: 3805 243520 org.opensearch.search.aggregations.bucket.histogram.InternalDateHistogram
133: 6063 242520 java.lang.invoke.DirectMethodHandle (java.base@21.0.3)
134: 55 234392 [S (java.base@21.0.3)
135: 633 233440 [[C (java.base@21.0.3)
136: 4814 231072 org.opensearch.index.mapper.NumberFieldMapper$NumberFieldType
137: 2740 219200 org.opensearch.cluster.routing.IndexShardRoutingTable
138: 4371 209808 org.apache.lucene.index.FieldInfos
139: 4368 209664 java.lang.invoke.DirectMethodHandle$Constructor (java.base@21.0.3)
140: 4357 209136 org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer
141: 4350 208800 org.apache.lucene.index.PendingSoftDeletes
142: 3246 207744 java.security.Provider$Service (java.base@21.0.3)
143: 6350 203200 org.opensearch.common.logging.PrefixLogger
144: 12640 202240 java.util.Collections$UnmodifiableSet (java.base@21.0.3)
145: 458 199032 [Ljava.nio.ByteBuffer; (java.base@21.0.3)
146: 8109 194616 org.apache.logging.log4j.message.DefaultFlowMessageFactory
147: 4485 179400 java.lang.invoke.BoundMethodHandle$Species_L (java.base@21.0.3)
148: 4356 174240 org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsReader
149: 4350 174000 org.opensearch.common.lucene.index.OpenSearchLeafReader
150: 4328 173120 java.lang.ref.SoftReference (java.base@21.0.3)
151: 4322 172880 org.apache.lucene.codecs.lucene90.Lucene90NormsProducer
152: 3581 171888 jdk.internal.ref.CleanerImpl$PhantomCleanableRef (java.base@21.0.3)
153: 7149 171576 jdk.internal.foreign.SharedSession (java.base@21.0.3)
154: 7149 171576 sun.nio.ch.FileChannelImpl$1 (java.base@21.0.3)
155: 7050 169200 org.apache.lucene.util.packed.DirectReader$DirectPackedReader12
156: 5276 168832 java.util.Hashtable$Entry (java.base@21.0.3)
157: 7007 168168 java.util.LinkedList$Node (java.base@21.0.3)
158: 6997 167928 java.security.Provider$ServiceKey (java.base@21.0.3)
159: 4183 167320 java.lang.invoke.BoundMethodHandle$Species_LL (java.base@21.0.3)
160: 10150 162400 java.util.WeakHashMap$KeySet (java.base@21.0.3)
161: 9947 159152 java.util.LinkedHashSet (java.base@21.0.3)
162: 2173 156456 java.lang.reflect.Field (java.base@21.0.3)
163: 6443 154632 org.opensearch.core.index.Index
164: 6154 147696 java.util.concurrent.CopyOnWriteArrayList (java.base@21.0.3)
165: 9074 145184 java.util.concurrent.atomic.AtomicReference (java.base@21.0.3)
166: 9028 144448 org.apache.lucene.index.IndexReader$CacheKey
167: 5988 143712 org.opensearch.common.recycler.DequeRecycler$DV
168: 5988 143712 org.opensearch.common.recycler.Recyclers$1$1
169: 8856 141696 org.opensearch.common.metrics.CounterMetric
170: 2198 140672 java.net.URL (java.base@21.0.3)
171: 1941 139752 java.lang.reflect.Constructor (java.base@21.0.3)
172: 4357 139424 org.apache.lucene.index.SegmentDocValues$1
173: 4356 139392 org.apache.lucene.index.LeafMetaData
174: 4356 139392 org.apache.lucene.index.PendingDeletes
175: 4356 139392 org.apache.lucene.index.SegmentCoreReaders$1
176: 4356 139392 org.apache.lucene.index.SegmentCoreReaders$2
177: 4350 139200 org.apache.lucene.codecs.lucene90.Lucene90PointsReader
178: 4350 139200 org.apache.lucene.index.SegmentReadState
179: 4324 138368 org.apache.lucene.backward_codecs.lucene90.Lucene90PostingsReader
180: 2112 135168 sun.nio.ch.FileChannelImpl (java.base@21.0.3)
181: 2380 133280 org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput
182: 5487 131688 org.opensearch.cluster.routing.AllocationId
183: 8211 131376 org.opensearch.cluster.routing.RotationShardShuffler
184: 2301 128856 java.nio.HeapByteBuffer (java.base@21.0.3)
185: 3889 124448 org.apache.lucene.codecs.lucene90.Lucene90CompoundReader
186: 3860 123520 java.lang.ThreadLocal$ThreadLocalMap$Entry (java.base@21.0.3)
187: 1603 121784 [Ljava.lang.ref.SoftReference; (java.base@21.0.3)
188: 3043 121720 jdk.nio.zipfs.ZipFileSystem$IndexNode (jdk.zipfs@21.0.3)
189: 1374 120912 java.util.regex.Pattern (java.base@21.0.3)
190: 4911 117864 org.opensearch.common.inject.Key
191: 7248 115968 org.opensearch.common.SetOnce
192: 7149 114384 jdk.internal.foreign.MemorySessionImpl$1 (java.base@21.0.3)
193: 7149 114384 jdk.internal.foreign.SharedSession$SharedResourceList (java.base@21.0.3)
194: 2818 112720 org.opensearch.painless.lookup.PainlessMethod
195: 2811 112440 org.opensearch.painless.spi.WhitelistMethod
196: 3357 107424 sun.nio.fs.UnixPath (java.base@21.0.3)
197: 4437 106488 org.apache.lucene.index.SegmentReader$1
198: 4357 104568 org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader
199: 4356 104544 org.apache.lucene.codecs.lucene90.LZ4WithPresetDictCompressionMode$LZ4WithPresetDictDecompressor
200: 4356 104544 org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader
201: 4356 104544 org.apache.lucene.index.SegmentCoreReaders$3
202: 3260 104320 org.opensearch.common.inject.spi.Dependency
203: 3253 104096 java.util.Collections$UnmodifiableSortedMap (java.base@21.0.3)
204: 3250 104000 org.opensearch.common.settings.Settings
205: 4168 100032 java.util.regex.Pattern$Slice (java.base@21.0.3)
206: 4063 97512 org.opensearch.common.inject.TypeLiteral
207: 3969 95256 java.lang.invoke.LambdaForm$NamedFunction (java.base@21.0.3)
208: 2913 93216 org.tartarus.snowball.Among
209: 1212 91840 [Ljava.lang.invoke.LambdaForm$Name; (java.base@21.0.3)
210: 603 91656 sun.security.ssl.SSLSessionImpl (java.base@21.0.3)
211: 1888 90624 org.apache.logging.log4j.message.ReusableParameterizedMessage
212: 3776 90624 org.apache.lucene.index.ApproximatePriorityQueue
213: 1250 90000 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask (java.base@21.0.3)
214: 656 89216 sun.nio.fs.UnixFileAttributes (java.base@21.0.3)
215: 2731 87392 org.opensearch.cluster.routing.IndexRoutingTable
216: 5377 86032 java.util.concurrent.atomic.AtomicBoolean (java.base@21.0.3)
217: 3535 84840 org.opensearch.common.metrics.MeanMetric
218: 3526 84624 java.util.regex.Pattern$GroupTail (java.base@21.0.3)
219: 3519 84456 java.util.regex.Pattern$GroupHead (java.base@21.0.3)
220: 2590 82880 java.util.LinkedList (java.base@21.0.3)
221: 940 82720 org.opensearch.index.codec.PerFieldMappingPostingFormatCodec
222: 1641 78768 org.opensearch.common.inject.internal.InstanceBindingImpl
223: 1396 78176 io.netty.channel.DefaultChannelHandlerContext
224: 3218 77232 org.opensearch.common.compress.CompressedXContent
225: 1423 76552 [Lorg.opensearch.search.aggregations.bucket.terms.StringTerms$Bucket;
226: 4778 76448 java.lang.Integer (java.base@21.0.3)
227: 3180 76320 org.opensearch.common.unit.TimeValue
228: 3167 76008 sun.reflect.generics.tree.SimpleClassTypeSignature (java.base@21.0.3)
229: 3140 75360 java.util.regex.Pattern$BmpCharProperty (java.base@21.0.3)
230: 4676 74816 java.util.concurrent.CopyOnWriteArraySet (java.base@21.0.3)
231: 4651 74416 java.util.Collections$UnmodifiableMap$UnmodifiableEntrySet (java.base@21.0.3)
232: 1536 73728 io.netty.buffer.PoolChunkList
233: 237 72048 org.opensearch.index.IndexSettings
234: 2985 71640 org.opensearch.core.index.shard.ShardId
235: 4437 70992 org.apache.lucene.index.SegmentReader$2
236: 1771 70840 java.io.FilePermission (java.base@21.0.3)
237: 679 70616 java.util.jar.JarFile$JarFileEntry (java.base@21.0.3)
238: 4356 69696 org.apache.lucene.index.SegmentDocValues
239: 235 69560 org.opensearch.index.shard.IndexShard
240: 1708 68320 org.apache.logging.log4j.core.Logger$PrivateConfig
241: 2827 67848 org.opensearch.common.inject.SingleParameterInjector
242: 942 67824 org.apache.lucene.index.SegmentInfos
243: 2768 66432 java.util.ArrayDeque (java.base@21.0.3)
244: 2732 65568 org.opensearch.cluster.metadata.MappingMetadata
245: 2731 65544 org.opensearch.cluster.metadata.IndexAbstraction$Index
246: 3167 61256 [Lsun.reflect.generics.tree.TypeArgument; (java.base@21.0.3)
247: 3709 59344 org.opensearch.common.SetOnce$Wrapper
248: 1476 59040 org.opensearch.common.settings.Setting
249: 281 58256 [Lio.netty.buffer.PoolSubpage;
250: 1820 58240 java.util.RegularEnumSet (java.base@21.0.3)
251: 1207 57936 java.lang.invoke.LambdaForm (java.base@21.0.3)
252: 1798 57536 java.lang.Package (java.base@21.0.3)
253: 1410 56400 [Lorg.opensearch.index.mapper.MetadataFieldMapper;
254: 1145 54960 java.lang.StackTraceElement (java.base@21.0.3)
255: 1708 54656 org.apache.logging.log4j.core.Logger
256: 1337 53480 java.lang.Package$VersionInfo (java.base@21.0.3)
257: 2200 52800 javax.crypto.spec.SecretKeySpec (java.base@21.0.3)
258: 3290 52640 java.util.TreeMap$KeySet (java.base@21.0.3)
259: 656 52480 java.util.zip.ZipFile$Source (java.base@21.0.3)
260: 798 51072 javax.crypto.Cipher (java.base@21.0.3)
261: 236 50976 org.apache.lucene.index.IndexWriter
262: 235 50760 org.opensearch.index.engine.InternalEngine
263: 2112 50688 sun.nio.ch.NativeThreadSet (java.base@21.0.3)
264: 3154 50464 sun.reflect.generics.tree.ClassTypeSignature (java.base@21.0.3)
265: 1205 48200 java.security.CodeSource (java.base@21.0.3)
266: 1484 47488 org.opensearch.common.collect.CopyOnWriteHashMap
267: 1965 47160 java.util.regex.Pattern$BmpCharPropertyGreedy (java.base@21.0.3)
268: 235 47000 org.opensearch.index.IndexService
269: 1894 45456 org.apache.logging.log4j.message.ParameterFormatter$MessagePatternAnalysis
270: 705 45120 java.util.zip.Inflater (java.base@21.0.3)
271: 399 44688 io.netty.handler.ssl.SslHandler
272: 399 44688 org.opensearch.transport.CopyBytesSocketChannel
273: 399 44688 sun.nio.ch.SocketChannelImpl (java.base@21.0.3)
274: 1103 44120 org.opensearch.ingest.useragent.UserAgentParser$UserAgentSubpattern
275: 787 44072 java.lang.invoke.DirectMethodHandle$StaticAccessor (java.base@21.0.3)
276: 1372 43904 java.util.regex.Pattern$Branch (java.base@21.0.3)
277: 1372 42888 [Ljava.util.regex.Pattern$Node; (java.base@21.0.3)
278: 2677 42832 java.util.regex.Pattern$$Lambda/0x80000002a (java.base@21.0.3)
279: 1725 41400 java.security.Provider$UString (java.base@21.0.3)
280: 1289 41248 org.opensearch.common.util.concurrent.ThreadContext$$Lambda/0x00007f2150542d40
281: 107 39856 [Ljava.lang.ThreadLocal$ThreadLocalMap$Entry; (java.base@21.0.3)
282: 1635 39240 java.lang.RuntimePermission (java.base@21.0.3)
283: 187 39224 [Ljava.lang.invoke.MethodHandle; (java.base@21.0.3)
284: 1215 38880 java.lang.ref.WeakReference (java.base@21.0.3)
285: 960 38400 org.apache.lucene.codecs.lucene90.Lucene90TermVectorsFormat
286: 663 37128 java.io.FileCleanable (java.base@21.0.3)
287: 656 36736 java.util.jar.JarFile (java.base@21.0.3)
288: 22 36224 [Ljava.util.Hashtable$Entry; (java.base@21.0.3)
289: 1468 35232 org.apache.lucene.util.packed.DirectReader$DirectPackedReader4
290: 399 35112 sun.security.ssl.TransportContext (java.base@21.0.3)
291: 871 34840 com.fasterxml.jackson.databind.introspect.AnnotatedMethod
292: 859 34360 org.jcodings.unicode.UnicodeCodeRange
293: 857 34280 org.opensearch.common.path.PathTrie$TrieNode
294: 475 34200 org.apache.lucene.index.TieredMergePolicy
295: 235 33840 org.opensearch.index.translog.TranslogWriter
296: 2112 33792 sun.nio.ch.FileChannelImpl$Closer (java.base@21.0.3)
297: 1051 33632 io.netty.util.internal.LongAdderCounter
298: 1389 33336 org.opensearch.core.common.unit.ByteSizeValue
299: 1 32792 [Lkotlinx.coroutines.scheduling.CoroutineScheduler$Worker;
300: 1014 32448 org.apache.lucene.search.TermQuery
301: 1003 32096 java.lang.invoke.MethodTypeForm (java.base@21.0.3)
302: 236 32096 org.apache.lucene.index.DocumentsWriterFlushControl
303: 235 31960 org.opensearch.index.seqno.ReplicationTracker
304: 798 31920 io.netty.handler.ssl.SslHandler$LazyChannelPromise
305: 399 31920 sun.security.ssl.SSLConfiguration (java.base@21.0.3)
306: 656 31488 jdk.internal.loader.URLClassPath$JarLoader (java.base@21.0.3)
307: 1301 31224 java.util.concurrent.CompletableFuture (java.base@21.0.3)
308: 1293 31032 java.util.concurrent.Executors$RunnableAdapter (java.base@21.0.3)
309: 1901 30416 org.apache.lucene.codecs.lucene90.Lucene90DocValuesFormat
310: 1267 30408 java.util.concurrent.ConcurrentLinkedQueue$Node (java.base@21.0.3)
311: 1265 30360 org.opensearch.cluster.node.DiscoveryNodeFilters
312: 947 30304 org.apache.lucene.codecs.lucene99.Lucene99HnswVectorsFormat
313: 754 30160 java.math.BigInteger (java.base@21.0.3)
314: 628 30144 com.sun.crypto.provider.GaloisCounterMode$AESGCM (java.base@21.0.3)
315: 235 30080 org.opensearch.index.engine.EngineConfig
316: 470 30080 org.opensearch.index.mapper.RootObjectMapper
317: 1877 30032 java.nio.channels.spi.AbstractInterruptibleChannel$1 (java.base@21.0.3)
318: 748 29920 java.lang.invoke.DirectMethodHandle$Special (java.base@21.0.3)
319: 1236 29664 org.opensearch.threadpool.ThreadPool$ThreadedRunnable
320: 1222 29328 java.util.LinkedHashMap$LinkedValues (java.base@21.0.3)
321: 1204 28896 org.apache.lucene.util.packed.DirectReader$DirectPackedReader24
322: 399 28728 sun.security.ssl.SSLEngineOutputRecord (java.base@21.0.3)
323: 1080 28464 [Ljava.lang.reflect.Type; (java.base@21.0.3)
324: 236 28320 org.apache.lucene.index.IndexWriterConfig
325: 235 28200 org.opensearch.index.engine.InternalEngine$EngineMergeScheduler
326: 583 27984 sun.security.util.MemoryCache$SoftCacheEntry (java.base@21.0.3)
327: 685 27400 io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry
328: 558 26784 org.apache.lucene.analysis.tokenattributes.PackedTokenAttributeImpl
329: 256 26624 io.netty.buffer.PoolArena$HeapArena
330: 1642 26272 org.opensearch.common.inject.util.Providers$1
331: 1641 26256 org.opensearch.common.inject.internal.InternalFactory$Instance
332: 13 26240 [[J (java.base@21.0.3)
333: 656 26240 java.io.RandomAccessFile (java.base@21.0.3)
334: 544 26112 org.opensearch.index.mapper.ObjectMapper
335: 8 26056 [Lorg.opensearch.common.recycler.Recycler$V;
336: 1070 25680 org.opensearch.security.filter.SecurityRestFilter$AuthczRestHandler
337: 401 25664 io.netty.channel.ChannelOutboundBuffer
338: 401 25664 io.netty.channel.DefaultChannelPipeline$HeadContext
339: 596 25624 [Ljava.security.ProtectionDomain; (java.base@21.0.3)
340: 399 25536 io.netty.channel.socket.nio.NioSocketChannel$NioSocketChannelConfig
341: 634 25360 java.lang.invoke.DirectMethodHandle$Interface (java.base@21.0.3)
342: 1582 25312 com.fasterxml.jackson.databind.introspect.AnnotationMap
343: 790 25280 org.opensearch.painless.lookup.PainlessField
344: 790 25280 org.opensearch.painless.spi.WhitelistField
345: 1042 25008 org.bouncycastle.asn1.ASN1ObjectIdentifier
346: 623 24920 org.opensearch.transport.RequestHandlerRegistry
347: 614 24560 java.security.AccessControlContext (java.base@21.0.3)
348: 1017 24408 org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable
349: 1523 24368 org.opensearch.common.logging.DeprecationLogger
350: 1015 24360 org.apache.lucene.index.Term
351: 1521 24336 org.opensearch.common.settings.Setting$SimpleKey
352: 727 23264 java.util.concurrent.Semaphore$NonfairSync (java.base@21.0.3)
353: 573 22920 java.util.IdentityHashMap (java.base@21.0.3)
354: 954 22896 sun.reflect.annotation.AnnotationInvocationHandler (java.base@21.0.3)
355: 475 22800 org.opensearch.common.util.MovingAverage
356: 948 22752 org.apache.lucene.codecs.lucene99.Lucene99PostingsFormat
357: 947 22728 org.apache.lucene.codecs.lucene99.Lucene99Codec$1
358: 947 22728 org.apache.lucene.codecs.lucene99.Lucene99Codec$2
359: 947 22728 org.apache.lucene.codecs.lucene99.Lucene99Codec$3
360: 940 22560 org.opensearch.index.codec.fuzzy.FuzzySetParameters
361: 470 22560 org.opensearch.index.mapper.DocumentMapper
362: 705 22560 org.opensearch.index.mapper.MapperService$MapperAnalyzerWrapper
363: 401 22456 io.netty.channel.DefaultChannelPipeline$TailContext
364: 400 22400 io.netty.channel.FixedRecvByteBufAllocator$HandleImpl
365: 399 22344 org.opensearch.transport.InboundPipeline
366: 399 22344 sun.security.ssl.SSLCipher$T13GcmReadCipherGenerator$GcmReadCipher (java.base@21.0.3)
367: 399 22344 sun.security.ssl.SSLCipher$T13GcmWriteCipherGenerator$GcmWriteCipher (java.base@21.0.3)
368: 309 22248 org.apache.lucene.analysis.standard.StandardTokenizerImpl
369: 238 22096 [Lorg.apache.lucene.index.LeafReader;
370: 920 22080 java.util.concurrent.ConcurrentLinkedQueue (java.base@21.0.3)
371: 237 21976 [Lorg.apache.lucene.index.IndexReaderContext;
372: 1372 21952 java.util.regex.Pattern$BranchConn (java.base@21.0.3)
373: 684 21888 javax.crypto.Cipher$Transform (java.base@21.0.3)
374: 235 21736 [Lorg.apache.lucene.index.SegmentReader;
375: 679 21728 java.util.zip.ZipFile$CleanableResource (java.base@21.0.3)
376: 905 21720 org.opensearch.sql.expression.function.FunctionSignature
377: 338 21632 sun.security.ssl.CipherSuite (java.base@21.0.3)
378: 267 21360 org.opensearch.common.cache.Cache
379: 890 21360 org.opensearch.common.inject.InternalFactoryToProviderAdapter
380: 890 21360 org.opensearch.common.inject.ProviderToInternalFactoryAdapter
381: 890 21360 org.opensearch.common.inject.Scopes$1$1
382: 667 21344 java.io.File (java.base@21.0.3)
383: 879 21096 java.util.regex.Pattern$SliceI (java.base@21.0.3)
384: 876 21024 sun.security.provider.PolicyFile$PolicyEntry (java.base@21.0.3)
385: 653 20896 sun.reflect.generics.repository.ClassRepository (java.base@21.0.3)
386: 870 20880 java.lang.Class$AnnotationData (java.base@21.0.3)
387: 652 20864 java.util.PropertyPermission (java.base@21.0.3)
388: 651 20832 java.net.InetAddress$InetAddressHolder (java.base@21.0.3)
389: 866 20784 org.opensearch.common.inject.multibindings.RealElement
390: 235 20680 org.opensearch.index.SearchSlowLog
391: 235 20680 org.opensearch.security.configuration.SecurityFlsDlsIndexSearcherWrapper
392: 849 20376 sun.reflect.generics.factory.CoreReflectionFactory (java.base@21.0.3)
393: 636 20352 org.opensearch.core.ParseField
394: 628 20096 com.sun.crypto.provider.AESCrypt (java.base@21.0.3)
395: 624 19968 io.netty.buffer.PoolThreadCache$SubPageMemoryRegionCache
396: 831 19944 sun.reflect.generics.reflectiveObjects.ParameterizedTypeImpl (java.base@21.0.3)
397: 475 19800 [Lorg.opensearch.common.inject.SingleParameterInjector;
398: 309 19776 org.apache.lucene.analysis.standard.StandardTokenizer
399: 810 19440 org.apache.logging.log4j.MarkerManager$Log4jMarker
400: 809 19400 [Ljava.security.cert.X509Certificate; (java.base@21.0.3)
401: 1210 19360 java.util.regex.Pattern$BitClass (java.base@21.0.3)
402: 401 19248 io.netty.channel.AbstractChannel$CloseFuture
403: 401 19248 io.netty.channel.DefaultChannelPipeline
404: 802 19248 io.netty.channel.VoidChannelPromise
405: 401 19248 sun.nio.ch.SelectionKeyImpl (java.base@21.0.3)
406: 798 19152 io.netty.handler.ssl.SslHandler$SslTasksRunner
407: 399 19152 sun.nio.ch.SocketAdaptor (java.base@21.0.3)
408: 399 19152 sun.security.ssl.SSLEngineInputRecord (java.base@21.0.3)
409: 236 18880 [Ljava.util.concurrent.locks.Lock; (java.base@21.0.3)
410: 236 18880 [Lorg.apache.lucene.index.ApproximatePriorityQueue;
411: 470 18800 java.util.concurrent.CompletableFuture$UniWhenComplete (java.base@21.0.3)
412: 470 18800 org.opensearch.index.mapper.Mapping
413: 470 18800 org.opensearch.index.mapper.MappingLookup
414: 470 18800 org.opensearch.index.seqno.RetentionLease
415: 235 18800 org.opensearch.index.translog.LocalTranslog
416: 32 18432 io.netty.util.internal.shaded.org.jctools.queues.atomic.MpscUnboundedAtomicArrayQueue
417: 768 18432 java.util.concurrent.ConcurrentHashMap$KeySetView (java.base@21.0.3)
418: 715 17160 java.time.format.DateTimeFormatterBuilder$DefaultValueParser (java.base@21.0.3)
419: 715 17160 jdk.internal.reflect.DirectConstructorHandleAccessor (java.base@21.0.3)
420: 714 17136 java.util.regex.Pattern$Start (java.base@21.0.3)
421: 1066 17056 org.opensearch.common.inject.Initializables$1
422: 710 17040 sun.reflect.generics.scope.ClassScope (java.base@21.0.3)
423: 236 16992 org.apache.lucene.index.DocumentsWriterDeleteQueue
424: 705 16920 java.util.zip.Inflater$InflaterZStreamRef (java.base@21.0.3)
425: 235 16920 org.apache.lucene.analysis.miscellaneous.FingerprintFilter
426: 235 16920 org.apache.lucene.index.StandardDirectoryReader
427: 235 16920 org.opensearch.common.lucene.index.OpenSearchDirectoryReader
428: 235 16920 org.opensearch.index.IndexModule
429: 235 16920 org.opensearch.index.search.stats.ShardSearchStats$StatsHolder
430: 235 16920 org.opensearch.index.translog.Checkpoint
431: 139 16680 java.lang.Thread (java.base@21.0.3)
432: 689 16536 sun.security.jca.ServiceId (java.base@21.0.3)
433: 685 16440 com.fasterxml.jackson.databind.introspect.MemberKey
434: 683 16392 org.opensearch.rest.RestMethodHandlers
435: 1018 16288 java.util.AbstractMap$2 (java.base@21.0.3)
436: 339 16272 java.lang.invoke.DirectMethodHandle$Accessor (java.base@21.0.3)
437: 399 15960 io.netty.channel.socket.nio.NioSocketChannel$NioSocketChannelUnsafe
438: 399 15960 org.opensearch.transport.InboundAggregator
439: 399 15960 org.opensearch.transport.InboundDecoder
440: 399 15960 org.opensearch.transport.netty4.Netty4TcpChannel
441: 249 15936 java.lang.invoke.BoundMethodHandle$Species_LLLLLLL (java.base@21.0.3)
442: 656 15744 java.util.zip.ZipFile$Source$Key (java.base@21.0.3)
443: 653 15672 sun.reflect.generics.tree.ClassSignature (java.base@21.0.3)
444: 632 15600 [[I (java.base@21.0.3)
445: 973 15568 org.opensearch.threadpool.ScheduledCancellableAdapter
446: 486 15552 org.opensearch.common.inject.ConstructorInjector
447: 486 15552 org.opensearch.common.inject.MembersInjectorImpl
448: 961 15376 java.util.concurrent.Semaphore (java.base@21.0.3)
449: 320 15360 org.apache.lucene.analysis.StopFilter
450: 960 15360 org.apache.lucene.codecs.lucene90.Lucene90CompoundFormat
451: 960 15360 org.apache.lucene.codecs.lucene90.Lucene90LiveDocsFormat
452: 960 15360 org.apache.lucene.codecs.lucene90.Lucene90NormsFormat
453: 960 15360 org.apache.lucene.codecs.lucene90.Lucene90StoredFieldsFormat
454: 638 15312 java.net.Inet4Address (java.base@21.0.3)
455: 638 15312 java.net.InetSocketAddress$InetSocketAddressHolder (java.base@21.0.3)
456: 955 15280 org.apache.lucene.codecs.lucene94.Lucene94FieldInfosFormat
457: 237 15168 org.apache.lucene.index.LogByteSizeMergePolicy
458: 947 15152 org.apache.lucene.codecs.lucene99.Lucene99SegmentInfoFormat
459: 236 15104 org.apache.lucene.index.FieldInfos$FieldNumbers
460: 471 15072 org.opensearch.index.engine.LiveVersionMap$VersionLookup
461: 235 15040 org.opensearch.index.IndexingSlowLog
462: 940 15040 org.opensearch.index.codec.PerFieldMappingPostingFormatCodec$$Lambda/0x00007f21512a8a80
463: 940 15040 org.opensearch.index.codec.fuzzy.FuzzySetFactory
464: 470 15040 org.opensearch.index.mapper.FieldTypeLookup
465: 235 15040 org.opensearch.index.mapper.MapperService
466: 235 15040 org.opensearch.index.mapper.SourceFieldMapper
467: 235 15040 org.opensearch.index.shard.RefreshListeners
468: 235 15040 org.opensearch.index.translog.ReplicationTranslogDeletionPolicy
469: 470 15040 org.opensearch.index.translog.Translog$Location
470: 235 15040 org.opensearch.indices.recovery.RecoveryState$Translog
471: 235 15040 org.opensearch.indices.replication.common.ReplicationLuceneIndex
472: 624 14976 org.apache.lucene.util.AttributeSource$State
473: 825 14960 [Lsun.reflect.generics.tree.FormalTypeParameter; (java.base@21.0.3)
474: 623 14952 org.opensearch.indexmanagement.rollup.interceptor.RollupInterceptor$interceptHandler$1
475: 623 14952 org.opensearch.performanceanalyzer.transport.PerformanceAnalyzerTransportRequestHandler
476: 623 14952 org.opensearch.security.OpenSearchSecurityPlugin$6$1
477: 417 14816 [Ljava.time.format.DateTimeFormatterBuilder$DateTimePrinterParser; (java.base@21.0.3)
478: 240 14808 [[B (java.base@21.0.3)
479: 461 14752 com.fasterxml.jackson.databind.introspect.AnnotatedField
480: 365 14600 org.opensearch.painless.spi.WhitelistClass
481: 259 14504 java.lang.invoke.BoundMethodHandle$Species_LLLLL (java.base@21.0.3)
482: 600 14400 io.netty.buffer.IntPriorityQueue
483: 297 14256 java.lang.invoke.MethodHandleImpl$AsVarargsCollector (java.base@21.0.3)
484: 296 14208 org.apache.lucene.analysis.CharArrayMap
485: 352 14080 java.time.format.DateTimeFormatter (java.base@21.0.3)
486: 586 14064 java.util.regex.Pattern$CharPropertyGreedy (java.base@21.0.3)
487: 561 13904 [Ljava.security.cert.Certificate; (java.base@21.0.3)
488: 433 13856 org.opensearch.common.inject.FactoryProxy
489: 866 13856 org.opensearch.common.inject.Key$AnnotationInstanceStrategy
490: 433 13856 org.opensearch.core.xcontent.ObjectParser$FieldParser
491: 286 13728 java.lang.invoke.LambdaFormEditor$Transform (java.base@21.0.3)
492: 118 13704 [Lorg.tartarus.snowball.Among;
493: 338 13520 java.security.ProtectionDomain (java.base@21.0.3)
494: 563 13512 java.lang.Long (java.base@21.0.3)
495: 559 13416 java.security.BasicPermissionCollection (java.base@21.0.3)
496: 557 13368 java.security.SecurityPermission (java.base@21.0.3)
497: 236 13216 org.apache.lucene.index.DocumentsWriter
498: 236 13216 org.apache.lucene.index.IndexFileDeleter
499: 236 13216 org.apache.lucene.index.ReaderPool
500: 235 13160 org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter
501: 235 13160 org.opensearch.index.engine.CombinedDeletionPolicy
502: 235 13160 org.opensearch.index.store.Store
503: 235 13160 org.opensearch.indices.recovery.RecoveryState
504: 235 13160 org.opensearch.indices.recovery.RecoveryState$VerifyIndex
505: 205 13120 io.netty.util.concurrent.ScheduledFutureTask
506: 817 13072 org.opensearch.common.concurrent.CompletableContext
507: 813 13008 org.opensearch.security.support.WildcardMatcher$Exact
508: 540 12960 java.util.ImmutableCollections$List12 (java.base@21.0.3)
509: 401 12832 io.netty.channel.DefaultChannelId
510: 401 12832 sun.nio.ch.DummySocketImpl (java.base@21.0.3)
511: 399 12768 io.netty.handler.ssl.SslHandler$SslHandlerCoalescingBufferQueue
512: 798 12768 io.netty.handler.ssl.SslHandler$SslTasksRunner$1
513: 399 12768 java.util.regex.Pattern$Curly (java.base@21.0.3)
514: 399 12768 org.opensearch.transport.nativeprotocol.NativeInboundBytesHandler
515: 399 12768 org.opensearch.transport.netty4.Netty4MessageChannelHandler
516: 399 12768 org.opensearch.transport.netty4.OpenSearchLoggingHandler
517: 798 12768 sun.security.ssl.Authenticator$TLS13Authenticator (java.base@21.0.3)
518: 399 12768 sun.security.ssl.SSLEngineImpl (java.base@21.0.3)
519: 394 12608 sun.security.util.ObjectIdentifier (java.base@21.0.3)
520: 314 12560 org.apache.lucene.analysis.LowerCaseFilter
521: 782 12512 java.util.concurrent.atomic.AtomicReferenceArray (java.base@21.0.3)
522: 758 12128 org.opensearch.security.support.WildcardMatcher$SimpleMatcher
523: 500 12000 org.opensearch.cluster.metadata.IndexGraveyard$Tombstone
524: 490 11760 org.opensearch.core.action.ActionListener$1
525: 488 11712 org.opensearch.securityanalytics.model.LogType$Mapping
526: 486 11664 org.opensearch.common.inject.DefaultConstructionProxyFactory$1
527: 486 11664 org.opensearch.common.inject.spi.InjectionPoint
528: 653 11576 [Lsun.reflect.generics.tree.ClassTypeSignature; (java.base@21.0.3)
529: 481 11544 java.util.regex.Pattern$$Lambda/0x80000002b (java.base@21.0.3)
530: 239 11472 org.opensearch.common.settings.IndexScopedSettings
531: 237 11376 java.util.concurrent.ArrayBlockingQueue (java.base@21.0.3)
532: 237 11376 org.apache.lucene.index.CompositeReaderContext
533: 531 11360 [Lsun.reflect.generics.tree.FieldTypeSignature; (java.base@21.0.3)
534: 236 11328 org.apache.lucene.index.BufferedUpdates
535: 236 11328 org.apache.lucene.index.DocumentsWriter$$Lambda/0x00007f21510cae38
536: 236 11328 org.apache.lucene.index.IndexFileDeleter$CommitPoint
537: 472 11328 org.opensearch.core.common.text.Text
538: 235 11280 org.apache.lucene.index.SoftDeletesRetentionMergePolicy
539: 235 11280 org.apache.lucene.store.MMapDirectory
540: 470 11280 org.opensearch.common.util.concurrent.KeyedLock
541: 235 11280 org.opensearch.index.IndexService$AsyncGlobalCheckpointTask
542: 235 11280 org.opensearch.index.IndexService$AsyncRefreshTask
543: 235 11280 org.opensearch.index.IndexService$AsyncRetentionLeaseSyncTask
544: 235 11280 org.opensearch.index.IndexService$AsyncTrimTranslogTask
545: 470 11280 org.opensearch.index.analysis.FieldNameAnalyzer
546: 470 11280 org.opensearch.index.engine.LiveVersionMap$Maps
547: 235 11280 org.opensearch.index.engine.SoftDeletesPolicy
548: 235 11280 org.opensearch.index.fielddata.IndexFieldDataService
549: 235 11280 org.opensearch.index.get.ShardGetService
550: 235 11280 org.opensearch.index.mapper.DocumentMapperParser
551: 470 11280 org.opensearch.index.mapper.DocumentParser
552: 470 11280 org.opensearch.index.mapper.DynamicKeyFieldTypeLookup
553: 235 11280 org.opensearch.index.store.DirectoryFileTransferTracker
554: 235 11280 org.opensearch.index.store.FsDirectoryFactory$HybridDirectory
555: 235 11280 org.opensearch.index.translog.InternalTranslogManager
556: 235 11280 org.opensearch.indices.replication.common.ReplicationTimer
557: 343 10976 com.fasterxml.jackson.databind.util.internal.PrivateMaxEntriesMap$Node
558: 274 10960 java.lang.ref.Finalizer (java.base@21.0.3)
559: 341 10912 jdk.internal.math.FDBigInteger (java.base@21.0.3)
560: 341 10912 org.opensearch.common.settings.Setting$IntegerParser
561: 135 10800 java.net.URI (java.base@21.0.3)
562: 270 10800 org.opensearch.common.util.concurrent.BaseFuture$Sync
563: 446 10704 io.netty.util.internal.logging.LocationAwareSlf4JLogger
564: 446 10704 java.util.concurrent.ConcurrentSkipListMap$Node (java.base@21.0.3)
565: 264 10560 sun.security.util.KnownOIDs (java.base@21.0.3)
566: 656 10496 sun.nio.fs.UnixFileAttributes$UnixAsBasicFileAttributes (java.base@21.0.3)
567: 187 10472 java.lang.invoke.BoundMethodHandle$Species_LLLLLL (java.base@21.0.3)
568: 327 10464 sun.reflect.generics.reflectiveObjects.TypeVariableImpl (java.base@21.0.3)
569: 433 10392 org.opensearch.common.inject.InjectorImpl$4
570: 433 10392 org.opensearch.common.inject.multibindings.MapBinder$RealMapBinder$MapEntry
571: 433 10392 org.opensearch.common.inject.spi.ProviderLookup
572: 433 10392 org.opensearch.plugins.ActionPlugin$ActionHandler
573: 323 10336 org.apache.lucene.analysis.ReusableStringReader
574: 430 10320 org.apache.lucene.util.packed.DirectReader$DirectPackedReader2
575: 416 10248 [Lorg.opensearch.sql.data.type.ExprType;
576: 638 10208 java.net.InetSocketAddress (java.base@21.0.3)
577: 423 10152 org.opensearch.core.xcontent.ObjectParser$$Lambda/0x00007f2150487068
578: 408 10120 [Lorg.opensearch.common.settings.Setting$Property;
579: 253 10120 org.opensearch.indices.replication.common.ReplicationCollection$ReplicationMonitor
580: 417 10008 java.time.format.DateTimeFormatterBuilder$CompositePrinterParser (java.base@21.0.3)
581: 250 10000 org.opensearch.common.settings.Setting$1
582: 415 9960 [Lio.netty.util.concurrent.GenericFutureListener;
583: 415 9960 io.netty.util.concurrent.DefaultFutureListeners
584: 414 9936 org.opensearch.sql.expression.function.FunctionDSL$$Lambda/0x00007f2150b36428
585: 412 9888 java.util.ImmutableCollections$SetN (java.base@21.0.3)
586: 411 9864 [Ljava.nio.channels.SelectionKey; (java.base@21.0.3)
587: 615 9840 sun.security.ssl.SessionId (java.base@21.0.3)
588: 408 9792 java.io.ByteArrayOutputStream (java.base@21.0.3)
589: 402 9640 [Lio.netty.util.DefaultAttributeMap$DefaultAttribute;
590: 401 9624 io.netty.channel.SucceededChannelFuture
591: 401 9624 io.netty.util.DefaultAttributeMap$DefaultAttribute
592: 401 9624 java.util.regex.Pattern$$Lambda/0x800000033 (java.base@21.0.3)
593: 399 9576 io.netty.channel.PendingBytesTracker$DefaultChannelPipelinePendingBytesTracker
594: 399 9576 org.opensearch.transport.CopyBytesSocketChannel$WriteConfig
595: 399 9576 org.opensearch.transport.TcpChannel$ChannelStats
596: 171 9576 sun.security.jca.ProviderList$ServiceList (java.base@21.0.3)
597: 399 9576 sun.security.ssl.HandshakeHash (java.base@21.0.3)
598: 399 9576 sun.security.ssl.SSLEngineOutputRecord$HandshakeFragment (java.base@21.0.3)
599: 239 9560 org.apache.lucene.util.packed.Packed64
600: 396 9504 java.security.Permissions (java.base@21.0.3)
601: 237 9480 org.opensearch.index.OpenSearchTieredMergePolicy
602: 237 9480 sun.nio.ch.FileLockImpl (java.base@21.0.3)
603: 169 9464 java.lang.Module (java.base@21.0.3)
604: 236 9440 org.apache.lucene.index.BufferedUpdatesStream
605: 236 9440 org.apache.lucene.util.ByteBlockPool
606: 235 9400 org.opensearch.index.cache.bitset.BitsetFilterCache
607: 235 9400 org.opensearch.index.engine.InternalEngine$ExternalReaderManager
608: 235 9400 org.opensearch.index.engine.PrunePostingsMergePolicy
609: 235 9400 org.opensearch.index.engine.RecoverySourcePruneMergePolicy
610: 235 9400 org.opensearch.index.mapper.DataStreamFieldMapper
611: 235 9400 org.opensearch.index.mapper.FieldNamesFieldMapper
612: 235 9400 org.opensearch.index.mapper.FieldNamesFieldMapper$FieldNamesFieldType
613: 235 9400 org.opensearch.index.mapper.IdFieldMapper$IdFieldType
614: 235 9400 org.opensearch.index.mapper.Mapper$TypeParser$ParserContext
615: 235 9400 org.opensearch.index.mapper.RoutingFieldMapper
616: 235 9400 org.opensearch.index.mapper.SourceFieldMapper$SourceFieldType
617: 235 9400 org.opensearch.index.shard.GlobalCheckpointListeners
618: 235 9400 org.opensearch.index.shard.IndexShard$7
619: 235 9400 org.opensearch.index.shard.IndexShardOperationPermits
620: 235 9400 org.opensearch.index.shard.InternalIndexingStats$StatsHolder
621: 235 9400 org.opensearch.index.shard.OpenSearchMergePolicy
622: 235 9400 org.opensearch.index.store.ByteSizeCachingDirectory
623: 235 9400 org.opensearch.index.store.ByteSizeCachingDirectory$1
624: 235 9400 org.opensearch.index.translog.TranslogConfig
625: 16 9216 io.netty.util.internal.shaded.org.jctools.queues.atomic.MpscUnboundedAtomicArrayQueue
626: 383 9192 org.opensearch.sql.expression.function.FunctionDSL$$Lambda/0x00007f2150b24690
627: 287 9184 java.util.regex.Pattern$BnM (java.base@21.0.3)
628: 104 9152 com.fasterxml.jackson.databind.ser.BeanPropertyWriter
629: 570 9120 org.apache.lucene.util.BytesRefBuilder
630: 190 9120 org.jcodings.util.CaseInsensitiveBytesHash$CaseInsensitiveBytesHashEntry
631: 283 9056 java.nio.file.attribute.FileTime (java.base@21.0.3)
632: 377 9048 java.lang.module.ModuleDescriptor$Exports (java.base@21.0.3)
633: 112 8960 org.apache.logging.log4j.core.util.datetime.FixedDateFormat
634: 272 8704 org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable
635: 542 8672 sun.reflect.generics.tree.TypeVariableSignature (java.base@21.0.3)
636: 83 8632 org.opensearch.security.configuration.DlsFlsFilterLeafReader
637: 267 8544 java.time.format.DateTimeFormatterBuilder$NumberPrinterParser (java.base@21.0.3)
638: 265 8480 com.google.common.collect.RegularImmutableSet
639: 265 8480 java.util.concurrent.CountDownLatch$Sync (java.base@21.0.3)
640: 353 8472 java.io.FilePermissionCollection (java.base@21.0.3)
641: 526 8416 org.apache.logging.log4j.message.ReusableSimpleMessage
642: 208 8320 sun.security.pkcs11.SunPKCS11$Descriptor (jdk.crypto.cryptoki@21.0.3)
643: 343 8232 com.fasterxml.jackson.databind.util.internal.PrivateMaxEntriesMap$WeightedValue
644: 256 8192 io.netty.handler.codec.CodecOutputList
645: 341 8184 java.util.regex.Pattern$Ques (java.base@21.0.3)
646: 332 7968 sun.reflect.generics.tree.FormalTypeParameter (java.base@21.0.3)
647: 329 7896 org.opensearch.common.collect.Tuple
648: 488 7808 java.util.regex.Pattern$$Lambda/0x800000032 (java.base@21.0.3)
649: 486 7776 org.opensearch.common.inject.ConstructorBindingImpl$Factory
650: 486 7776 org.opensearch.common.inject.DefaultConstructionProxyFactory
651: 323 7752 [Lorg.apache.lucene.util.AttributeSource$State;
652: 323 7752 org.apache.lucene.analysis.Analyzer$TokenStreamComponents
653: 322 7728 java.util.regex.Pattern$CharProperty (java.base@21.0.3)
654: 192 7680 java.lang.Thread$FieldHolder (java.base@21.0.3)
655: 160 7680 java.lang.invoke.BoundMethodHandle$Species_LLL (java.base@21.0.3)
656: 160 7680 java.net.URLPermission (java.base@21.0.3)
657: 318 7632 java.util.ImmutableCollections$Set12 (java.base@21.0.3)
658: 237 7584 java.util.concurrent.Semaphore$FairSync (java.base@21.0.3)
659: 237 7584 org.apache.lucene.store.NativeFSLockFactory$NativeFSLock
660: 237 7584 sun.nio.ch.FileKey (java.base@21.0.3)
661: 237 7584 sun.nio.ch.FileLockTable$FileLockReference (java.base@21.0.3)
662: 470 7552 [Lorg.opensearch.index.mapper.DynamicTemplate;
663: 118 7552 java.lang.invoke.BoundMethodHandle$Species_LLLLLLLL (java.base@21.0.3)
664: 472 7552 jdk.proxy2.$Proxy32 (jdk.proxy2)
665: 236 7552 org.apache.lucene.index.BufferedUpdates$DeletedTerms
666: 236 7552 org.apache.lucene.index.BufferedUpdatesStream$FinishedSegments
667: 236 7552 org.apache.lucene.index.DocumentsWriterPerThreadPool
668: 236 7552 org.apache.lucene.index.IndexWriter$EventQueue
669: 472 7552 org.apache.lucene.util.Counter$AtomicCounter
670: 236 7552 org.apache.lucene.util.FrequencyTrackingRingBuffer
671: 236 7552 org.opensearch.index.seqno.RetentionLeases
672: 235 7520 org.apache.lucene.document.NumericDocValuesField
673: 235 7520 org.apache.lucene.index.ShuffleForcedMergePolicy
674: 235 7520 org.opensearch.action.support.replication.PendingReplicationActions
675: 235 7520 org.opensearch.analysis.common.FingerprintAnalyzer
676: 470 7520 org.opensearch.common.concurrent.CompletableContext$$Lambda/0x00007f21510fe008
677: 470 7520 org.opensearch.common.settings.Setting$$Lambda/0x00007f21500be090
678: 470 7520 org.opensearch.core.action.ActionListener$$Lambda/0x00007f21510fd518
679: 235 7520 org.opensearch.env.NodeEnvironment$1
680: 235 7520 org.opensearch.env.NodeEnvironment$InternalShardLock
681: 235 7520 org.opensearch.index.cache.IndexCache
682: 235 7520 org.opensearch.index.cache.bitset.ShardBitsetFilterCache
683: 235 7520 org.opensearch.index.cache.request.ShardRequestCache
684: 235 7520 org.opensearch.index.engine.Engine$IndexThrottle
685: 235 7520 org.opensearch.index.engine.LiveVersionMap
686: 235 7520 org.opensearch.index.mapper.DocCountFieldMapper
687: 235 7520 org.opensearch.index.mapper.IdFieldMapper
688: 235 7520 org.opensearch.index.mapper.IgnoredFieldMapper
689: 235 7520 org.opensearch.index.mapper.IndexFieldMapper
690: 235 7520 org.opensearch.index.mapper.RankFeatureMetaFieldMapper
691: 235 7520 org.opensearch.index.mapper.SeqNoFieldMapper
692: 235 7520 org.opensearch.index.mapper.VersionFieldMapper
693: 235 7520 org.opensearch.index.seqno.LocalCheckpointTracker
694: 235 7520 org.opensearch.index.shard.IndexShard$RefreshMetricUpdater
695: 235 7520 org.opensearch.index.shard.ShardPath
696: 235 7520 org.opensearch.index.similarity.SimilarityService
697: 235 7520 org.opensearch.index.store.ByteSizeCachingDirectory$SizeAndModCount
698: 235 7520 org.opensearch.index.translog.TranslogHeader
699: 235 7520 org.opensearch.index.warmer.ShardIndexWarmerService
700: 233 7456 org.apache.logging.log4j.core.config.plugins.processor.PluginEntry
701: 154 7392 org.apache.lucene.analysis.CharArrayMap$UnmodifiableCharArrayMap
702: 2 7320 [Ljava.lang.Character$UnicodeScript; (java.base@21.0.3)
703: 130 7280 sun.util.calendar.ZoneInfo (java.base@21.0.3)
704: 452 7232 org.opensearch.core.action.ActionListener$$Lambda/0x00007f21502941f8
705: 452 7232 org.opensearch.core.action.ActionListener$$Lambda/0x00007f2150294418
706: 450 7200 org.apache.lucene.analysis.CharArraySet
707: 297 7128 javax.management.ImmutableDescriptor (java.management@21.0.3)
708: 295 7080 java.util.regex.Pattern$StartS (java.base@21.0.3)
709: 433 6928 org.opensearch.common.inject.spi.ProviderLookup$ProviderImpl
710: 2 6912 [Lorg.jcodings.unicode.UnicodeCodeRange;
711: 123 6888 java.net.SocketPermission (java.base@21.0.3)
712: 215 6880 sun.nio.fs.UnixFileKey (java.base@21.0.3)
713: 428 6848 org.opensearch.action.support.HandledTransportAction$TransportHandler
714: 48 6792 [Ljava.lang.ClassValue$Entry; (java.base@21.0.3)
715: 283 6792 org.opensearch.common.settings.Setting$$Lambda/0x00007f2150090238
716: 277 6752 [Lcom.fasterxml.jackson.databind.JavaType;
717: 204 6528 org.opensearch.sql.expression.function.BuiltinFunctionName
718: 401 6416 io.netty.channel.VoidChannelPromise$1
719: 401 6416 io.netty.channel.nio.AbstractNioChannel$1
720: 401 6416 org.opensearch.transport.netty4.Netty4TcpChannel$$Lambda/0x00007f2151081a00
721: 400 6400 io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle$1
722: 399 6384 io.netty.channel.nio.AbstractNioByteChannel$1
723: 266 6384 org.opensearch.core.common.io.stream.NamedWriteableRegistry$Entry
724: 399 6384 org.opensearch.transport.InboundAggregator$$Lambda/0x00007f2151141420
725: 399 6384 org.opensearch.transport.TcpTransport$$Lambda/0x00007f21510ffc50
726: 399 6384 org.opensearch.transport.netty4.Netty4MessageChannelHandler$$Lambda/0x00007f21510fa760
727: 399 6384 org.opensearch.transport.netty4.Netty4MessageChannelHandler$$Lambda/0x00007f21510fb420
728: 399 6384 org.opensearch.transport.netty4.Netty4MessageChannelHandler$$Lambda/0x00007f2151144240
729: 399 6384 org.opensearch.transport.netty4.Netty4Transport$$Lambda/0x00007f21510f8240
730: 399 6384 sun.security.ssl.HandshakeHash$CacheOnlyHash (java.base@21.0.3)
731: 397 6352 org.opensearch.sql.expression.function.FunctionDSL$$Lambda/0x00007f2150b353f8
732: 158 6320 sun.reflect.generics.repository.MethodRepository (java.base@21.0.3)
733: 195 6240 com.sun.jmx.mbeanserver.ConvertingMethod (java.management@21.0.3)
734: 254 6096 org.opensearch.security.support.WildcardMatcher$MatcherCombiner
735: 253 6072 com.google.common.collect.ImmutableMapEntry
736: 249 5976 org.opensearch.index.fielddata.plain.SortedSetBytesLeafFieldData
737: 106 5936 java.nio.HeapCharBuffer (java.base@21.0.3)
738: 185 5920 org.opensearch.painless.lookup.PainlessConstructor
739: 41 5904 org.opensearch.repositories.s3.async.SizeBasedBlockingQ$Consumer
740: 3 5744 [[Lorg.opensearch.search.aggregations.bucket.terms.StringTerms$Bucket;
741: 237 5688 org.apache.lucene.search.similarities.BM25Similarity
742: 237 5688 org.opensearch.index.LogByteSizeMergePolicyProvider
743: 237 5688 org.opensearch.index.MergeSchedulerConfig
744: 237 5688 org.opensearch.index.TieredMergePolicyProvider
745: 237 5688 org.opensearch.index.remote.RemoteStorePathStrategy
746: 237 5688 sun.nio.ch.FileLockTable (java.base@21.0.3)
747: 236 5664 org.apache.lucene.index.ConcurrentApproximatePriorityQueue
748: 236 5664 org.apache.lucene.index.DocumentsWriterDeleteQueue$DeleteSlice
...
Total 26089884 33026508368
Also, I'm assuming you are not running any painless scripts.
No, the only thing running in the cluster is this query, which is basically a nested terms aggregation (with huge sizes) plus a date histogram.
@Pigueiras, we'll need to dig deeper. This is a good start, but we'll need multiple histogram dumps at regular intervals to see what is growing rapidly. Looking at the metrics shared here (https://github.com/opensearch-project/OpenSearch/issues/15413#issuecomment-2323055937), my guess is that the backpressure module has a very narrow window of time to detect this. Adding @kaushalmahi12 and @sgup432 to see if they have some tuning suggestions after they're done with the 2.17 release items. In the meantime, it might be a good idea to take histogram and hot_thread dumps at a fixed interval so that we can see, between successive increases, which new objects are rapidly accumulating and which operations were actively being run.
@kkhatua I created this repository with a dump of the histogram and hot_thread approximately every second from the beginning of a query until the OOME. The files are in HHMMSS.sss format, and the OOME occurs at 22:13:44.
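For anyone who wants to reproduce this kind of capture, here is a minimal sketch of a sampling loop. It assumes `jmap` from the same JDK is on the PATH, that the node answers unauthenticated HTTP on `base_url` (a cluster with the security plugin would need TLS and credentials added), and that one-second spacing is acceptable. The function names and file suffixes are hypothetical and are not taken from the linked repository.

```python
import subprocess
import time
import urllib.request
from datetime import datetime
from pathlib import Path


def sample_name(now: datetime) -> str:
    """Format a timestamp as HHMMSS.sss (millisecond precision)."""
    return now.strftime("%H%M%S.") + f"{now.microsecond // 1000:03d}"


def capture_once(pid: int, base_url: str, out_dir: Path) -> Path:
    """Write one heap histogram and one hot_threads dump; return the histogram path."""
    name = sample_name(datetime.now())

    # `jmap -histo` (without :live) does not force a full GC before sampling.
    histo = subprocess.run(
        ["jmap", "-histo", str(pid)], capture_output=True, text=True, check=False
    )
    histo_path = out_dir / f"{name}.histo"
    histo_path.write_text(histo.stdout)

    # The hot threads API may stop answering once the node is close to OOM,
    # so a failed request is recorded instead of aborting the loop.
    threads_path = out_dir / f"{name}.hot_threads"
    try:
        with urllib.request.urlopen(f"{base_url}/_nodes/hot_threads", timeout=5) as resp:
            threads_path.write_bytes(resp.read())
    except OSError as exc:
        threads_path.write_text(f"request failed: {exc}\n")
    return histo_path


def capture_loop(pid: int, base_url: str, out_dir: Path,
                 interval_s: float = 1.0, samples: int = 300) -> None:
    """Take `samples` captures, sleeping `interval_s` seconds between them."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for _ in range(samples):
        capture_once(pid, base_url, out_dir)
        time.sleep(interval_s)

# Example call (hypothetical pid and URL):
# capture_loop(12345, "http://localhost:9200", Path("dumps"))
```

Each `jmap` call pauses the JVM briefly, so the effective interval will be somewhat longer than `interval_s`.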
This will take some time, @Pigueiras. Looking at the metrics by raw Java objects alone, there is clear growth in allocated bytes but no proportional growth in the number of instances:
===221250.998===
num #instances #bytes class name (module)
-------------------------------------------------------
1: 9645603 555071536 [B (java.base@21.0.3)
2: 20673 239992792 [Ljdk.internal.vm.FillerElement; (java.base@21.0.3)
3: 778204 165436520 [J (java.base@21.0.3)
===221310.010===
num #instances #bytes class name (module)
-------------------------------------------------------
1: 167791 8060421456 [Ljava.lang.Object; (java.base@21.0.3)
2: 13020 807425672 [Ljdk.internal.vm.FillerElement; (java.base@21.0.3)
3: 5477994 284541896 [B (java.base@21.0.3)
===221320.062===
num #instances #bytes class name (module)
-------------------------------------------------------
1: 163015 16527807568 [Ljava.lang.Object; (java.base@21.0.3)
2: 20182 1195993792 [Ljdk.internal.vm.FillerElement; (java.base@21.0.3)
3: 5474736 282744968 [B (java.base@21.0.3)
===221330.131===
num #instances #bytes class name (module)
-------------------------------------------------------
1: 164115 24883669456 [Ljava.lang.Object; (java.base@21.0.3)
2: 24545 1569997016 [Ljdk.internal.vm.FillerElement; (java.base@21.0.3)
3: 5478403 282984872 [B (java.base@21.0.3)
===221340.938===
num #instances #bytes class name (module)
-------------------------------------------------------
1: 148297 30195499744 [Ljava.lang.Object; (java.base@21.0.3)
2: 5448453 280126496 [B (java.base@21.0.3)
3: 631937 148145208 [J (java.base@21.0.3)
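To put numbers on that observation, the sketch below computes the growth rate of the `[Ljava.lang.Object;` row between successive samples. The figures are copied from the samples above; the helper names are hypothetical. Under these numbers the arrays grow by roughly 0.8 GB per second over the first two intervals while the instance count slightly drops, which suggests a small number of very large arrays being resized rather than many new objects.

```python
# (sample name HHMMSS.sss, #instances, #bytes) for [Ljava.lang.Object;
SAMPLES = [
    ("221310.010", 167791, 8060421456),
    ("221320.062", 163015, 16527807568),
    ("221330.131", 164115, 24883669456),
    ("221340.938", 148297, 30195499744),
]


def to_seconds(stamp: str) -> float:
    """Convert an HHMMSS.sss sample name to seconds since midnight."""
    return int(stamp[:2]) * 3600 + int(stamp[2:4]) * 60 + float(stamp[4:])


def growth(samples):
    """Return per-interval byte growth rate and average array size."""
    rows = []
    for (t0, _n0, b0), (t1, n1, b1) in zip(samples, samples[1:]):
        elapsed = to_seconds(t1) - to_seconds(t0)
        rows.append({
            "from": t0,
            "to": t1,
            "bytes_per_sec": (b1 - b0) / elapsed,
            "avg_bytes_per_instance": b1 / n1,
        })
    return rows


for row in growth(SAMPLES):
    print(
        f"{row['from']} -> {row['to']}: "
        f"{row['bytes_per_sec'] / 1e6:.0f} MB/s, "
        f"avg {row['avg_bytes_per_instance'] / 1e3:.0f} kB per array"
    )
```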
From the few usable hot_threads dumps, the stack shows sub-aggregations being executed in nested calls:
...
app//org.opensearch.search.aggregations.bucket.BucketsAggregator.buildSubAggsForBuckets(BucketsAggregator.java:220)
app//org.opensearch.search.aggregations.bucket.BucketsAggregator.buildSubAggsForAllBuckets(BucketsAggregator.java:286)
...
It is still unclear why the Search Backpressure (SBP) module isn't detecting this during the window in which allocations climb rapidly, from 221310.010 to the last sample at 221340.938 (~30 sec). It might have to do with the sampling rate SBP uses when looking back before deciding that the node is in duress.
I believe you've already set search_backpressure.node_duress.num_successive_breaches, and you can consider lowering search_backpressure.node_duress.heap_threshold. Will wait for others to chime in.
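As a rough illustration of the tuning suggested above, this sketch builds a request body for `PUT _cluster/settings` from the two settings named in the comment. The values are placeholders, not recommendations. The optional `mode` argument is an assumption added here: as far as I know, `search_backpressure.mode` accepts `monitor_only`, `enforced`, and `disabled`, and tasks are only cancelled in `enforced` mode, but please check this against the documentation for your version.

```python
import json

# Assumed set of valid values for search_backpressure.mode; verify for your version.
ALLOWED_MODES = {"monitor_only", "enforced", "disabled"}


def backpressure_settings(num_successive_breaches, heap_threshold,
                          mode=None, persistent=False):
    """Build a cluster-settings body for the search backpressure node_duress knobs."""
    if num_successive_breaches < 1:
        raise ValueError("num_successive_breaches must be at least 1")
    if not 0.0 < heap_threshold <= 1.0:
        raise ValueError("heap_threshold is a fraction of the heap, in (0, 1]")
    if mode is not None and mode not in ALLOWED_MODES:
        raise ValueError(f"unknown mode: {mode!r}")

    sbp = {
        "node_duress": {
            "num_successive_breaches": num_successive_breaches,
            "heap_threshold": heap_threshold,
        }
    }
    if mode is not None:
        sbp["mode"] = mode
    scope = "persistent" if persistent else "transient"
    return {scope: {"search_backpressure": sbp}}


# Placeholder values for illustration only.
body = backpressure_settings(num_successive_breaches=1, heap_threshold=0.5,
                             mode="enforced")
print("PUT _cluster/settings")
print(json.dumps(body, indent=2))
```

Fewer successive breaches shortens the time before the node counts as being in duress, at the cost of reacting to short spikes.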
This will take some time
No problem, I completely understand that I have only one issue and you have to handle many. Don’t feel obligated to answer quickly if I do 👍
I believe you've already set this search_backpressure.node_duress.num_successive_breaches and can consider lowering search_backpressure.node_duress.heap_threshold. Will wait for others to chime in.
I have tried with
PUT _cluster/settings
{
  "transient": {
    "search_backpressure": {
      "node_duress": {
        "heap_threshold": "0.0001"
      }
    }
  }
}
So the nodes are always considered under duress, yet I cannot see any logs about backpressure. The only ones that appear related to SBP are the ones when the data node starts and changes the default values:
Describe the bug
We have a cluster with 12 data nodes and 31 GB reserved for the JVM. We were experiencing sporadic Out of Memory errors and managed to isolate the issue to some dashboards that were using nested aggregations with arbitrarily large sizes. We tried different approaches to terminate these client searches before they could crash some of the nodes in the cluster, but none of them worked (as described below).
The query running behind the scenes in Grafana/Dashboards was something similar to:
We tried the following settings in our cluster:
default_search_timeout and cancel_after_time_interval don’t have any effect. You can see this in the task monitoring. For example, the query runs for 2-3 minutes before crashing the data nodes.
If you try to kill the tasks manually with _tasks/node:task/_cancel, the cluster simply ignores it.
Circuit breaker settings (indices.breaker.request.limit, indices.breaker.request.overhead, ...) are designed to prevent out-of-memory errors by estimating the memory usage of requests. However, it doesn't look like OpenSearch takes these aggregations into account when estimating memory usage in advance, so the query is accepted even though it eventually consumes a lot of memory.
Backpressure is triggered, but it never actually kills the problematic query. The message about “heap usage not dominated by search requests” makes me think that aggregations follow a completely different workflow for memory usage tracking in OpenSearch, which is why they are not handled by the circuit breakers or backpressure mechanisms.
max_buckets doesn’t seem to have an effect because it is only checked in the reduce phase. You only hit the limit if the "size" of the aggregation is reasonable enough for OpenSearch to compute the query…
We've run out of ideas, so please let us know if there's something really missing from OpenSearch or if you have any other suggestions to try. We would appreciate it! 😄
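To illustrate why such a query can exhaust the heap before any reduce-phase limit applies, here is a back-of-the-envelope sketch. It assumes each terms level can produce up to min(size, field cardinality) buckets per parent bucket, and that the date histogram multiplies the total. All sizes and cardinalities below are hypothetical, since the original query is not shown here, and real usage depends on the data and on shard_size.

```python
from math import prod


def worst_case_buckets(levels, histogram_buckets=1):
    """Upper bound on buckets built per shard for nested terms aggregations.

    `levels` is a list of (requested size, field cardinality) pairs, one per
    nested terms aggregation; `histogram_buckets` is the number of date
    histogram intervals in the queried time range.
    """
    per_level = [min(size, cardinality) for size, cardinality in levels]
    return prod(per_level) * histogram_buckets


# Hypothetical example: two nested terms aggregations with size 10000,
# on fields with 50000 and 200 distinct values, under a 24-bucket histogram.
levels = [(10_000, 50_000), (10_000, 200)]
estimate = worst_case_buckets(levels, histogram_buckets=24)
print(f"{estimate:,} buckets per shard in the worst case")
print("exceeds a 10k max_buckets limit:", estimate > 10_000)
```

Under these assumptions the shard would have to materialize far more buckets than a 10k limit allows, yet a limit that is only checked at reduce time is never reached if the node runs out of memory first.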
Related component
Search:Aggregations
To Reproduce
Expected behavior
If cancel_query_after_time worked, it would be very useful. If a query takes more than 30 seconds, something is likely wrong. A query of this type was taking more than 2 minutes before it could kill some data nodes in the cluster.
Additional Details
Plugins opensearch-alerting opensearch-anomaly-detection opensearch-asynchronous-search opensearch-cross-cluster-replication opensearch-custom-codecs opensearch-flow-framework opensearch-geospatial opensearch-index-management opensearch-job-scheduler opensearch-knn opensearch-ml opensearch-neural-search opensearch-notifications opensearch-notifications-core opensearch-observability opensearch-performance-analyzer opensearch-reports-scheduler opensearch-security opensearch-security-analytics opensearch-skills opensearch-sql repository-s3
Host/Environment: