opensearch-project / security-analytics

Security Analytics enables users to detect security threats in their security event log data. It also allows them to modify or tailor the pre-packaged solution.
Apache License 2.0

[BUG] ConcurrentModificationException seen in logs during the startup of a brand new node in a docker container #1414

Open andrross opened 2 weeks ago

andrross commented 2 weeks ago

What is the bug? A ConcurrentModificationException appears in the logs during the startup of a brand-new node in a Docker container:

[2024-11-08T16:27:31,961][WARN ][o.o.c.s.ClusterApplierService] [c2e68ef6aa29] failed to notify ClusterStateListener
java.util.ConcurrentModificationException: null
    at java.base/java.util.ArrayList$Itr.checkForComodification(ArrayList.java:1095) ~[?:?]
    at java.base/java.util.ArrayList$Itr.next(ArrayList.java:1049) ~[?:?]
    at org.opensearch.securityanalytics.indexmanagment.DetectorIndexManagementService.clusterChanged(DetectorIndexManagementService.java:271) ~[?:?]
    at org.opensearch.cluster.service.ClusterApplierService.callClusterStateListener(ClusterApplierService.java:655) [opensearch-2.18.0.jar:2.18.0]
    at org.opensearch.cluster.service.ClusterApplierService.callClusterStateListeners(ClusterApplierService.java:641) [opensearch-2.18.0.jar:2.18.0]
    at org.opensearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:599) [opensearch-2.18.0.jar:2.18.0]
    at org.opensearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:503) [opensearch-2.18.0.jar:2.18.0]
    at org.opensearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:205) [opensearch-2.18.0.jar:2.18.0]
    at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:946) [opensearch-2.18.0.jar:2.18.0]
    at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283) [opensearch-2.18.0.jar:2.18.0]
    at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246) [opensearch-2.18.0.jar:2.18.0]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]

How can one reproduce the bug? I simply ran:

docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -e 'OPENSEARCH_INITIAL_ADMIN_PASSWORD=<password>' opensearchproject/opensearch:2.18.0

and saw the exception in the stdout logging during startup, before I had interacted with the server at all. However, I haven't been able to reproduce it since, so I suspect it is a race condition.

Looking at the code in question, the cluster state apply method (clusterChanged()) iterates over an ArrayList on the assumption that no other thread will modify the list while the iteration is in progress. This exception suggests that assumption is wrong.
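
To illustrate the failure mode, here is a minimal, self-contained sketch. It is not the plugin's code: the class name, method names, and index names below are hypothetical. It modifies the list from inside the loop so that the result is deterministic. A modification from another thread trips the same fail-fast check in ArrayList's iterator, only intermittently, which would fit the difficulty reproducing this. The sketch also shows one possible mitigation, assuming the list is written rarely and read often: CopyOnWriteArrayList, whose iterators walk a snapshot and never throw this exception. Whether that is the right fix for DetectorIndexManagementService depends on how the list is actually used.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class IterationDemo {

    // Walks the list and adds an element after visiting the first entry.
    // The add stands in for a second thread mutating the list while a
    // listener is iterating over it. Returns how many elements were visited.
    public static int iterateWhileModifying(List<String> indices) {
        int visited = 0;
        for (String index : indices) {
            visited++;
            if (visited == 1) {
                indices.add("hypothetical-index-new");
            }
        }
        return visited;
    }

    // Reports whether iterating while modifying throws the exception
    // seen in the stack trace above.
    public static boolean throwsCme(List<String> indices) {
        try {
            iterateWhileModifying(indices);
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // ArrayList iterators are fail-fast: the structural change is
        // detected on the next call to next().
        List<String> plain = new ArrayList<>(
                List.of("hypothetical-index-1", "hypothetical-index-2"));
        System.out.println("ArrayList throws CME: " + throwsCme(plain));

        // CopyOnWriteArrayList iterators walk a snapshot of the array taken
        // when the iterator was created, so the concurrent add is not an error.
        List<String> snapshot = new CopyOnWriteArrayList<>(
                List.of("hypothetical-index-1", "hypothetical-index-2"));
        System.out.println("CopyOnWriteArrayList throws CME: " + throwsCme(snapshot));
    }
}
```

Running main should report true for the ArrayList and false for the CopyOnWriteArrayList. Other options would be to iterate over a defensive copy (new ArrayList<>(list)) or to guard both the reads and the writes with the same lock. Note that copying a plain ArrayList while another thread writes to it is itself unsafe unless the copy is taken under that lock.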