opensearch-project / security-analytics

Security Analytics enables users to detect security threats in their security event log data. It also lets them modify and tailor the pre-packaged solution.
Apache License 2.0

[BUG] Custom Codec Plugin breaking Security Analytics Plugin Alerts #1050

Closed Vejur closed 3 months ago

Vejur commented 4 months ago

Describe the bug

We currently have three fresh installations of OpenSearch 2.13.0. When we configure a detector with corresponding alerts, alerting only works during the first few seconds after 0:00 UTC. At the same time, several log entries like the following appear:

[2024-06-03T00:00:48,763][ERROR][o.o.s.u.SecurityAnalyticsException] [siem-2.*.*.*.dev] Security Analytics error:
java.lang.IllegalStateException: existing codec service factory already overridden in: org.opensearch.index.codec.customcodecs.CustomCodecPlugin attempting to override again by: org.opensearch.securityanalytics.SecurityAnalyticsPlugin
        at org.opensearch.index.engine.EngineConfigFactory.<init>(EngineConfigFactory.java:109) ~[opensearch-2.13.0.jar:2.13.0]
        at org.opensearch.index.engine.EngineConfigFactory.<init>(EngineConfigFactory.java:65) ~[opensearch-2.13.0.jar:2.13.0]
        at org.opensearch.indices.IndicesService.getEngineConfigFactory(IndicesService.java:907) ~[opensearch-2.13.0.jar:2.13.0]
        at org.opensearch.indices.IndicesService.createIndexService(IndicesService.java:868) ~[opensearch-2.13.0.jar:2.13.0]
        at org.opensearch.indices.IndicesService.withTempIndexService(IndicesService.java:823) ~[opensearch-2.13.0.jar:2.13.0]
        at org.opensearch.cluster.metadata.MetadataCreateIndexService.applyCreateIndexWithTemporaryService(MetadataCreateIndexService.java:483) ~[opensearch-2.13.0.jar:2.13.0]
        at org.opensearch.cluster.metadata.MetadataCreateIndexService.applyCreateIndexRequestWithV2Template(MetadataCreateIndexService.java:653) ~[opensearch-2.13.0.jar:2.13.0]
        at org.opensearch.cluster.metadata.MetadataCreateIndexService.applyCreateIndexRequest(MetadataCreateIndexService.java:426) ~[opensearch-2.13.0.jar:2.13.0]
        at org.opensearch.cluster.metadata.MetadataCreateIndexService.applyCreateIndexRequest(MetadataCreateIndexService.java:452) ~[opensearch-2.13.0.jar:2.13.0]
        at org.opensearch.cluster.metadata.MetadataCreateIndexService$1.execute(MetadataCreateIndexService.java:358) ~[opensearch-2.13.0.jar:2.13.0]
        at org.opensearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:67) ~[opensearch-2.13.0.jar:2.13.0]
        at org.opensearch.cluster.service.MasterService.executeTasks(MasterService.java:882) ~[opensearch-2.13.0.jar:2.13.0]
        at org.opensearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:434) ~[opensearch-2.13.0.jar:2.13.0]
        at org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:301) ~[opensearch-2.13.0.jar:2.13.0]
        at org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:212) ~[opensearch-2.13.0.jar:2.13.0]
        at org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:204) ~[opensearch-2.13.0.jar:2.13.0]
        at org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:242) ~[opensearch-2.13.0.jar:2.13.0]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:854) ~[opensearch-2.13.0.jar:2.13.0]
        at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283) ~[opensearch-2.13.0.jar:2.13.0]
        at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246) ~[opensearch-2.13.0.jar:2.13.0]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
        at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]

These errors stop after a few seconds. The Security Analytics plugin then produces no more alerts that day until 0:00 UTC the following night, when the problem occurs again.
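To illustrate the kind of check that produces the exception message above, here is a minimal sketch in Java. It is a simplified model written for this thread, not the OpenSearch source: the names `CodecOverrideSketch`, `EnginePluginLike`, `SimplePlugin`, and `resolveCodecFactory` are hypothetical, and the factory is represented by a plain string. The only assumption it encodes is the one the message itself states: at most one plugin may override the codec service factory, and a second override is rejected. The stack trace shows the check running while an index is being created.

```java
import java.util.List;
import java.util.Optional;

// Simplified, hypothetical model; none of these types are real OpenSearch classes.
class CodecOverrideSketch {

    /** Stand-in for a plugin that may or may not supply a codec service factory. */
    interface EnginePluginLike {
        String name();
        Optional<String> customCodecServiceFactory();
    }

    /** Trivial implementation used only for the demonstration. */
    static final class SimplePlugin implements EnginePluginLike {
        private final String name;
        private final Optional<String> factory;

        SimplePlugin(String name, Optional<String> factory) {
            this.name = name;
            this.factory = factory;
        }

        @Override public String name() { return name; }
        @Override public Optional<String> customCodecServiceFactory() { return factory; }
    }

    /** Returns the single overriding factory, or throws if two plugins supply one. */
    static Optional<String> resolveCodecFactory(List<EnginePluginLike> plugins) {
        String factory = null;
        String owner = null;
        for (EnginePluginLike plugin : plugins) {
            Optional<String> candidate = plugin.customCodecServiceFactory();
            if (candidate.isEmpty()) {
                continue; // this plugin does not override anything
            }
            if (owner != null) {
                throw new IllegalStateException(
                    "existing codec service factory already overridden in: " + owner
                        + " attempting to override again by: " + plugin.name());
            }
            factory = candidate.get();
            owner = plugin.name();
        }
        return Optional.ofNullable(factory);
    }

    public static void main(String[] args) {
        List<EnginePluginLike> plugins = List.of(
            new SimplePlugin("CustomCodecPlugin", Optional.of("codec-factory-a")),
            new SimplePlugin("SecurityAnalyticsPlugin", Optional.of("codec-factory-b")));
        try {
            resolveCodecFactory(plugins);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, removing either plugin from the list makes the call succeed, which is consistent with the workaround described below.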

As a workaround, we uninstalled the Custom Codecs plugin, which fixed the issue. The problem is that the standard packages reinstall this plugin on upgrade, so we fear the issue will return in the future.

I also found this issue, which might hint at a similar problem: https://github.com/opensearch-project/OpenSearch/issues/7012

Related component

Plugins

To Reproduce

  1. On a fresh install of OpenSearch (2.13.0), create a detector with alerting in the Security Analytics plugin. Configure it to run every minute.
  2. After one day, alerting stops. Check the OpenSearch log file in /var/log/opensearch for the corresponding error message, which hints at a conflict between the Custom Codecs plugin and the Security Analytics plugin.
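The log check in step 2 can be sketched as a small Java program. This is an illustrative helper, not part of OpenSearch: the class name `CodecConflictLogScan` and the method `findConflicts` are hypothetical, and the log file path is passed as an argument because the exact file name under /var/log/opensearch is not given here. The pattern is taken from the exception message quoted above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: scans log lines for the codec-factory conflict message.
class CodecConflictLogScan {

    // Captures the plugin that already overrode the factory and the one that tried again.
    private static final Pattern CONFLICT = Pattern.compile(
        "existing codec service factory already overridden in: (\\S+)"
            + " attempting to override again by: (\\S+)");

    /** Returns one {firstPlugin, secondPlugin} pair per matching log line. */
    static List<String[]> findConflicts(List<String> logLines) {
        List<String[]> hits = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = CONFLICT.matcher(line);
            if (m.find()) {
                hits.add(new String[] { m.group(1), m.group(2) });
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CodecConflictLogScan.java <path-to-opensearch-log>");
            return;
        }
        for (String[] hit : findConflicts(Files.readAllLines(Path.of(args[0])))) {
            System.out.println(hit[0] + " conflicts with " + hit[1]);
        }
    }
}
```

If the scan reports no conflicts on a node where alerting has stopped, the cause is likely something other than the plugin conflict described in this issue.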

Expected behavior

The Custom Codec Plugin should not interfere with the functionality of the Security Analytics Plugin.

Additional Details

Plugins: opensearch-alerting, opensearch-anomaly-detection, opensearch-asynchronous-search, opensearch-cross-cluster-replication, opensearch-custom-codecs, opensearch-flow-framework, opensearch-geospatial, opensearch-index-management, opensearch-job-scheduler, opensearch-knn, opensearch-ml, opensearch-neural-search, opensearch-notifications, opensearch-notifications-core, opensearch-observability, opensearch-performance-analyzer, opensearch-reports-scheduler, opensearch-security, opensearch-security-analytics, opensearch-skills, opensearch-sql, prometheus-exporter, repository-s3


dblock commented 4 months ago

I think this is more of a Security Analytics plugin problem, so I am moving it there.

sbcd90 commented 4 months ago

A PR is available to fix this issue: https://github.com/opensearch-project/security-analytics/pull/1047

dblock commented 3 months ago

[Catch All Triage - Attendees 1, 2, 3, 4]

Looks like it was fixed in #1047, closing. Please reopen if you see any other issues.

Vejur commented 2 months ago

Can confirm that the bug is fixed in 2.15.0. Thank you!