[RFC] Security Analytics Correlation Engine

sbcd90 commented 1 year ago

Introduction

Security Analytics is an open-source solution for security operations in OpenSearch. Its threat detection engine converts detection rules into executable OpenSearch queries, which are then matched against the logs or events ingested by the user to generate findings. Trigger condition filters are then applied to the findings to generate alerts.

Problem Statement

Today in Security Analytics, the generated findings & alerts belong to individual log types, & there is no way to automatically correlate between them. Customers' data spans multiple security event logs (S3 access, VPC flow, syslog, DNS), so a finding on a single log source is not enough to increase confidence in that finding. A strong correlation across logs also helps customers dive into the relationships between data from different sources. To understand this correlation today, customers have to browse through the list of findings generated for each log category & identify the correlated patterns manually.

Here is an example.

Example Infrastructure

In the sample customer infrastructure shown in the diagram below, the customer has a Django REST application hosted on an EC2 Windows instance. The REST APIs use Active Directory as the identity provider, & the Django application uses S3 to store & query files. The incoming network traffic logs for the EC2 Windows instance are also stored as VPC Flow Logs.

Security Analytics detectors generate findings for a threat

In order for Security Analytics to monitor & detect threats for the above customer infrastructure, we need to define a Security Analytics Threat Detector for each component in the infrastructure. For example, the diagram below shows that a Network Detector is defined for VPC Flow Logs, a Windows Detector is defined for the EC2 Windows instance, & so on.

image (1)

Now, let's simulate a security attack on this infrastructure. In this attack, the attacker uses the sbcd90 user to call a REST API named POST /customer_records.txt, which tries to replicate a sensitive file named customer_records.txt from S3.

The above diagram shows that if such an attack happens, each Threat Detector monitoring its corresponding infrastructure component generates a finding. For example, the AD/LDAP Detector generates an Invalid Username/Password, ResultType: 50126 finding, & so on.

As shown in the diagram, these findings are generated by individual detectors & belong to their respective log types. But how does the customer know that the AD/LDAP Detector finding of Invalid Username/Password, ResultType: 50126 is related to a chain of security events occurring around the same time window on the infrastructure (say, the 403 Forbidden error finding from the Applications detector)?

One way is to manually correlate this finding with the list of findings belonging to other log types within a particular time range. Can we solve this problem automatically?

Proposed Solution

The Security Analytics Correlation Engine addresses this issue in two steps. Customers define, exactly once, the threat scenarios that can be identified from the logs generated by the individual systems in their infrastructure. The engine then automatically generates correlations between findings from different log categories.

The Correlation Engine is a Security Finding Knowledge Graph that stores connected findings data & generates correlated insights (as well as correlated historical insights) from them based on time windows.

Correlation Engine Feature Scope

Customers can define the most relevant threat scenarios between logs of different systems in their infrastructure as correlation rules using simple SQL-like queries. Here is an example.

If we want to define a threat scenario that identifies 403 Forbidden error findings generated by the application detector on a set of Windows hosts in the IP range 4.5.6.*, we can define it as follows:

"field": {
  "windows": "host:4.5.6.*",
  "application": "status:403"
},
"query": true

Thus, this threat scenario connects application logs with Windows logs. Similarly, customers can define several threat scenarios for different systems in their infrastructure based on their requirements.
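To make the rule semantics concrete, here is a minimal in-memory sketch of how such a rule could be checked against a batch of findings. It is an illustration only: the real engine presumably evaluates these as OpenSearch queries, and the finding shape used here (`log_type`, `event`) as well as the helper names `matches` and `rule_applies` are hypothetical.

```python
from fnmatch import fnmatchcase

# Correlation rule from the example above: one "field:pattern" query per log type.
RULE = {
    "windows": "host:4.5.6.*",
    "application": "status:403",
}


def matches(query: str, event: dict) -> bool:
    """Return True if the event satisfies a single 'field:pattern' query."""
    field, _, pattern = query.partition(":")
    value = event.get(field)
    # Compare as strings so numeric fields such as status codes also match.
    return value is not None and fnmatchcase(str(value), pattern)


def rule_applies(rule: dict, findings: list) -> bool:
    """A rule applies when every log type in it has at least one matching finding."""
    return all(
        any(f["log_type"] == log_type and matches(query, f["event"]) for f in findings)
        for log_type, query in rule.items()
    )


# Hypothetical findings, shaped for this sketch only.
findings = [
    {"log_type": "windows", "event": {"host": "4.5.6.7"}},
    {"log_type": "application", "event": {"status": 403}},
]
print(rule_applies(RULE, findings))
```

With only the windows finding present, `rule_applies` returns False, since the application side of the scenario has no match.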

These threat scenarios are then used by the Correlation Engine to define a graph of correlated findings.

The Correlation Engine can then return the findings nearest to a particular finding, thus correlating findings, logs & rules across log categories. Here is an example correlation generated for the example infrastructure shown in the diagram above.

GET /_plugins/_security_analytics/findings/correlate?finding=05e75ff0-4ae9-44bd-805f-893559e9fa62&detector_type=windows&time_window=120000&nearby_findings=20

{
    "findings": [
        {
            "finding": "8bf20320-a2bc-433a-a1a4-5fda16ed6875",
            "detector_type": "ad_ldap",
            "score": 1.7824930864662747E-6
        },
        {
            "finding": "52a024ba-c423-42e5-b97c-1781a875940c",
            "detector_type": "s3",
            "score": 1.6266511011053808E-5
        },
        {
            "finding": "30cc64a7-13dd-4ec4-a2bd-737ed3c80578",
            "detector_type": "others_application",
            "score": 1.6309222701238468E-5
        },
        {
            "finding": "4f20bb77-ac05-4d74-87b8-16386292d89f",
            "detector_type": "network",
            "score": 8.688701200298965E-6
        },
        {
            "finding": "e1a40ae5-70aa-4b28-a02c-9b59074499b8",
            "detector_type": "ad_ldap",
            "score": 8.688701200298965E-6
        },
        {
            "finding": "8a1678a0-8342-4734-b6ea-17dfcda9174e",
            "detector_type": "windows",
            "score": 8.07421838544542E-6
        },
        {
            "finding": "41c6a383-d0e3-4f32-b83e-ca6d927c2067",
            "detector_type": "network",
            "score": 1.7824930864662747E-6
        }
    ]
}

The scores indicate the proximity of each relevant finding (relevance being identified from the threat scenarios defined by the customer) to the windows finding in the query, 05e75ff0-4ae9-44bd-805f-893559e9fa62, within the time window of 2 minutes (time_window=120000, i.e. milliseconds).
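As a rough client-side sketch, the request path above can be assembled from its four query parameters, and the response can be grouped by detector type for display. The helper names `correlate_path` and `group_by_detector` are hypothetical. The sketch does not rank by score, because the RFC does not state whether a higher or lower score means a closer finding.

```python
from collections import defaultdict
from urllib.parse import urlencode

BASE_PATH = "/_plugins/_security_analytics/findings/correlate"


def correlate_path(finding: str, detector_type: str, time_window_ms: int, nearby: int) -> str:
    """Build the correlate request path with the query parameters shown above."""
    params = {
        "finding": finding,
        "detector_type": detector_type,
        "time_window": time_window_ms,
        "nearby_findings": nearby,
    }
    return f"{BASE_PATH}?{urlencode(params)}"


def group_by_detector(response: dict) -> dict:
    """Group (finding id, score) pairs by the detector type that produced them."""
    grouped = defaultdict(list)
    for item in response["findings"]:
        grouped[item["detector_type"]].append((item["finding"], item["score"]))
    return dict(grouped)


# 2 minutes expressed in milliseconds, as in the example request.
path = correlate_path("05e75ff0-4ae9-44bd-805f-893559e9fa62", "windows", 2 * 60 * 1000, 20)

# A subset of the example response above.
response = {
    "findings": [
        {"finding": "8bf20320-a2bc-433a-a1a4-5fda16ed6875", "detector_type": "ad_ldap", "score": 1.7824930864662747E-6},
        {"finding": "52a024ba-c423-42e5-b97c-1781a875940c", "detector_type": "s3", "score": 1.6266511011053808E-5},
        {"finding": "e1a40ae5-70aa-4b28-a02c-9b59074499b8", "detector_type": "ad_ldap", "score": 8.688701200298965E-6},
    ]
}
grouped = group_by_detector(response)
print(path)
print(sorted(grouped))
```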

Building Blocks

Detectors in Security Analytics internally create Monitors in Alerting, which run periodic jobs against the logs generated by each component in the customer infrastructure. When these logs match the rules, findings are generated in Alerting.

Once a finding is generated, an asynchronous (fire & forget) transport layer call is made to the Correlation Engine to correlate the new finding with existing findings. The new finding & its correlations are then stored in the HNSW Graph (or Vector storage).

image (2)
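The fire & forget hand-off described above can be pictured with a small sketch. It is not the plugin's transport layer code: `correlate`, `on_finding_generated` and the in-memory `correlated` list are hypothetical stand-ins, and a thread pool replaces the transport call purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the vector storage that receives correlated findings.
correlated = []


def correlate(finding_id: str) -> None:
    """Hypothetical correlation step; here it only records the finding."""
    correlated.append(finding_id)


# A single worker keeps the demo deterministic (tasks run in submission order).
executor = ThreadPoolExecutor(max_workers=1)


def on_finding_generated(finding_id: str) -> None:
    # Fire & forget: hand the work off and return at once, without
    # waiting for the correlation to finish or inspecting its result.
    executor.submit(correlate, finding_id)


on_finding_generated("f-1")
on_finding_generated("f-2")

# Only so the demo can inspect the outcome; the caller above never waits.
executor.shutdown(wait=True)
print(correlated)
```

One consequence of this style is that a failure inside `correlate` is invisible to the caller, so a real implementation would need to log or retry on its own side.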

Correlation Engine internals

The internals of the Correlation Engine are composed of 4 major components.

HNSW Graph based vector storage - an HNSW Graph based store that holds all finding vectors & is queried at the vector level.
Insertion Algorithm - the most important piece of the Correlation Engine. In this layer, findings are converted to k-dimensional vectors & stored in the vector storage layer mentioned above along with their correlations.
Search Algorithm - the second most important piece of the Correlation Engine. It lets the user specify a particular finding, converts it to a k-dimensional vector, & uses that vector to query its neighboring findings, which are its correlated findings within a time window.
Join Engine - determines the immediate neighbors of a particular finding, given the correlation metadata between the Threat Detector that generated the finding & the log categories.

image (3)
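The interplay of insertion and search can be sketched as follows. Note what is swapped in: HNSW is an approximate nearest-neighbor index, whereas this sketch uses an exhaustive scan with Euclidean distance, which gives the same kind of answer on small data without the index. How findings are encoded as k-dimensional vectors is not specified in this RFC, so the vectors below are made-up values and the class name `VectorStore` is hypothetical.

```python
import math


class VectorStore:
    """Brute-force stand-in for the HNSW Graph based vector storage."""

    def __init__(self) -> None:
        self._items = []  # (finding_id, timestamp_ms, vector)

    def insert(self, finding_id: str, timestamp_ms: int, vector) -> None:
        """Insertion: store the finding's vector together with its timestamp."""
        self._items.append((finding_id, timestamp_ms, tuple(vector)))

    def neighbors(self, finding_id: str, time_window_ms: int, k: int) -> list:
        """Search: the k nearest other findings within the time window."""
        _, ts0, v0 = next(item for item in self._items if item[0] == finding_id)
        candidates = [
            (math.dist(v0, vec), fid)
            for fid, ts, vec in self._items
            if fid != finding_id and abs(ts - ts0) <= time_window_ms
        ]
        return [fid for _, fid in sorted(candidates)[:k]]


store = VectorStore()
# Made-up 2-dimensional vectors, for illustration only.
store.insert("win-1", 1_000, (0.0, 0.0))
store.insert("ad-1", 31_000, (0.1, 0.0))
store.insert("s3-1", 61_000, (0.0, 0.3))
store.insert("net-1", 500_000, (0.05, 0.0))  # falls outside a 2-minute window

print(store.neighbors("win-1", time_window_ms=120_000, k=2))
```

Here `net-1` has the closest vector to `win-1` but is excluded because it lies outside the time window, which is the behavior the Search Algorithm description implies.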

CyberAbwehr commented 1 year ago

Could you please add the following to this feature?

Comparison of searches with other indexes.

Examples:
Compare IP addresses: IOC Blacklist Index against (Netflow Index and Firewall Index).
Compare file names: IOC Blacklist Index against Winlogbeat Index.
Compare SHA numbers: IOC Blacklist Index against Winlogbeat Index.
Compare MISP Index against (Netflow Index and Winlogbeat Index and Auditbeat Index and Suricata Index, etc.).