Closed sbcd90 closed 11 months ago
Could you please add the following to this function?
Comparison of searches with other indexes.
Examples:
Compare IP addresses in the IOC Blacklist Index against the Netflow Index and Firewall Index.
Compare file names in the IOC Blacklist Index against the Winlogbeat Index.
Compare SHA hashes in the IOC Blacklist Index against the Winlogbeat Index.
Compare the MISP Index against the Netflow Index, Winlogbeat Index, Auditbeat Index, Suricata Index, etc.
As we recently discussed, let's move forward with adding the correlation metadata knowledge into the field mapping API.
Moving the conversation to core RFC
Work was completed
Introduction
Security Analytics is an open-source solution for security operations in OpenSearch. Security Analytics' threat detection engine converts detection rules into executable OpenSearch queries, which are then matched against the logs or events ingested by the user to generate findings. Trigger condition filters are then applied to the findings to generate alerts.

Problem Statement
Today in Security Analytics, the generated findings and alerts belong to individual log types, and there is no way to automatically correlate between them. As customers' data spans multiple security event logs (S3 access, VPC flow, syslog, DNS), a finding on just one log source is not enough to increase confidence in that finding; moreover, a strong correlation across logs helps customers dive into the relationships between data from different sources. To understand this correlation across findings from different log sources, customers would need to browse through the list of findings generated for individual log categories and then identify the correlated patterns by hand.

Here is an example.
Example Infrastructure
In the sample customer infrastructure diagram shown below, the customer has a Django REST application hosted on an EC2 Windows instance. The REST APIs use Active Directory as the identity provider, and the Django application uses S3 to store and query files. The incoming network traffic logs for the EC2 Windows instance are also stored as VPC Flow Logs.

Security Analytics detectors generate findings for a threat
In order for Security Analytics to monitor and detect threats for the above customer infrastructure, we need to define a Security Analytics Threat Detector for each component in the infrastructure. For example, the diagram below shows that a Network Detector is defined for VPC Flow Logs, a Windows Detector is defined for the EC2 Windows instance, and so on.

Now, let's try to simulate a security attack on this infrastructure. In this attack, the attacker uses the sbcd90 user to call a REST API named POST /customer_records.txt, which tries to replicate a sensitive file named customer_records.txt from S3.

In the above diagram, we show that if such an attack happens, each Threat Detector monitoring its corresponding infrastructure component generates a finding. For example, the AD/LDAP Detector generates an Invalid Username/Password, ResultType: 50126 finding, and so on.

These findings, as shown in the diagram, are generated by individual detectors and belong to their respective log types. But how does the customer know that the AD/LDAP Detector finding of Invalid Username/Password, ResultType: 50126 is related to a chain of security events occurring around the same time window on the infrastructure (say, the 403 Forbidden error finding from the Applications detector)?

One way is to manually correlate this finding with the list of findings belonging to other log types within a particular time range. Can we possibly solve this problem automatically?

Proposed Solution
The Security Analytics Correlation Engine provides an approach to solve this issue by allowing customers to define, exactly once, the different threat scenarios that can be identified from the logs generated by the individual systems in their infrastructure, and then generating correlations between findings from different log categories automatically.

Correlation Engine is a Security Finding Knowledge Graph which can be used to store connected findings data and generate correlated insights (as well as correlated historical insights) from them based on time windows.

Correlation Engine Feature Scope
Customers can define the most relevant threat scenarios between logs of different systems in their infrastructure as correlation rules using simple SQL-like queries. Here is an example.

If we want to define a threat scenario that can identify 403 Forbidden error findings generated by the application detector on a set of Windows hosts with IP range 4.5.6.*, we can define it as follows:

Thus, this threat scenario connects application logs with Windows logs in a particular scenario. Similarly, customers can define several threat scenarios for different systems in their infrastructure based on their requirements.
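As a rough illustration of what such a threat scenario might look like as data, the sketch below models a correlation rule as a set of per-log-type queries. It is not the example from the original post and not the plugin's API: the class names (RuleQuery, CorrelationRule), the field names (status, host_ip) and the shell-style wildcard matching are all assumptions made for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class RuleQuery:
    """One side of a correlation rule (hypothetical shape): log type + field pattern."""
    log_type: str
    field: str
    pattern: str  # shell-style wildcard, e.g. "4.5.6.*"

    def matches(self, finding: dict) -> bool:
        # A finding matches when the log type agrees and the field value fits the pattern.
        if finding.get("log_type") != self.log_type:
            return False
        return fnmatch(str(finding.get(self.field, "")), self.pattern)


@dataclass(frozen=True)
class CorrelationRule:
    """A threat scenario: findings matching different queries are treated as related."""
    name: str
    queries: tuple

    def matching_query(self, finding: dict):
        # Return the first query the finding satisfies, or None if it satisfies none.
        return next((q for q in self.queries if q.matches(finding)), None)


# Hypothetical rule connecting application logs with Windows logs.
rule = CorrelationRule(
    name="forbidden-on-windows-hosts",
    queries=(
        RuleQuery("application", "status", "403"),
        RuleQuery("windows", "host_ip", "4.5.6.*"),
    ),
)

app_finding = {"log_type": "application", "status": 403}
win_finding = {"log_type": "windows", "host_ip": "4.5.6.7"}
other_finding = {"log_type": "windows", "host_ip": "10.0.0.1"}
```

Here app_finding and win_finding each satisfy one query of the rule, so the rule links them, while other_finding falls outside the IP range and is ignored.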
These threat scenarios are then used by the Correlation Engine to define a graph of correlated findings.

Correlation Engine can then generate the findings near a particular finding, thus correlating findings, logs and rules across log categories. Here is an example correlation generated for the example infrastructure described above in the diagram.

The scores determine the proximity of each relevant finding (identified from the threat scenarios defined by the customer) to the windows finding in the query, 05e75ff0-4ae9-44bd-805f-893559e9fa62, within the time window of 2 minutes.
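The idea of scoring relevant findings around a query finding within a time window can be sketched as follows. This is a minimal illustration under stated assumptions: the set CONNECTED_LOG_TYPES stands in for the customer's threat scenarios, and the score is simply the time gap in seconds (smaller means closer), which is a stand-in and not the engine's actual scoring.

```python
from datetime import datetime, timedelta

# Hypothetical: pairs of log types that some correlation rule connects.
CONNECTED_LOG_TYPES = {
    frozenset({"windows", "application"}),
    frozenset({"windows", "ad_ldap"}),
}


def correlated_findings(query, findings, window=timedelta(minutes=2)):
    """Return (finding_id, score) pairs for findings related to `query`.

    The score is the absolute time gap in seconds; this is an illustrative
    stand-in for proximity, not the real engine's score.
    """
    results = []
    for f in findings:
        if f["id"] == query["id"]:
            continue  # skip the query finding itself
        if frozenset({query["log_type"], f["log_type"]}) not in CONNECTED_LOG_TYPES:
            continue  # no threat scenario connects these log types
        gap = abs(f["timestamp"] - query["timestamp"])
        if gap <= window:
            results.append((f["id"], gap.total_seconds()))
    return sorted(results, key=lambda pair: pair[1])


t0 = datetime(2023, 1, 1, 12, 0, 0)
findings = [
    {"id": "win-1", "log_type": "windows", "timestamp": t0},
    {"id": "app-1", "log_type": "application", "timestamp": t0 + timedelta(seconds=30)},
    {"id": "ad-1", "log_type": "ad_ldap", "timestamp": t0 - timedelta(seconds=90)},
    {"id": "s3-1", "log_type": "s3", "timestamp": t0 + timedelta(seconds=10)},
    {"id": "app-2", "log_type": "application", "timestamp": t0 + timedelta(minutes=5)},
]

neighbors = correlated_findings(findings[0], findings)
```

In this toy data, s3-1 is dropped because no rule connects it to windows findings, and app-2 is dropped because it falls outside the 2-minute window.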
Building Blocks
The Detectors in Security Analytics internally create Monitors in Alerting, which run periodic jobs against the infrastructure logs generated from each component in the customer infrastructure. When these logs match the rules, findings are generated in Alerting.

Once a finding is generated, an asynchronous (fire & forget) transport layer call is made to the Correlation Engine to correlate this new finding with existing findings. This new finding and its correlations are then stored in the HNSW Graph (or Vector storage).
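The fire-and-forget hand-off described above can be pictured with a small sketch. It uses a Python thread pool in place of the transport layer, and a plain list in place of the HNSW graph; the function names (correlate, on_finding_generated) are hypothetical and only illustrate that the caller does not wait for correlation to finish.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the correlation store; the real engine uses an HNSW graph / vector storage.
correlation_store = []
_store_lock = threading.Lock()
_executor = ThreadPoolExecutor(max_workers=2)


def correlate(finding):
    """Hypothetical correlation step: here it only records the finding id."""
    with _store_lock:
        correlation_store.append(finding["id"])


def on_finding_generated(finding):
    """Called when a monitor produces a finding.

    The correlation work is submitted to a background pool and the returned
    future is intentionally ignored, so the caller never blocks on it.
    """
    _executor.submit(correlate, finding)
    return finding["id"]


returned = [on_finding_generated({"id": f"finding-{i}"}) for i in range(3)]
_executor.shutdown(wait=True)  # only so this demo can inspect the store afterwards
```

The design choice being illustrated is that finding generation stays fast and independent: a slow or failed correlation does not delay or fail the detector that produced the finding.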
Correlation Engine internals
The internals of the Correlation Engine are composed of 4 major components.

[...] vector level.
[...] Correlation Engine is its insertion algorithm. In this layer, findings are converted to k-dimensional vectors and stored in the vector storage layer mentioned above along with their correlations.
[...] Correlation Engine allows the user to specify a particular finding, converts it to a k-dimensional vector, and then uses it to query its neighboring findings, which are its correlated findings within a time window.
[...] immediate neighbors of a particular finding, given the correlation metadata between the Threat Detector that generated the finding and the log categories.
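To make the vector idea above more concrete, here is a minimal sketch of encoding findings as k-dimensional vectors and querying their nearest neighbors. The encoding (one-hot log type plus a scaled timestamp) and the LOG_TYPES list are assumptions chosen for illustration, and the search is a plain brute-force Euclidean scan rather than the HNSW graph the engine stores findings in.

```python
import math

LOG_TYPES = ["windows", "application", "ad_ldap", "network", "s3"]  # hypothetical


def to_vector(finding, time_scale=120.0):
    """Encode a finding as a k-dimensional vector (k = len(LOG_TYPES) + 1).

    One-hot log type, plus the timestamp (seconds) divided by the time window,
    so findings one window apart differ by 1.0 in the last dimension.
    """
    one_hot = [1.0 if finding["log_type"] == t else 0.0 for t in LOG_TYPES]
    return one_hot + [finding["timestamp"] / time_scale]


def nearest_neighbors(query, stored, k=2):
    """Brute-force k-nearest-neighbor search by Euclidean distance."""
    qv = to_vector(query)
    scored = [
        (f["id"], math.dist(qv, to_vector(f)))
        for f in stored
        if f["id"] != query["id"]
    ]
    return sorted(scored, key=lambda pair: pair[1])[:k]


stored = [
    {"id": "win-1", "log_type": "windows", "timestamp": 1000},
    {"id": "app-1", "log_type": "application", "timestamp": 1030},
    {"id": "ad-1", "log_type": "ad_ldap", "timestamp": 2000},
    {"id": "s3-1", "log_type": "s3", "timestamp": 1010},
]

closest = nearest_neighbors(stored[0], stored, k=2)
```

A brute-force scan is linear in the number of stored findings; an approximate index such as HNSW trades a little accuracy for much faster neighbor queries at scale.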