[RFC] Security Analytics initial offering in OpenSearch

getsaurabh02 commented 2 years ago

Introduction

Today, all businesses - large and small, across industries and geographies are vulnerable to security threats to their business data. To detect and respond to these threats, many businesses use various commercial security incident management solutions such as a Security Information and Event Management (SIEM) solution to gather security related information, identify potential security issues, manage incident response, and generate reports for compliance.

While many users already have existing solutions such as OpenSearch cluster setup to collect and analyze their security and log data, they are looking to save costs of cloning the same data set to a separate SIEM solution that has advanced capabilities. While the SecOps teams can use the powerful capabilities of OpenSearch to collect, search and analyze log and security data from across their applications, infrastructure and security defense tools to identify potential threats, it is time taking and involves a significant amount of manual effort from security experts.

Problem Statement

The current OpenSearch customers have no easy way to run security related rules on their data, out of the box, without having to clone their data into a dedicated security specific solution. Over time these OpenSearch customers pay a high cost to deploy and operate off the shelf security solutions, due to the data volume-based pricing plans from most vendors. Due to increasing volumes of data related to enterprise security, customers are facing increases in the costs of software licenses and infrastructure required for data processing. Increasing costs are harder to justify for many customers who are using only a subset of the product features.

Also, often times these customers need custom features for which they aren’t able to easily modify or adapt the existing products to their use case. This especially include tailoring custom rules which are more suited to their workload and data. Since missing features reflect the needs of the business in a specific industry, the vendor is driven by the size of that market and demand for that particular feature, sometimes resulting in a long wait for customers. In addition, customers need to employ significantly large teams of hard to find security experts who need to be trained to use multiple security products.

The longitivtiy of log data retention and findings is another factor which highly affects the cost and ability to generate extended insights for customers. They need multi-tiered solution for longer data archival in external store, with capabilities to externally search and restore on-demand, to find deeper patterns by correlating previous incidents.

Proposed Solution

Summary

By offering OpenSearch customers Security Analytics solution in form of an extensible plugin, with pre-built functionality, will address the above problems. It will enable customers to use their existing OpenSearch cluster for detecting security threats in their business data. It will also allow them to modify/tailor the default solution and develop the necessary proprietary changes on top, while using the low level components and uniform interfaces offered by the plugin. Customers will not have to pay hefty licensing fees that grow with their data volumes, while also using the familiar OpenSearch software. As an open source platform with potential contributions from a broader community, this would enable customers to move fast and build their custom solutions quicker.

In the Security Analytics plugin, OpenSearch will add several advanced tools and capabilities for the security use-cases including IP reputation, provide in-built rules for the most common type of attacks, integration points with external threat intel feeds, threat correlation techniques for deeper insights, external storage archival. This will enable security analysts to go beyond the ad-hoc analysis and find potential threats in real time with minimal effort as well as expedite forensic analysis of incidents.

We propose to take an iterative approach to address this problem space, by first providing our customers with an initial version of plugin P1 of Security Analytics in OpenSearch, which will bring in a delightful experience while addressing the basic security use-cases for business data. The initial offering will adhere few basic tenets - such as simplified onboarding, horizontal scalability, extensibility of framework and optimal performance. From functional viewpoint, it will simplify the security operations for the OpenSearch customers, allowing them to build automation for their own security workflows with OpenSearch, irrespective of being tied up to any specific vendor they might be currently onboarded to.

In order to simplify for adoption further and get faster feedbacks from our customers, we propose to scope down further, and come up with the MVP of Security Analytics first, as a pre-requisite to P1.

P1 of Security Analytics in OpenSearch - Primary Goals

The solution will be able to transform raw log messages into the common schema format, while applying additional enrichment - such as GeoIP resolution, match against ~2000+ open source out of the box threat detection rules - such as Sigma, store the findings in form of an index in OpenSearch cluster for query, provide visualization and correlation experience for customers to expedite threat analysis.

It will provide below functional capabilities:

Log data collection and transformation for selected different log sources.
Provide store for pre-populated threat detection rules from the open source world.
Support to add custom rules or modify existing ones for customer’s proprietary use-cases.
Support (to SOC Analyst) for ingesting or integrating with external Threat Intelligence Feeds.
Threat detection alerts based on preconfigured thresholds to multiple destination channels.
Efficient retention of data in form of external archival store.
Search function for Logs and Findings to perform expedited threat hunting.
Reporting and notification for visualization of events and patterns.
Data connectors/adapters to source data from existing SIEM solutions to OpenSearch cluster.
On Demand Event Correlation against other log sources and archived data sources
Extensible models to allow community to build tailored/custom solutions on top of this framework.

Building blocks

To address the above capabilities, the P1 of Security Analytics will primarily need to support below low-level component layers. Each of these will independently operate and interact based on generic interfaces to provide capabilities for users to plug in a tailored version of the solution, based on their proprietary need.

Streaming Layer : This is the first layer in the framework which accepts the incoming request. It can index the request optionally, to ack back the response to customers before any of the further action related to threat detection is actually taken. A separate configurable interface will also provide the ability to perform the threat detection for newly ingested documents to happen in a synchronous manner.
Transformation Layer : This is the layer essential for running the required transformation on the variety of log sources and convert them into a common schema format. The common schema format is important for performant rule execution and event correlations. This layer could even exist external to the security framework, such asin form of another ingest plugin. This layer could even be optional, in case customer wants to retain the original log format, where rule execution can still run using the concept of Field Aliases. This will require an update in form of index mapping for read-time mapping of fields.
Custom Rule Ingestion Layer : In addition to the pre-loaded rules from open source platforms such as Sigma, customers and security engineers will need to have the ability to ingest on-demand custom rules and enrich the threat detection monitors based on their workload. This layer will also provide integration points for ingesting Threat Intelligence Feeds from other external platforms.
Rule Matching Layer : This layer performs the actual execution of rules for threat detection against the newly ingested documents in the pre-configured data sources (OpenSearch indices). The rule matching layer will provide for multi-step and concurrent rule pipeline executions for efficient management of ~2000+ rules. Asynchronous execution of the rule pipeline in form of batches is desirable to achieve higher degree of performance and horizontal scalability.
Event Persistence Layer : Incidents once detected by the Rule Matching layer needs to be captures and persisted in form of findings index for visualization and further inspection. Findings data can be independent, based on per log sources which allows support for real-time threat hunting and on-demand/automated correlation of threat patterns across different log sources.
Event View and Alerting Layer : In order to provide a seamless user experience for our customers, there is a need for strong dash-boarding experience which helps customer visualize and correlate the threats, view generated alerts, create threat cases, refer old cases and perform correlations across different log sources (in different timelines), cross-cluster searches with event based query support.
External Persistence Layer : This is to provide long term archival of logs and findings in form of external store - such as S3. Customers will also be able to on-demand restore indices back into cluster or run slower but meaningful correlations directly on the external stores.
Data Connectors/Adapter Layer : This is to allow out of the box data import experience from few existing SIEM solutions which customers are using today. This will simplify the onboarding experience for customers, to be able to import data into the OpenSearch clusters for security analysis and threat report comparisons.

Sequence Diagram

This covers the basic flow of data across different low-level components (layers) of the solution:

Sequence (1) (1)

The MVP (P0) of Security Analytics in OpenSearch

While all of the above component layers covered as part of P1 of Security Analytics are important for complete end-to-end experience, for easier adoption and faster feedback from our customers, we propose to phase out few components, and not include them as essential for the MVP of Security Analytics. This will ensure the existing OpenSearch customers do not have any disruptive setup requirement needs when trying out the feature for the first time, and hence simplifying their onboarding experience.

The MVP of Security Analytics will help customers answer following two primary questions from the security perspective:

Is there any malicious activity in the system currently? - As an OpenSearch customer I should get an in-time automated alert for vulnerabilities detected in my logs.
Is this something similar that has happened earlier? - As an OpenSearch customer I should be able to run on-demand correlations across existing log sources, previous findings for finding threat patterns.

Some of the primary tenets that MVP will follow:

Cost effectiveness for infrastructure setup
Simplified Onboarding, Deployment and Setup
Horizontally scalable for large data volumes
Plug-ability across the layer for custom tailoring and modifications
Speed of detecting, generating alert and running search on findings

Log Sources

Among the wide variety of log data sets that customer might be generating in the IT infrastructure, the MVP of Security Analytics will support the below log sources types:

Netflow
DNS logs
Apache access logs
Windows logs
Active Directory / LDAP logs
System logs
Cloud Trail logs
S3 access logs

Building Blocks

MVP (2) (1)

The ability to generate Alerts on the newly ingested documents among the pre-configured indices will act as the primary building block for Security Analytics MVP offering, where we want to harness the rule execution capabilities in form of document level monitors offered by the OpenSearch Alerting Plugin. The MVP will piggyback on the monitors ability to execute filters and percolate queries to arrive at the findings which are meaningful from threat detection perspective. The scale of query execution and parallelism needed to execute these queries over large set of incoming documents will be provided as feature additions to the current approach for document based alerting being proposed by the Alerting Plugin.
MVP will not support for the Transformation layer as covered previously as part of the P1. Modifying schema of data during ingest time could be intrusive to the current workload for existing OpenSearch customers, and might require significant change in the client side setup, or it will force them to establish a completely new parallel setup to try out the security analytics offering. Hence we want to support in-form ingestion of supported log group, retaining the original field names and types. This throws a challenge of effectively running the threat detection rules against different data sources. We will rely on the Field Aliases to provide the read-time mapping for the fields.

Use Cases for MVP (P0) of Security Analytics in OpenSearch

The MVP Product for Security Analytics via OpenSearch will provide visibility into various log sources to detect, investigate, and respond to evolving threats. Security Analytics will enable analysis of network-related, application-related and host-related security events as part of alert investigations and interactive threat hunting including basic correlations.

Threat Detection Threat detection is a practice of analyzing the entirety of log sources against the security vulnerabilities to identify any malicious activity that could compromise the network or system. If a threat is detected, then mitigation efforts must be enacted to properly neutralize the threat before it can exploit any present vulnerabilities.

Threat Hunting The security analytics solution will provide a single interface for the security analysts to identify threats in real time and take action, perform forensic analysis on historical data, protect PII data and generate risk reports for auditing or compliance purposes. The solution will essentially combine and automate the process of detecting threats, giving the analyst the controls to investigate further.

Monitoring and Reporting Monitoring and detecting related security events across the plethora of systems, devices or applications logs can become a challenging problem. Security Analytics offering via OpenSearch which serves as a common platform for threat analysis, incident reporting and monitoring ongoing activities.

Analyst and Operator Collaboration Our security analytics solution should provide operators mechanism to execute RunBooks or Automated SoPs curated ahead by the analyst to respond to reported incidents, for creating cases or run correlations across various log sources. These run-books should be available with relevant tags for ease of searching and usage.

Correlation across log sources : Event correlation is an essential part of any security analytics solution. It aggregates and analyzes log data to discover security threats and malicious patterns of behaviors that otherwise go unnoticed and can lead to compromise or data loss. Writing correlation rules is time consuming and requires a deep understanding of how attackers operate. The solution for Security Analytics should provide ability:

To run queries and map data across multiple log sources and generate deeper insights to examine the depth of attack. For example an IP identified under network DoS attack (high request rate) can be run against application logs to identify the kind of object accessed (high_null_records_requests).
To run queries against the older data sets (warm or archived indices), to find past events and create meaningful information out the incidents.

Plug-ability and Extensibility With pluggable and uniform interfaces offered by the plugins, vendors/developers can also build custom solutions on top of this framework to further create value for their customers. By utilizing the power of the low level artifacts offered by the framework, customers will also benefit from utilizing the tailored components contributed by other community members.

Feedback Requested

For MVP, we are planning to support above mentioned 8 Log Sources type. We would like to get feedback from community if this is a a right priority to start with. If there are other sources which holds higher priority.
For MVP, we are planning to import threat detection rules from the open source security project Sigma. Sigma is a generic and open signature format that allows you to describe relevant log events in a straightforward manner. We will like to get community feedback on what else they would like to see.
For MVP, we are not going to perform ingest-time transformation but rather do perform read-time mapping to rule fields via Field Aliases. We will like to get feedback and see if community concurs this is a right approach to begin with.
Are there more use-cases which the community thinks we should incorporate in the scope of MVP and P1 for the solution.

squiddy-gh commented 2 years ago

First I have to say this sounds great, I think SIEM / analytics would be a great and necessary add to the platform.

Since I probably don't understand the precise details of the intended implementation, I'm going to write these as "questions", but they're purely rhetorical - just meant to illustrate the things I'd value in this.

How extensible is the ingestion component? That is, will a user be able to create their own "Field aliases" to render their logs into the common schema? (Is the common schema already defined?) Is the ingestion mechanism(s) new code, or will it be based on logstash / plugins / filters? One difficulty I have now with a SIEM implementation is the lack of data sources with which to do correlation across different log source types because the number of available feeds is lacking. as compared to the actual number of different platforms that could and should report. Can I assume the "system log" is "syslog" and so can have different collectors from different log sources and can enrich/transform these inputs?
Being able to ingest/incorporate vulnerability scan reports like nmap and Qualys may be useful - being able to search for/ be alerted to vulnerabilities or platforms/versions detected by these scanners would be useful, as well as the ability to perform visualizations including time-based
Similarly, threat intelligence, public blocklists, and CVE data feeds are also useful for alerting and event correlation.
When it comes to findings/alerts/offenses - will there be any kind of workflow? For example, a SOC analyst who receives a finding/offense must generally dispatch the event record in some manner - create a case, link related events, make notes, link it to a JIRA ticket, escalate it to another analyst, close it with audit trail/attestation - will there be an ability to do any of this?
Anomaly detection is always a challenge given the sheer volume of data generated - having correlation, UBA and anomaly detection performed by a dedicated Machine Learning tier as a longer-term capability would be desirable.

And, as always, thanks to all who contribute their efforts to making this a great product!

getsaurabh02 commented 2 years ago

How extensible is the ingestion component? That is, will a user be able to create their own "Field aliases" to render their logs into the common schema? Is the ingestion mechanism(s) new code, or will it be based on logstash / plugins / filters?

Yes the ingestion layer will be based on the current model and existing clients could be used to support the ingestion of data. However as you rightly pointed, the "Field aliases" will allow rendering of the log data to the common schema format during the read time, when the actual rules are being executed. We are still in the process of evaluating the common schema format to use, however, we will just chose to pick an existing one which can support the log types we are intending to support.

Can I assume the "system log" is "syslog" and so can have different collectors from different log sources and can enrich/transform these inputs?

Yes these are syslog and you can have different collectors sending in the data where the required transformation happens only during the read time via "Field Aliases". We definitely also want to reach to a state where we can support ingest time enrichment and transformation, but not something we are looking as part of the P0. Enrichment shall be a follow up as P1.

Being able to ingest/incorporate vulnerability scan reports like nmap and Qualys may be useful. Similarly, threat intelligence, public blocklists, and CVE data feeds are also useful for alerting and event correlation.

Thanks for the input. Yes the vulnerability scan reports like nmap will be definitely useful and we will look into the possibility of covering it as well . Also, we do have a plan for providing connectors for Threat Intelligence feed to our customers, which can provide real time alerting and added correlations value. However, this is something we have kept out of the scope for P0 currently and will like to take it up as a quick follow. Hence, it is covered under P1 items above.

When it comes to findings/alerts/offenses - will there be any kind of workflow?

The workflow we are visualizing currently will be very similar to the Alerts Workflow we have today in our Alerting Plugin, which provides ability to manage alert states such as severity and acknowledgement. Also, we will support notification to external destinations, such as SNS, which can then be integrated with external systems and applications. Let me know if you think this should be sufficient to address the use cases mentioned above.

elfisher commented 2 years ago

We propose to take an iterative approach to address this problem space,

I like that we are taking an iterative approach. Can we break these into individual issues in the meta issue? I think some a full features. https://github.com/opensearch-project/security-analytics/issues/7

MVP will not support for the Transformation layer as covered previously as part of the P1. Modifying schema of data during ingest time could be intrusive to the current workload for existing OpenSearch customers, and might require significant change in the client side setup, or it will force them to establish a completely new parallel setup to try out the security analytics offering. Hence we want to support in-form ingestion of supported log group, retaining the original field names and types. This throws a challenge of effectively running the threat detection rules against different data sources. We will rely on the Field Aliases to provide the read-time mapping for the fields.

Does this mean the common schema won't be available or required?

dtaivpp commented 2 years ago

@brian-grabau and @verbecee I'd be curious to get your feedback on this as I know you all have a lot of experience here.

jimmyjones2 commented 2 years ago

In addition to rules and altering, I think a key capability is enrichment of events with other datasets. GeoIP is mentioned, but I think it's far wider than that to enable an event to effectively triaged - for example enriching IP addresses with corporate inventory details, public datasets such as WHOIS, known scanners, proxies etc.

It would be great if incoming events could be enriched against data held in another index - this would be useful outside of security analytics too!

brian-grabau commented 2 years ago

Ya I think that sums it up,

Need to add flow logs which we need to stich (combine many messages into 1 message) and deduplicate (add every router that seen connection to the 1 record)
If we have that then it makes sense to combine all network traffic logs into 1 record.
Then enrich the smaller reduced set of logs
Not all logs are going to come in at same time (i.e. long open connections) and cannot keep large state/inflight deduplication in memory forever so need to have role up on post insertion process.
and maybe a front end feature to combine searches across technologies where we don't want to combine logs. We are beginning work this month on some of this.

And yes I agree enrichment is GeoIP, MISP, User info, Inventory, Cloud information, Port look ups, MITRE look ups and technology/event ID look up for miter attack framework etc..

Started work on some of this already need to automate some more but.. https://github.com/Cargill/OpenSIEM-Logstash-Parsing/tree/1.0/doc/enrichments

hehohein commented 2 years ago

Regarding enrichment we have deployed a MISP connector with Memcached on opensearch. We add custom field if there is a match (on IP, domain name, email ...). (https://github.com/bubo-cybersec - sorry no readme/doc yet). If we can help on this part feel free to ask. We had also done request on virustotal to get reputation of IP (but limited to the top 5 each 15 min due to the free API limitations)

praveensameneni commented 2 years ago

Regarding enrichment we have deployed a MISP connector with Memcached on opensearch. We add custom field if there is a match (on IP, domain name, email ...). (https://github.com/bubo-cybersec - sorry no readme/doc yet). If we can help on this part feel free to ask. We had also done request on virustotal to get reputation of IP (but limited to the top 5 each 15 min due to the free API limitations)

Would love to get your input and contributions. Can you elaborate on your proposal. We can discuss more about how to contribute and where would it be most appropriate

praveensameneni commented 1 year ago

Released the feature as experimental in 2.4.0, removing the 2.4 label and keeping the RFC thread open for continued discussions.

opensearch-project / security-analytics