Closed by getsaurabh02 1 year ago
First I have to say this sounds great, I think SIEM / analytics would be a great and necessary add to the platform.
Since I probably don't understand the precise details of the intended implementation, I'm going to write these as "questions", but they're purely rhetorical - just meant to illustrate the things I'd value in this.
How extensible is the ingestion component? That is, will a user be able to create their own "Field aliases" to render their logs into the common schema? (Is the common schema already defined?) Is the ingestion mechanism(s) new code, or will it be based on logstash / plugins / filters? One difficulty I have now with a SIEM implementation is the lack of data sources with which to do correlation across different log source types, because the number of available feeds is lacking compared to the actual number of different platforms that could and should report. Can I assume the "system log" is "syslog", and so can have different collectors from different log sources and can enrich/transform these inputs?
Being able to ingest/incorporate vulnerability scan reports like nmap and Qualys may be useful - being able to search for, or be alerted to, vulnerabilities or platforms/versions detected by these scanners would be useful, as well as the ability to perform visualizations, including time-based ones.
Similarly, threat intelligence, public blocklists, and CVE data feeds are also useful for alerting and event correlation.
When it comes to findings/alerts/offenses - will there be any kind of workflow? For example, a SOC analyst who receives a finding/offense must generally dispatch the event record in some manner - create a case, link related events, make notes, link it to a JIRA ticket, escalate it to another analyst, close it with audit trail/attestation - will there be an ability to do any of this?
Anomaly detection is always a challenge given the sheer volume of data generated - having correlation, UBA and anomaly detection performed by a dedicated Machine Learning tier as a longer-term capability would be desirable.
And, as always, thanks to all who contribute their efforts to making this a great product!
How extensible is the ingestion component? That is, will a user be able to create their own "Field aliases" to render their logs into the common schema? Is the ingestion mechanism(s) new code, or will it be based on logstash / plugins / filters?
Yes, the ingestion layer will be based on the current model, and existing clients can be used to support the ingestion of data. However, as you rightly pointed out, the "Field aliases" will allow rendering of the log data into the common schema format at read time, when the actual rules are being executed. We are still in the process of evaluating which common schema format to use; however, we will likely choose to pick an existing one that can support the log types we are intending to support.
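To illustrate the read-time idea, here is a minimal sketch of how a field alias could map a vendor-specific field name onto a common-schema name without re-ingesting data. The mapping shape follows OpenSearch's `alias` field type; the specific field names (`EventID`, `event.code`) are illustrative assumptions, not the final common schema.

```python
# Illustrative index mapping: the raw field stays as ingested, and a
# read-time alias exposes it under a common-schema name.
index_mapping = {
    "properties": {
        # raw field as ingested from a Windows event log collector
        "EventID": {"type": "integer"},
        # read-time alias: queries against "event.code" resolve to "EventID"
        "event.code": {"type": "alias", "path": "EventID"},
    }
}

def resolve_field(name: str, mapping: dict) -> str:
    """Follow an alias to the concrete field it points at."""
    field = mapping["properties"].get(name)
    if field and field.get("type") == "alias":
        return field["path"]
    return name
```

A rule written against the common-schema name (`event.code`) still hits the original document field at query time, which is why no ingest-time schema change is needed.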
Can I assume the "system log" is "syslog" and so can have different collectors from different log sources and can enrich/transform these inputs?
Yes, these are syslog, and you can have different collectors sending in the data, where the required transformation happens only at read time via "Field Aliases". We definitely also want to reach a state where we can support ingest-time enrichment and transformation, but that is not something we are targeting as part of P0. Enrichment will be a follow-up in P1.
Being able to ingest/incorporate vulnerability scan reports like nmap and Qualys may be useful. Similarly, threat intelligence, public blocklists, and CVE data feeds are also useful for alerting and event correlation.
Thanks for the input. Yes, vulnerability scan reports like nmap will definitely be useful, and we will look into the possibility of covering them as well. Also, we do have a plan to provide connectors for Threat Intelligence feeds to our customers, which can provide real-time alerting and added correlation value. However, this is something we have currently kept out of scope for P0 and would like to take up as a quick follow-up. Hence, it is covered under the P1 items above.
When it comes to findings/alerts/offenses - will there be any kind of workflow?
The workflow we are currently envisioning will be very similar to the Alerts workflow we have today in our Alerting plugin, which provides the ability to manage alert states such as severity and acknowledgement. We will also support notification to external destinations, such as SNS, which can then be integrated with external systems and applications. Let me know if you think this is sufficient to address the use cases mentioned above.
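The alert lifecycle described above can be sketched as a small state machine. The state names loosely mirror states the OpenSearch Alerting plugin exposes (ACTIVE, ACKNOWLEDGED, COMPLETED); the transition table and `Alert` class here are an illustration under that assumption, not the plugin's actual implementation.

```python
# Which state changes an analyst is allowed to make (illustrative).
ALLOWED = {
    "ACTIVE": {"ACKNOWLEDGED", "COMPLETED"},
    "ACKNOWLEDGED": {"COMPLETED"},
    "COMPLETED": set(),  # terminal: no further transitions
}

class Alert:
    def __init__(self, finding_id: str, severity: int):
        self.finding_id = finding_id
        self.severity = severity
        self.state = "ACTIVE"
        # every change is recorded, giving the audit trail / attestation
        # the commenter asked about
        self.audit = [("ACTIVE", "created")]

    def transition(self, new_state: str, note: str = "") -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot go {self.state} -> {new_state}")
        self.state = new_state
        self.audit.append((new_state, note))

alert = Alert("finding-123", severity=2)
alert.transition("ACKNOWLEDGED", "triaged by analyst")
alert.transition("COMPLETED", "false positive, closed")
```

Case creation, JIRA linking, and escalation would sit on top of a lifecycle like this, typically via the external-destination notifications mentioned above.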
We propose to take an iterative approach to address this problem space...
I like that we are taking an iterative approach. Can we break these into individual issues in the meta issue? I think some are full features. https://github.com/opensearch-project/security-analytics/issues/7
The MVP will not support the Transformation layer, covered previously as part of P1. Modifying the schema of data at ingest time could be intrusive to the current workload of existing OpenSearch customers and might require significant changes to the client-side setup, or it would force them to establish a completely new parallel setup to try out the security analytics offering. Hence we want to support in-form ingestion of the supported log groups, retaining the original field names and types. This poses a challenge of effectively running the threat detection rules against different data sources. We will rely on the Field Aliases to provide the read-time mapping for the fields.
Does this mean the common schema won't be available or required?
@brian-grabau and @verbecee I'd be curious to get your feedback on this as I know you all have a lot of experience here.
In addition to rules and alerting, I think a key capability is enrichment of events with other datasets. GeoIP is mentioned, but I think it goes far wider than that to enable an event to be effectively triaged - for example, enriching IP addresses with corporate inventory details, public datasets such as WHOIS, known scanners, proxies, etc.
It would be great if incoming events could be enriched against data held in another index - this would be useful outside of security analytics too!
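The lookup-index idea can be sketched very simply: here the "index" is just a dict keyed by IP standing in for a separate OpenSearch inventory index, and the field names (`source.ip`, `enrich.*`) are illustrative assumptions.

```python
# Stand-in for a separate inventory index keyed by IP address.
inventory = {
    "10.0.0.5": {"owner": "payments-team", "asset": "db-primary"},
}

def enrich(event: dict, lookup: dict) -> dict:
    """Join an incoming event against the lookup data by source IP."""
    extra = lookup.get(event.get("source.ip"), {})
    # prefix enrichment fields so they cannot clobber original fields
    return {**event, **{f"enrich.{k}": v for k, v in extra.items()}}

event = enrich({"source.ip": "10.0.0.5", "action": "login"}, inventory)
```

In a real deployment the join could happen at ingest time (pipeline processor) or at read time, with the same trade-offs discussed elsewhere in this thread.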
Ya, I think that sums it up.
And yes, I agree enrichment covers GeoIP, MISP, user info, inventory, cloud information, port lookups, and technology/event ID lookups for the MITRE ATT&CK framework, etc.
Started work on some of this already - need to automate some more, but: https://github.com/Cargill/OpenSIEM-Logstash-Parsing/tree/1.0/doc/enrichments
Regarding enrichment, we have deployed a MISP connector with Memcached on OpenSearch. We add custom fields if there is a match (on IP, domain name, email, ...). (https://github.com/bubo-cybersec - sorry, no readme/doc yet). If we can help on this part, feel free to ask. We have also done requests to VirusTotal to get the reputation of IPs (but limited to the top 5 every 15 min due to the free API limitations).
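The cache-plus-rate-limit pattern described here (Memcached in front of a quota-limited reputation API like the free VirusTotal tier) can be sketched as follows. The class, the `fetch` callback, and the default limits are assumptions for illustration; a real deployment would use an actual cache and the provider's real client.

```python
import time

class ReputationLookup:
    """Local cache in front of a rate-limited external reputation API."""

    def __init__(self, fetch, max_calls=5, window_s=900):
        self.fetch = fetch          # slow / quota-limited external lookup
        self.cache = {}             # indicator -> verdict
        self.calls = []             # timestamps of recent API calls
        self.max_calls, self.window_s = max_calls, window_s

    def check(self, indicator: str):
        if indicator in self.cache:          # cache hit: no API cost
            return self.cache[indicator]
        now = time.monotonic()
        # drop call timestamps that fell out of the rate window
        self.calls = [t for t in self.calls if now - t < self.window_s]
        if len(self.calls) >= self.max_calls:
            return None                      # over budget: defer, don't block ingest
        self.calls.append(now)
        verdict = self.fetch(indicator)
        self.cache[indicator] = verdict
        return verdict
```

Returning `None` when over budget (rather than blocking) keeps enrichment best-effort, which matters when it sits in an ingest path.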
Would love to get your input and contributions. Can you elaborate on your proposal? We can discuss more about how to contribute and where it would be most appropriate.
Released the feature as experimental in 2.4.0, removing the 2.4 label and keeping the RFC thread open for continued discussions.
Introduction
Today, all businesses - large and small, across industries and geographies - are vulnerable to security threats to their business data. To detect and respond to these threats, many businesses use commercial security incident management solutions such as a Security Information and Event Management (SIEM) solution to gather security-related information, identify potential security issues, manage incident response, and generate reports for compliance.
While many users already have existing solutions such as an OpenSearch cluster set up to collect and analyze their security and log data, they are looking to save the cost of cloning the same data set into a separate SIEM solution with advanced capabilities. While SecOps teams can use the powerful capabilities of OpenSearch to collect, search, and analyze log and security data from across their applications, infrastructure, and security defense tools to identify potential threats, doing so is time-consuming and involves a significant amount of manual effort from security experts.
Problem Statement
Current OpenSearch customers have no easy way to run security-related rules on their data, out of the box, without cloning their data into a dedicated security-specific solution. Over time these customers pay a high cost to deploy and operate off-the-shelf security solutions, due to the data-volume-based pricing plans of most vendors. With increasing volumes of enterprise security data, customers face rising costs for the software licenses and infrastructure required for data processing. These increasing costs are harder to justify for the many customers who use only a subset of the product features.
Also, oftentimes these customers need custom features for which they are not able to easily modify or adapt the existing products to their use case. This especially includes tailoring custom rules better suited to their workload and data. Since missing features reflect the needs of a business in a specific industry, the vendor is driven by the size of that market and the demand for that particular feature, sometimes resulting in a long wait for customers. In addition, customers need to employ significantly large teams of hard-to-find security experts who must be trained to use multiple security products.
The longevity of log data and findings retention is another factor that strongly affects cost and the ability to generate extended insights for customers. They need a multi-tiered solution for longer data archival in an external store, with capabilities to search externally and restore on demand, to find deeper patterns by correlating previous incidents.
Proposed Solution
Summary
Offering OpenSearch customers a Security Analytics solution in the form of an extensible plugin, with pre-built functionality, will address the above problems. It will enable customers to use their existing OpenSearch cluster to detect security threats in their business data. It will also allow them to modify/tailor the default solution and develop the necessary proprietary changes on top, while using the low-level components and uniform interfaces offered by the plugin. Customers will not have to pay hefty licensing fees that grow with their data volumes, while continuing to use the familiar OpenSearch software. As an open-source platform with potential contributions from a broader community, this would enable customers to move fast and build their custom solutions quicker.
In the Security Analytics plugin, OpenSearch will add several advanced tools and capabilities for security use cases, including IP reputation, built-in rules for the most common types of attacks, integration points with external threat intel feeds, threat correlation techniques for deeper insights, and external storage archival. This will enable security analysts to go beyond ad-hoc analysis and find potential threats in real time with minimal effort, as well as expedite forensic analysis of incidents.
We propose to take an iterative approach to address this problem space, by first providing our customers with an initial version (P1) of Security Analytics in OpenSearch, which will bring a delightful experience while addressing the basic security use cases for business data. The initial offering will adhere to a few basic tenets, such as simplified onboarding, horizontal scalability, extensibility of the framework, and optimal performance. From a functional viewpoint, it will simplify security operations for OpenSearch customers, allowing them to build automation for their own security workflows with OpenSearch, without being tied to any specific vendor they might currently be onboarded to.
To simplify adoption further and get faster feedback from our customers, we propose to scope down further and come up with an MVP of Security Analytics first, as a prerequisite to P1.
P1 of Security Analytics in OpenSearch - Primary Goals
The solution will be able to transform raw log messages into the common schema format while applying additional enrichment, such as GeoIP resolution; match them against 2,000+ open-source, out-of-the-box threat detection rules, such as Sigma; store the findings as an index in the OpenSearch cluster for querying; and provide a visualization and correlation experience for customers to expedite threat analysis.
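The detection step above can be illustrated with a toy example: a Sigma-like rule reduced to a dict of field/value conditions, evaluated against a log record already rendered into the common schema. Real Sigma rules have a much richer condition grammar; the rule title, field names, and matching semantics here are illustrative assumptions showing only the shape of the pipeline.

```python
# A Sigma-like rule reduced to substring conditions on common-schema fields.
rule = {
    "title": "Suspicious PowerShell EncodedCommand",
    "detection": {
        "process.name": "powershell.exe",
        "process.args": "-EncodedCommand",
    },
}

def matches(record: dict, rule: dict) -> bool:
    """True if every detection condition appears in the record's field."""
    return all(
        value in str(record.get(field, ""))
        for field, value in rule["detection"].items()
    )

finding = {
    "process.name": "powershell.exe",
    "process.args": "powershell.exe -EncodedCommand SQBFAFgA",
}
```

When `matches(finding, rule)` is true, the record would be written to a findings index for querying and visualization, as described above.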
It will provide the following functional capabilities:
Building blocks
To address the above capabilities, the P1 of Security Analytics will primarily need to support the low-level component layers below. Each of these will operate independently and interact through generic interfaces, allowing users to plug in a tailored version of the solution based on their proprietary needs.
Sequence Diagram
This covers the basic flow of data across different low-level components (layers) of the solution:
The MVP (P0) of Security Analytics in OpenSearch
While all of the above component layers covered as part of P1 of Security Analytics are important for the complete end-to-end experience, for easier adoption and faster feedback from our customers we propose to phase a few components out and not include them as essential for the MVP of Security Analytics. This will ensure that existing OpenSearch customers do not face any disruptive setup requirements when trying out the feature for the first time, thus simplifying their onboarding experience.
The MVP of Security Analytics will help customers answer the following two primary questions from a security perspective:
Some of the primary tenets that MVP will follow:
Log Sources
Among the wide variety of log data sets that customers might be generating in their IT infrastructure, the MVP of Security Analytics will support the below log source types:
Building Blocks
The ability to generate alerts on newly ingested documents in pre-configured indices will act as the primary building block of the Security Analytics MVP offering, where we want to harness the rule execution capabilities, in the form of document-level monitors, offered by the OpenSearch Alerting plugin. The MVP will piggyback on the monitors' ability to execute filters and percolate queries to arrive at findings that are meaningful from a threat detection perspective. The scale of query execution and parallelism needed to run these queries over a large set of incoming documents will be provided as feature additions to the current approach to document-based alerting proposed by the Alerting plugin.
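The percolation idea behind document-level monitors can be sketched as follows: detection rules are stored as queries, and each newly ingested document is matched against the stored queries, the inverse of a normal search. In this sketch a "query" is just a Python predicate; OpenSearch uses real percolate queries under the hood, and the rule names and fields are illustrative.

```python
class Percolator:
    """Toy inverse-search: store queries, match incoming docs against them."""

    def __init__(self):
        self.queries = {}           # rule_id -> predicate over a document

    def register(self, rule_id, predicate):
        self.queries[rule_id] = predicate

    def percolate(self, doc: dict):
        """Return IDs of all stored rules the incoming document satisfies."""
        return [rid for rid, p in self.queries.items() if p(doc)]

p = Percolator()
p.register("failed-login", lambda d: d.get("event.outcome") == "failure")
p.register("root-login", lambda d: d.get("user.name") == "root")
hits = p.percolate({"event.outcome": "failure", "user.name": "root"})
```

Each hit would become a finding, feeding the alerting workflow described earlier; the scaling work mentioned above is about running this matching efficiently over high ingest volumes.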
The MVP will not support the Transformation layer, covered previously as part of P1. Modifying the schema of data at ingest time could be intrusive to the current workload of existing OpenSearch customers and might require significant changes to the client-side setup, or it would force them to establish a completely new parallel setup to try out the security analytics offering. Hence we want to support in-form ingestion of the supported log groups, retaining the original field names and types. This poses a challenge of effectively running the threat detection rules against different data sources. We will rely on the Field Aliases to provide the read-time mapping for the fields.
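The read-time mapping can be sketched as rule rewriting: the same rule, written against common-schema field names, is rewritten per log source using that source's alias table before execution, so data stays ingested "in-form" with its original field names. The alias tables and field names below are illustrative assumptions.

```python
# Per-source alias tables: common-schema name -> source-native field name.
aliases = {
    "windows": {"event.code": "EventID", "user.name": "TargetUserName"},
    "linux-auditd": {"event.code": "type", "user.name": "auid"},
}

def rewrite_rule(rule: dict, source: str) -> dict:
    """Rewrite a common-schema rule into a given source's native fields."""
    table = aliases[source]
    return {table.get(field, field): value for field, value in rule.items()}

# One rule authored once against the common schema...
rule = {"event.code": "4625", "user.name": "admin"}
# ...executes against each source's untouched original fields.
win_rule = rewrite_rule(rule, "windows")
```

This is the same effect OpenSearch field aliases achieve inside the engine; the point is that only the rule's view of the fields changes, never the ingested documents.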
Use Cases for MVP (P0) of Security Analytics in OpenSearch
The MVP product for Security Analytics via OpenSearch will provide visibility into various log sources to detect, investigate, and respond to evolving threats. Security Analytics will enable analysis of network-related, application-related, and host-related security events as part of alert investigations and interactive threat hunting, including basic correlations.
Threat Detection Threat detection is the practice of analyzing the entirety of log sources against security vulnerabilities to identify any malicious activity that could compromise the network or system. If a threat is detected, mitigation efforts must be enacted to properly neutralize it before it can exploit any present vulnerabilities.
Threat Hunting The security analytics solution will provide a single interface for the security analysts to identify threats in real time and take action, perform forensic analysis on historical data, protect PII data and generate risk reports for auditing or compliance purposes. The solution will essentially combine and automate the process of detecting threats, giving the analyst the controls to investigate further.
Monitoring and Reporting Monitoring and detecting related security events across the plethora of system, device, or application logs can become a challenging problem. The Security Analytics offering via OpenSearch serves as a common platform for threat analysis, incident reporting, and monitoring of ongoing activities.
Analyst and Operator Collaboration Our security analytics solution should provide operators a mechanism to execute RunBooks or automated SOPs, curated in advance by analysts, to respond to reported incidents, create cases, or run correlations across various log sources. These run-books should be available with relevant tags for ease of searching and usage.
Correlation across log sources: Event correlation is an essential part of any security analytics solution. It aggregates and analyzes log data to discover security threats and malicious patterns of behavior that would otherwise go unnoticed and could lead to compromise or data loss. Writing correlation rules is time-consuming and requires a deep understanding of how attackers operate. The solution for Security Analytics should provide the ability to:
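A minimal form of the cross-source correlation described above can be sketched as: group findings by a shared entity (here, source IP) and flag entities seen in more than one log source type within a time window. The field names, the five-minute window, and the two-source threshold are illustrative assumptions, not a proposed rule language.

```python
from collections import defaultdict

def correlate(findings, window_s=300):
    """Flag IPs with findings from multiple log sources inside a time window."""
    by_ip = defaultdict(list)
    for f in findings:
        by_ip[f["source.ip"]].append(f)
    correlated = []
    for ip, group in by_ip.items():
        sources = {f["log.source"] for f in group}
        span = max(f["ts"] for f in group) - min(f["ts"] for f in group)
        if len(sources) > 1 and span <= window_s:
            correlated.append({"source.ip": ip, "sources": sorted(sources)})
    return correlated

# A firewall hit and a Windows auth failure from the same IP, 60s apart,
# become one correlated incident instead of two isolated findings.
findings = [
    {"source.ip": "198.51.100.7", "log.source": "firewall", "ts": 100},
    {"source.ip": "198.51.100.7", "log.source": "windows", "ts": 160},
]
```

Real correlation rules would of course key on richer entities (users, hosts, sessions) and attacker behavior patterns, which is exactly the expertise the paragraph above notes is hard to write by hand.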
Pluggability and Extensibility With pluggable and uniform interfaces offered by the plugins, vendors/developers can also build custom solutions on top of this framework to further create value for their customers. By utilizing the power of the low-level artifacts offered by the framework, customers will also benefit from tailored components contributed by other community members.
Feedback Requested