wazuh / wazuh-indexer

Wazuh indexer, the Wazuh search engine
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

Amazon Security Lake integration - DTD - AWS Lambda #146

Closed AlexRuiz7 closed 2 months ago

AlexRuiz7 commented 5 months ago

Description

Our first approach to transform the data to OCSF and Apache Parquet is to use a Lambda function that reads our data from an auxiliary S3 bucket fed by Logstash and uploads it to the final Amazon Security Lake S3 bucket.

We think this approach is the fastest way to complete the integration, although it's the most expensive in terms of resources.
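Not the actual implementation, just a minimal sketch of what such a handler could look like, assuming an S3 `ObjectCreated` trigger on the auxiliary bucket, newline-delimited JSON alerts, and the `AWS_BUCKET` environment variable for the destination; `map_to_ocsf` is a placeholder for the real field mapping:

```python
import io
import json
import os
import urllib.parse

import boto3
import pyarrow as pa
import pyarrow.parquet as pq

s3 = boto3.client("s3")


def map_to_ocsf(alert: dict) -> dict:
    # Placeholder for the real Wazuh -> OCSF field mapping.
    return alert


def to_parquet(events: list) -> bytes:
    # Serialize the transformed events to an in-memory Parquet file.
    table = pa.Table.from_pylist(events)
    buf = io.BytesIO()
    pq.write_table(table, buf)
    return buf.getvalue()


def lambda_handler(event, context):
    # Destination: the Amazon Security Lake dedicated S3 bucket.
    dst_bucket = os.environ["AWS_BUCKET"]

    for record in event["Records"]:
        # Source object written by Logstash into the auxiliary bucket.
        src_bucket = record["s3"]["bucket"]["name"]
        src_key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        raw = s3.get_object(Bucket=src_bucket, Key=src_key)["Body"].read()
        alerts = [json.loads(line) for line in raw.splitlines() if line.strip()]

        ocsf_events = [map_to_ocsf(a) for a in alerts]
        parquet_bytes = to_parquet(ocsf_events)

        dst_key = src_key.rsplit(".", 1)[0] + ".parquet"
        s3.put_object(Bucket=dst_bucket, Key=dst_key, Body=parquet_bytes)

    return {"success": True}
```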

Functional requirements

Implement a Lambda function that:

Implementation restrictions

AlexRuiz7 commented 5 months ago

Here's a very detailed tutorial about how to create and configure a Lambda function that reads objects from an S3 bucket, processes them, and stores them in another S3 bucket.

AlexRuiz7 commented 2 months ago

I'm currently making progress on the implementation of the Lambda function using our local Docker environment.

Access for a real AWS deployment has been requested in https://github.com/wazuh/internal-devel-requests/issues/1043

AlexRuiz7 commented 2 months ago

Reading https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html

AlexRuiz7 commented 2 months ago

Local Lambda invocation has been automated through a script.

```bash
amazon-security-lake/src/invoke-lambda.sh ::file::
```
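For reference, an equivalent local invocation without the script could look like the sketch below, assuming the Docker setup uses an official Lambda base image with the Runtime Interface Emulator listening on port 9000; the event payload and bucket/key names are hypothetical:

```python
import json
import urllib.request

# Invocation endpoint exposed by the Lambda Runtime Interface Emulator.
URL = "http://localhost:9000/2015-03-31/functions/function/invocations"

# Hypothetical S3 event payload pointing at the file to process.
payload = {
    "Records": [
        {"s3": {"bucket": {"name": "wazuh-aux-bucket"},
                "object": {"key": "alerts.json"}}}
    ]
}

req = urllib.request.Request(URL, data=json.dumps(payload).encode(), method="POST")
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())
```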
AlexRuiz7 commented 2 months ago

The deployment zip package is larger than the 50 MB limit. We need to either upload the zip to an S3 bucket or split it into layers.
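A sketch of the S3 route with boto3; the bucket, key, and function names here are hypothetical placeholders:

```python
import boto3

s3 = boto3.client("s3")
lambda_client = boto3.client("lambda")

# Hypothetical names; replace with the real artifacts bucket and function.
ARTIFACT_BUCKET = "wazuh-lambda-artifacts"
ARTIFACT_KEY = "amazon-security-lake/lambda.zip"
FUNCTION_NAME = "wazuh-amazon-security-lake"

# Stage the deployment package in S3 (avoids the 50 MB direct-upload limit),
# then point the function's code at the staged object.
s3.upload_file("lambda.zip", ARTIFACT_BUCKET, ARTIFACT_KEY)
lambda_client.update_function_code(
    FunctionName=FUNCTION_NAME,
    S3Bucket=ARTIFACT_BUCKET,
    S3Key=ARTIFACT_KEY,
)
```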

AlexRuiz7 commented 2 months ago

Uploaded the zip to the auxiliary S3 bucket.

image

Once uploaded, load the zip into the Lambda by clicking on Upload from → Amazon S3 location.

It didn't work either. We'll try to reduce the zip size by removing unneeded libraries.

image

AlexRuiz7 commented 2 months ago

After removing boto3 and parquet-tools from requirements.txt (boto3 is already included in the Lambda Python runtime, see https://gist.github.com/gene1wood/4a052f39490fae00e0c3#file-all_aws_lambda_modules_python3-9-txt), the zip size is down to 66 MB.

The zip file is still too big to be uploaded directly, but it can be uploaded to an S3 bucket and loaded from there. We can look into making it even lighter by using layers.
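If we go the layers route, the heavy dependencies (e.g. pyarrow) could be published separately so the deployment zip only carries our code. A rough boto3 sketch; the layer name, S3 location, function name, and runtime are assumptions:

```python
import boto3

lambda_client = boto3.client("lambda")

# Publish the dependencies zip (already staged in S3) as a layer.
layer = lambda_client.publish_layer_version(
    LayerName="wazuh-asl-dependencies",
    Content={"S3Bucket": "wazuh-lambda-artifacts",
             "S3Key": "amazon-security-lake/dependencies.zip"},
    CompatibleRuntimes=["python3.12"],
)

# Attach the layer to the function.
lambda_client.update_function_configuration(
    FunctionName="wazuh-amazon-security-lake",
    Layers=[layer["LayerVersionArn"]],
)
```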

AlexRuiz7 commented 2 months ago

Added code to validate whether the destination S3 bucket name is set. The program exits if it is not, with appropriate logging:

```
[ERROR] 2024-04-18T15:38:10.063Z        50ab3aaf-77e5-4286-94d2-6506818ee9ad    Destination bucket not set. Please, set the AWS_BUCKET environment variable with the name of the Amazon Security Lake dedicated S3 bucket.
18 Apr 2024 15:38:10,063
{
  "success": false
}
```
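The check itself can be as simple as the sketch below; the exact code in the repository may differ:

```python
import logging
import os

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    # Fail fast if the Amazon Security Lake destination bucket is not configured.
    dst_bucket = os.environ.get("AWS_BUCKET")
    if not dst_bucket:
        logger.error(
            "Destination bucket not set. Please, set the AWS_BUCKET environment "
            "variable with the name of the Amazon Security Lake dedicated S3 bucket."
        )
        return {"success": False}

    # ... continue with the transformation and upload ...
    return {"success": True}
```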
AlexRuiz7 commented 2 months ago

After many tries, I managed to get it working on AWS.

Here's the output when the variable is not set: image

And this one is when the execution succeeds: image

AlexRuiz7 commented 2 months ago

The parquet file is written to the root of the S3 bucket. According to the Best Practices, objects should be partitioned by source location, AWS Region, AWS account, and date.

```
bucket-name/source-location/region=region/accountId=accountID/eventDay=YYYYMMDD
```

In order to do that, we'll need to add these environment variables

Using PingOne's integration for reference:

https://github.com/pingone-davinci/pingone-amazon-security-lake
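A sketch of how the partitioned key could be built; `AWS_REGION` is provided by the Lambda runtime, while `AWS_ACCOUNT_ID` and the function/variable names are assumptions for illustration (the account ID could also be parsed from `context.invoked_function_arn`):

```python
import os
from datetime import datetime, timezone


def build_s3_key(source_location: str, filename: str) -> str:
    # Region and account ID are read from environment variables in this sketch.
    region = os.environ["AWS_REGION"]
    account_id = os.environ["AWS_ACCOUNT_ID"]
    event_day = datetime.now(timezone.utc).strftime("%Y%m%d")

    # source-location/region=region/accountId=accountID/eventDay=YYYYMMDD/file
    return (
        f"{source_location}/"
        f"region={region}/"
        f"accountId={account_id}/"
        f"eventDay={event_day}/"
        f"{filename}"
    )


# Example:
# build_s3_key("wazuh", "alerts.parquet")
# -> "wazuh/region=us-east-1/accountId=123456789012/eventDay=20240418/alerts.parquet"
```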

AlexRuiz7 commented 2 months ago

Parquet files are now uploaded to the correct path.

image

image

Note: the execution environment was updated to use 512 MB of memory and a 30-second timeout.
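For reference, the same change can be applied programmatically with boto3; the function name is hypothetical:

```python
import boto3

lambda_client = boto3.client("lambda")

# Match the settings used in the console: 512 MB of memory, 30-second timeout.
lambda_client.update_function_configuration(
    FunctionName="wazuh-amazon-security-lake",
    MemorySize=512,
    Timeout=30,
)
```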

AlexRuiz7 commented 2 months ago

AWS Lambda requirements