Here's a very detailed tutorial on how to create and configure a Lambda function that reads objects from an S3 bucket, processes them, and stores them in another S3 bucket.
I'm currently making progress on the implementation of the Lambda function using our local Docker environment.
Access for a real AWS deployment has been requested in https://github.com/wazuh/internal-devel-requests/issues/1043
Local Lambda invocation has been automated through a script.
```bash
bash amazon-security-lake/src/invoke-lambda.sh ::file::
```
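For context, invoking a Lambda container locally boils down to POSTing an event to the Runtime Interface Emulator; here's a minimal sketch of what such a script does (the event payload, bucket and key names are placeholders, and the host port mapping of 9000 to the emulator's 8080 is an assumption):

```python
import json
import urllib.request

# Hypothetical S3 event payload; the real script passes the object key as ::file::.
event = {
    "Records": [
        {"s3": {"bucket": {"name": "wazuh-aux-bucket"}, "object": {"key": "sample.json"}}}
    ]
}

# The Lambda Runtime Interface Emulator exposes this endpoint inside the container
# (port 8080, commonly mapped to 9000 on the host).
url = "http://localhost:9000/2015-03-31/functions/function/invocations"
request = urllib.request.Request(url, data=json.dumps(event).encode(), method="POST")

with urllib.request.urlopen(request) as response:
    print(response.read().decode())
```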
The deployment zip package is larger than the 50 MB limit. We need to either upload the zip to an S3 bucket or split it into layers.
Uploaded to the aux S3 bucket
Once uploaded, load the zip into the Lambda by clicking on Upload from > Amazon S3 location.
It didn't work either. We'll try to reduce the zip size by removing unneeded libraries.
By removing `boto3` and `parquet-tools` from the `requirements.txt`, the zip size is down to 66 MB. `boto3` is included already in the Lambda Python runtime: https://gist.github.com/gene1wood/4a052f39490fae00e0c3#file-all_aws_lambda_modules_python3-9-txt
The zip file is still too big to be uploaded directly, but it can be uploaded to an S3 bucket and loaded into the Lambda from there. We can dive into making it even lighter by using layers.
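For reference, loading the package from S3 can also be scripted instead of going through the console; a minimal sketch using `boto3`, with placeholder function, bucket and key names:

```python
import boto3

lambda_client = boto3.client("lambda")

# Point the function at the zip previously uploaded to the auxiliary bucket.
# Function, bucket and key names below are placeholders.
response = lambda_client.update_function_code(
    FunctionName="wazuh-ocsf-transform",
    S3Bucket="wazuh-aux-bucket",
    S3Key="lambda/package.zip",
)
print(response["LastModified"])
```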
Added code to validate whether the destination S3 bucket name is set. Program exits if not, with appropriate logging.
```
[ERROR] 2024-04-18T15:38:10.063Z 50ab3aaf-77e5-4286-94d2-6506818ee9ad Destination bucket not set. Please, set the AWS_BUCKET environment variable with the name of the Amazon Security Lake dedicated S3 bucket.
18 Apr 2024 15:38:10,063
{
  "success": false
}
```
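A minimal sketch of that validation, assuming the handler reads the destination bucket from the `AWS_BUCKET` environment variable (everything other than the variable name and the log message is illustrative):

```python
import logging
import os

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    # Abort early if the destination bucket is not configured.
    dst_bucket = os.environ.get("AWS_BUCKET")
    if not dst_bucket:
        logger.error(
            "Destination bucket not set. Please, set the AWS_BUCKET environment "
            "variable with the name of the Amazon Security Lake dedicated S3 bucket."
        )
        return {"success": False}
    # ... transform the source objects and write the parquet output here ...
    return {"success": True}
```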
After many tries, I managed to get it working on AWS.
Here's the output when the variable is not set: (screenshot)
And this one is when the execution succeeds: (screenshot)
The parquet file is written to the root of the S3 bucket. According to the Best Practices, objects should be partitioned by source location, AWS Region, AWS account, and date:
```
bucket-name/source-location/region=region/accountId=accountID/eventDay=YYYYMMDD
```
In order to do that, we'll need to add these environment variables:
- `SOURCE_LOCATION`
- `ACCOUNT_ID`
- `IAM_ROLE_ARN`: replaces `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` in production.

Using PingOne's integration for reference: https://github.com/pingone-davinci/pingone-amazon-security-lake
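As a sketch of how the partitioned key could be built from those variables (the function name, filename parameter, and use of the current UTC date are illustrative, not the actual implementation):

```python
import os
from datetime import datetime, timezone


def build_object_key(filename: str) -> str:
    """Build the partitioned S3 key following the Security Lake best practices:
    source-location/region=.../accountId=.../eventDay=YYYYMMDD/filename
    """
    source_location = os.environ["SOURCE_LOCATION"]
    region = os.environ["AWS_REGION"]  # set automatically by the Lambda runtime
    account_id = os.environ["ACCOUNT_ID"]
    event_day = datetime.now(timezone.utc).strftime("%Y%m%d")
    return (
        f"{source_location}/region={region}/accountId={account_id}/"
        f"eventDay={event_day}/{filename}"
    )

# Example: build_object_key("events.parquet")
```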
Parquet files are now uploaded to the correct path.
Note: the execution environment was edited to use 512 MB of memory and a 30-second timeout.
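For reference, the same configuration change can be made programmatically; a sketch assuming a hypothetical function name:

```python
import boto3

lambda_client = boto3.client("lambda")

# Raise memory and timeout to match the values used above.
# The function name is a placeholder.
lambda_client.update_function_configuration(
    FunctionName="wazuh-ocsf-transform",
    MemorySize=512,  # MB
    Timeout=30,      # seconds
)
```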
Environment variables:
- `AWS_BUCKET`: The name of the Amazon S3 bucket in which Security Lake stores your custom source data.
- `AWS_REGION`: The AWS Region to which the data is written.
- `SOURCE_LOCATION`: The Source Location configured in Security Lake during the Custom Source creation.
- `ACCOUNT_ID`: The AWS account ID that the records in the source partition pertain to.
- `IAM_ROLE_ARN`: The ARN of the IAM Role with access to write to the Security Lake Custom Source S3 bucket.
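Since `IAM_ROLE_ARN` replaces static credentials in production, the function would assume that role before writing to the Security Lake bucket; a minimal sketch (the helper and session names are illustrative):

```python
import os

import boto3


def get_security_lake_s3_client():
    """Assume the Security Lake custom-source role and return an S3 client."""
    sts = boto3.client("sts")
    credentials = sts.assume_role(
        RoleArn=os.environ["IAM_ROLE_ARN"],
        RoleSessionName="wazuh-lambda",  # illustrative session name
    )["Credentials"]
    return boto3.client(
        "s3",
        aws_access_key_id=credentials["AccessKeyId"],
        aws_secret_access_key=credentials["SecretAccessKey"],
        aws_session_token=credentials["SessionToken"],
    )
```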
Description
Our first approach to transforming the data to OCSF and Apache Parquet is to use a Lambda function that reads our data from an auxiliary S3 bucket fed by Logstash, and uploads it to the final Amazon Security Lake S3 bucket.
We think this approach is the fastest way to complete the integration, although it's the most expensive in terms of resources.
Functional requirements
Implement a Lambda function that:
Implementation restrictions