zmoog / public-notes

Figure out how to ingest log files from multiple S3 buckets (CloudFormation install method) #65

Closed zmoog closed 6 months ago

zmoog commented 6 months ago

I want to install ESF (Elastic Serverless Forwarder) to ingest files from two S3 buckets using the s3-sqs input.

I will use the CloudFormation install method.

zmoog commented 6 months ago

Requirements

We need to create the following resources:

  1. Two S3 buckets (with the data to ingest)
  2. One SQS queue (for receiving the S3 object creation notifications)
  3. One S3 bucket (to store the config.yml file)

Overview

[image: overview diagram]

(1) S3 buckets for the data

You probably already have an S3 bucket with actual data. For this research, I will create two buckets for testing with sample data.

$ aws s3api create-bucket \
    --bucket esf-logs-bucket-001 \
    --region eu-west-1 \
    --create-bucket-configuration LocationConstraint=eu-west-1
{
    "Location": "http://esf-logs-bucket-001.s3.amazonaws.com/"
}

$ aws s3api create-bucket \
    --bucket esf-logs-bucket-002 \
    --region eu-west-1 \
    --create-bucket-configuration LocationConstraint=eu-west-1
{
    "Location": "http://esf-logs-bucket-002.s3.amazonaws.com/"
}

(2) SQS queue

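We need an SQS queue to receive the S3 object creation notifications from the esf-logs-bucket-001 and esf-logs-bucket-002 buckets.

Create a new SQS queue named esf-logs-notifications-queue and set the visibility timeout to 910 seconds (the value the ESF documentation recommends, 10 seconds more than the forwarder's Lambda timeout).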

$ cat create-queue.json
{
  "VisibilityTimeout": "910"
}

$ aws sqs create-queue --queue-name esf-logs-notifications-queue --attributes file://create-queue.json
{
    "QueueUrl": "https://sqs.eu-west-1.amazonaws.com/418425532336/esf-logs-notifications-queue"
}
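
The queue ARN is needed in several later steps (the access policy, the notification configuration, config.yml, and the ESF parameters), but create-queue only returns the URL. Here is a minimal sketch that derives the ARN from the URL, assuming the standard https://sqs.<region>.amazonaws.com/<account-id>/<queue-name> layout and the aws partition. queue_url_to_arn is a hypothetical helper, not part of the AWS CLI. Note that the rest of these notes abbreviate the account ID to 123.

```shell
# Hypothetical helper: derive an SQS queue ARN from its queue URL.
# Assumes the URL looks like https://sqs.<region>.amazonaws.com/<account-id>/<queue-name>.
queue_url_to_arn() {
  local url="$1"
  local rest host path region account name

  rest="${url#https://}"     # sqs.<region>.amazonaws.com/<account-id>/<queue-name>
  host="${rest%%/*}"         # sqs.<region>.amazonaws.com
  path="${rest#*/}"          # <account-id>/<queue-name>

  region="${host#sqs.}"      # <region>.amazonaws.com
  region="${region%%.*}"     # <region>
  account="${path%%/*}"      # <account-id>
  name="${path#*/}"          # <queue-name>

  printf 'arn:aws:sqs:%s:%s:%s\n' "$region" "$account" "$name"
}

# Usage:
#   queue_url_to_arn "https://sqs.eu-west-1.amazonaws.com/418425532336/esf-logs-notifications-queue"
```

Alternatively, `aws sqs get-queue-attributes --queue-url <url> --attribute-names QueueArn` asks AWS for the ARN directly.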

Allow S3 to send object creation notifications from esf-logs-bucket-001 and esf-logs-bucket-002 to esf-logs-notifications-queue:

$ cat policy.json
{
  "Version": "2008-10-17",
  "Id": "__default_policy_ID",
  "Statement": [
    {
      "Sid": "__owner_statement",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123:root"
      },
      "Action": "SQS:*",
      "Resource": "arn:aws:sqs:eu-west-1:123:esf-logs-notifications-queue"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "s3.amazonaws.com"
      },
      "Action": "SQS:SendMessage",
      "Resource": "arn:aws:sqs:eu-west-1:123:esf-logs-notifications-queue"
    }
  ]
}

# Set the SQS access policy
- Visit Amazon SQS > Queues > esf-logs-notifications-queue > Access policy > Access policy (Permissions) 
- Edit, paste the content of `policy.json`, and save.
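
As written, the second statement in policy.json lets any S3 bucket send messages to the queue. Below is a sketch of a tighter variant that only admits the buckets you list, using the aws:SourceArn and aws:SourceAccount condition keys that AWS documents for S3 notification destinations. render_queue_policy is a hypothetical helper, and the sketch is an assumption about what you may want, not what was used in this test. It also omits the owner statement, so add that back if you rely on it.

```shell
# Hypothetical helper: print an SQS access policy that lets only the
# given buckets (owned by the given account) send messages to the queue.
# Arguments: <queue-arn> <account-id> <bucket-name>...
render_queue_policy() {
  local queue_arn="$1"
  local account_id="$2"
  shift 2

  # Build a JSON array body like: "arn:aws:s3:::a", "arn:aws:s3:::b"
  local arns="" bucket
  for bucket in "$@"; do
    arns="${arns}${arns:+, }\"arn:aws:s3:::${bucket}\""
  done

  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "s3.amazonaws.com" },
      "Action": "SQS:SendMessage",
      "Resource": "${queue_arn}",
      "Condition": {
        "ArnLike": { "aws:SourceArn": [${arns}] },
        "StringEquals": { "aws:SourceAccount": "${account_id}" }
      }
    }
  ]
}
EOF
}

# Usage:
#   render_queue_policy "arn:aws:sqs:eu-west-1:123:esf-logs-notifications-queue" 123 \
#       esf-logs-bucket-001 esf-logs-bucket-002 > policy.json
```

With this variant, every new bucket has to be added to the policy before its notifications are accepted.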

# Enable notifications
$ cat notifications.json
{
    "QueueConfigurations": [
        {
            "Id": "Creations",
            "QueueArn": "arn:aws:sqs:eu-west-1:123:esf-logs-notifications-queue",
            "Events": [
                "s3:ObjectCreated:*"
            ],
            "Filter": {
                "Key": {
                    "FilterRules": [
                        {
                            "Name": "Prefix",
                            "Value": ""
                        },
                        {
                            "Name": "Suffix",
                            "Value": ""
                        }
                    ]
                }
            }
        }
    ]
}

$ aws s3api put-bucket-notification-configuration \
    --bucket esf-logs-bucket-001 \
    --notification-configuration file://notifications.json

$ aws s3api put-bucket-notification-configuration \
    --bucket esf-logs-bucket-002 \
    --notification-configuration file://notifications.json
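
Since the same notification configuration is applied to every bucket, the two commands above can be folded into a loop. A minimal sketch follows. enable_notifications and the AWS_CMD override are hypothetical conveniences introduced here; setting AWS_CMD="echo aws" gives a dry run that prints the commands instead of executing them.

```shell
# Command used to invoke the AWS CLI; override with AWS_CMD="echo aws" for a dry run.
AWS_CMD="${AWS_CMD:-aws}"

# Hypothetical helper: apply one notification configuration file to many buckets.
# Arguments: <config-file> <bucket-name>...
enable_notifications() {
  local config_file="$1"
  shift

  local bucket
  for bucket in "$@"; do
    # $AWS_CMD is intentionally unquoted so "echo aws" splits into two words.
    $AWS_CMD s3api put-bucket-notification-configuration \
      --bucket "$bucket" \
      --notification-configuration "file://${config_file}"
  done
}

# Usage:
#   enable_notifications notifications.json esf-logs-bucket-001 esf-logs-bucket-002
```

Keep in mind that put-bucket-notification-configuration replaces the bucket's whole notification configuration, so this is only safe on buckets with no other notifications configured.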

Create a sample.log file and upload it to the esf-logs-bucket-001 bucket:

$ cat sample.log
Sample log line 1
Sample log line 2

$ aws s3 cp sample.log s3://esf-logs-bucket-001/
upload: ./sample.log to s3://esf-logs-bucket-001/sample.log

(3) S3 bucket for config.yml

$ aws s3api create-bucket \
    --bucket esf-logs-bucket-configs \
    --region eu-west-1 \
    --create-bucket-configuration LocationConstraint=eu-west-1
{
    "Location": "http://esf-logs-bucket-configs.s3.amazonaws.com/"
}

And upload a basic configuration file like this:

# config.yml
inputs:
  - type: "s3-sqs"
    id: "arn:aws:sqs:eu-west-1:123:esf-logs-notifications-queue"
    outputs:
      - type: "elasticsearch"
        args:
          # either elasticsearch_url or cloud_id, elasticsearch_url takes precedence
          elasticsearch_url: "<REDACTED>"
          # either api_key or username/password, api_key takes precedence
          api_key: "<REDACTED>"
          es_datastream_name: "logs-generic-default"
          batch_max_actions: 500
          batch_max_bytes: 10485760
          ssl_assert_fingerprint: ""

$ aws s3 cp config.yml s3://esf-logs-bucket-configs/config.yml

zmoog commented 6 months ago

Deploy ESF

Using https://www.elastic.co/guide/en/esf/master/aws-deploy-elastic-serverless-forwarder.html#aws-serverless-forwarder-deploy-cloudformation

List ESF versions available for deployments:

$ aws serverlessrepo list-application-versions \
    --application-id arn:aws:serverlessrepo:eu-central-1:267093732750:applications/elastic-serverless-forwarder

$ cat sar-application.yaml
# sar-application.yaml
Transform: AWS::Serverless-2016-10-31
Resources:
  SarCloudformationDeployment:
    Type: AWS::Serverless::Application
    Properties:
      Location:
        ApplicationId: 'arn:aws:serverlessrepo:eu-central-1:267093732750:applications/elastic-serverless-forwarder'
        SemanticVersion: '1.9.0'  ## SET TO CORRECT SEMANTIC VERSION (MUST BE GREATER THAN 1.6.0)
      Parameters:
        ElasticServerlessForwarderS3ConfigFile: "s3://esf-logs-bucket-configs/config.yml"
        ElasticServerlessForwarderSSMSecrets: ""
        ElasticServerlessForwarderKMSKeys: ""
        ElasticServerlessForwarderSQSEvents: ""
        ElasticServerlessForwarderS3SQSEvents: "arn:aws:sqs:eu-west-1:123:esf-logs-notifications-queue"
        ElasticServerlessForwarderKinesisEvents: ""
        ElasticServerlessForwarderCloudWatchLogsEvents: ""
        ElasticServerlessForwarderS3Buckets: "arn:aws:s3:::esf-logs-bucket-001,arn:aws:s3:::esf-logs-bucket-002"
        ElasticServerlessForwarderSecurityGroups: ""
        ElasticServerlessForwarderSubnets: ""

Deploy ESF using the given configuration at sar-application.yaml:

$ aws cloudformation deploy \
    --template-file sar-application.yaml \
    --stack-name esf-logs \
    --capabilities CAPABILITY_IAM CAPABILITY_AUTO_EXPAND

Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - esf-logs

Test

Create a new object in each S3 bucket:

$ cat sample.log
Sample log line 1
Sample log line 2

aws s3 cp sample.log s3://esf-logs-bucket-001/sample.1.log

aws s3 cp sample.log s3://esf-logs-bucket-002/sample.2.log

And then check if the log lines from both files landed in the data stream logs-generic-default.

You can use the following filter:

data_stream.dataset : "generic"

[screenshot: CleanShot 2023-12-13 at 18 41 43@2x]

zmoog commented 6 months ago

Add a new S3 bucket

Overview

Here is the goal for this step:

[image: target setup for this step]

S3 buckets for the data

Create another S3 bucket with logs:

$ aws s3api create-bucket \
    --bucket esf-logs-bucket-003 \
    --region eu-west-1 \
    --create-bucket-configuration LocationConstraint=eu-west-1
{
    "Location": "http://esf-logs-bucket-003.s3.amazonaws.com/"
}

Enable notifications from esf-logs-bucket-003 to esf-logs-notifications-queue:

# Enable notifications
$ cat notifications.json
# same as before

$ aws s3api put-bucket-notification-configuration \
    --bucket esf-logs-bucket-003 \
    --notification-configuration file://notifications.json

Update ESF config and parameters

We need to update ElasticServerlessForwarderS3Buckets by adding the new S3 bucket ARN, arn:aws:s3:::esf-logs-bucket-003:

# sar-application.yaml
Transform: AWS::Serverless-2016-10-31
Resources:
  SarCloudformationDeployment:
    Type: AWS::Serverless::Application
    Properties:
      Location:
        ApplicationId: 'arn:aws:serverlessrepo:eu-central-1:267093732750:applications/elastic-serverless-forwarder'
        SemanticVersion: '1.9.0'  ## SET TO CORRECT SEMANTIC VERSION (MUST BE GREATER THAN 1.6.0)
      Parameters:
        ElasticServerlessForwarderS3ConfigFile: "s3://esf-logs-bucket-configs/config.yml"
        ElasticServerlessForwarderSSMSecrets: ""
        ElasticServerlessForwarderKMSKeys: ""
        ElasticServerlessForwarderSQSEvents: ""
        ElasticServerlessForwarderS3SQSEvents: "arn:aws:sqs:eu-west-1:123:esf-logs-notifications-queue"
        ElasticServerlessForwarderKinesisEvents: ""
        ElasticServerlessForwarderCloudWatchLogsEvents: ""
        ElasticServerlessForwarderS3Buckets: "arn:aws:s3:::esf-logs-bucket-001,arn:aws:s3:::esf-logs-bucket-002,arn:aws:s3:::esf-logs-bucket-003"
        ElasticServerlessForwarderSecurityGroups: ""
        ElasticServerlessForwarderSubnets: ""
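
Each entry in ElasticServerlessForwarderS3Buckets must be a full bucket ARN (arn:aws:s3:::<bucket>, with three colons before the bucket name), so the list is easy to get wrong when edited by hand. A small sketch that builds the comma-separated value from plain bucket names; s3_bucket_arns is a hypothetical helper:

```shell
# Hypothetical helper: turn bucket names into a comma-separated list of S3 bucket ARNs.
# Arguments: <bucket-name>...
s3_bucket_arns() {
  local result="" bucket
  for bucket in "$@"; do
    # Prepend a comma only when the list already has an entry.
    result="${result}${result:+,}arn:aws:s3:::${bucket}"
  done
  printf '%s\n' "$result"
}

# Usage:
#   s3_bucket_arns esf-logs-bucket-001 esf-logs-bucket-002 esf-logs-bucket-003
```

The output can be pasted as the parameter value in sar-application.yaml.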

Re-deploy ESF

On re-deploy, ESF updates the permissions so that the Lambda function can access the arn:aws:s3:::esf-logs-bucket-003 bucket:

$ aws cloudformation deploy \
    --template-file sar-application.yaml \
    --stack-name esf-logs \
    --capabilities CAPABILITY_IAM CAPABILITY_AUTO_EXPAND
Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - esf-logs

Test

As a final test, we will copy the same log file again, but this time to the esf-logs-bucket-003 bucket:

aws s3 cp sample.log s3://esf-logs-bucket-003/sample.3.log

And here is the final result:

[screenshot: CleanShot 2023-12-13 at 18 48 51@2x]

zmoog commented 6 months ago

Works as expected.

I can continue this scenario, if needed. Please let me know.