zmoog / public-notes

Figure out how to route logs in the Azure Logs integration #92

Open zmoog opened 2 weeks ago

zmoog commented 2 weeks ago

Situation

The Azure Logs integration allows multiple log categories to be collected from a single event hub.

At a high level, users (1) define the event hub name and settings once, and (2) every enabled integration then uses that same event hub.

CleanShot 2024-08-28 at 23 10 05@2x

Problem

This setup is inefficient, and we plan to change it in future releases.

Solutions

You can use only the generic integration and route logs to the right data stream using the reroute processor.

image

zmoog commented 2 weeks ago

With one input + routing, we can reduce the user errors metric to zero and make the fewest storage account API calls possible.

Here's the diagram to leverage routing:

image

  1. One input receives log events from the event hub.
  2. The input publishes the log events to the logs-azure.eventhub-default data stream.
  3. The logs-azure.eventhub-default data stream runs the logs-azure.eventhub@custom custom pipeline, which holds the rules that route log events based on the log category.
  4. Each log event lands in the target data stream.

If the routing rules cover all incoming log categories, the logs-azure.eventhub-default data stream stays empty. However, we can set up an alert rule that notifies us when a log event has no matching routing rule, so that we can iterate and update the logs-azure.eventhub@custom custom pipeline.
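
The four steps and the fallback behaviour can be sketched in Python. This is only an illustration of the routing decision, not the integration's code: the function name `target_data_stream` and the three-entry mapping are hypothetical, and the sketch assumes the reroute processor keeps the `logs` type and `default` namespace of the source data stream. As a simplification, it also treats a message that is not a JSON object as unrouted.

```python
import json

# Data stream the single input writes to (from the steps above).
SOURCE_STREAM = "logs-azure.eventhub-default"

# Category -> dataset. Hypothetical subset of the rules, for illustration only.
CATEGORY_TO_DATASET = {
    "SignInLogs": "azure.signinlogs",
    "AuditLogs": "azure.auditlogs",
    "Administrative": "azure.activitylogs",
}


def target_data_stream(message: str) -> str:
    """Return the data stream a raw event hub message would land in."""
    try:
        event = json.loads(message)
    except json.JSONDecodeError:
        return SOURCE_STREAM  # simplification: unparseable means unrouted
    category = event.get("category") if isinstance(event, dict) else None
    if not isinstance(category, str):
        return SOURCE_STREAM
    dataset = CATEGORY_TO_DATASET.get(category)
    if dataset is None:
        # No routing rule: the event stays in the source data stream,
        # which is the case an alert rule would watch for.
        return SOURCE_STREAM
    return f"logs-{dataset}-default"
```

Any event for which this returns `logs-azure.eventhub-default` corresponds to a category that still needs a routing rule.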

The routing option is probably the most efficient method.

Here's the source code of the logs-azure.eventhub@custom pipeline I am testing:

PUT _ingest/pipeline/logs-azure.eventhub@custom
{
  "processors": [
    {
      "json": {
        "field": "message",
        "target_field": "tmp_json"
      }
    },
    {
      "set": {
        "field": "routing_category",
        "copy_from": "tmp_json.category",
        "ignore_empty_value": true
      }
    },
    {
      "remove": {
        "field": "tmp_json",
        "ignore_missing": true
      }
    },
    {
      "reroute": {
        "dataset": [
          "azure.signinlogs"
        ],
        "if": "ctx.routing_category == \"SignInLogs\" || ctx.routing_category == \"NonInteractiveUserSignInLogs\" || ctx.routing_category == \"ServicePrincipalSignInLogs\" || ctx.routing_category == \"ManagedIdentitySignInLogs\""
      }
    },
    {
      "reroute": {
        "dataset": [
          "azure.identity_protection"
        ],
        "if": "ctx.routing_category == \"RiskyUsers\" || ctx.routing_category == \"UserRiskEvents\""
      }
    },
    {
      "reroute": {
        "dataset": [
          "azure.provisioning"
        ],
        "if": "ctx.routing_category == \"ProvisioningLogs\""
      }
    },
    {
      "reroute": {
        "dataset": [
          "azure.auditlogs"
        ],
        "if": "ctx.routing_category == \"AuditLogs\""
      }
    },
    {
      "reroute": {
        "dataset": [
          "azure.activitylogs"
        ],
        "if": "ctx.routing_category == \"Administrative\" || ctx.routing_category == \"Security\" || ctx.routing_category == \"ServiceHealth\" || ctx.routing_category == \"Alert\" || ctx.routing_category == \"Recommendation\" || ctx.routing_category == \"Policy\" || ctx.routing_category == \"Autoscale\" || ctx.routing_category == \"ResourceHealth\""
      }
    },
    {
      "reroute": {
        "dataset": [
          "azure.graphactivitylogs"
        ],
        "if": "ctx.routing_category == \"MicrosoftGraphActivityLogs\""
      }
    },
    {
      "reroute": {
        "dataset": [
          "azure.firewall_logs"
        ],
        "if": "ctx.routing_category == \"AzureFirewallApplicationRule\" || ctx.routing_category == \"AzureFirewallNetworkRule\" || ctx.routing_category == \"AzureFirewallDnsProxy\" || ctx.routing_category == \"AZFWApplicationRule\" || ctx.routing_category == \"AZFWNetworkRule\" || ctx.routing_category == \"AZFWNatRule\" || ctx.routing_category == \"AZFWDnsQuery\""
      }
    },
    {
      "reroute": {
        "dataset": [
          "azure.application_gateway"
        ],
        "if": "ctx.routing_category == \"ApplicationGatewayFirewallLog\" || ctx.routing_category == \"ApplicationGatewayAccessLog\""
      }
    }
  ]
}
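
Because every reroute rule has the same shape, the processors could also be generated from a dataset-to-categories table instead of being edited by hand. A minimal sketch, assuming the helper names `reroute_processor` and `build_processors` (both hypothetical) and copying only the first four rules from the pipeline above:

```python
import json

# Dataset -> Azure log categories, copied from the pipeline above (first four rules).
ROUTES = {
    "azure.signinlogs": [
        "SignInLogs",
        "NonInteractiveUserSignInLogs",
        "ServicePrincipalSignInLogs",
        "ManagedIdentitySignInLogs",
    ],
    "azure.identity_protection": ["RiskyUsers", "UserRiskEvents"],
    "azure.provisioning": ["ProvisioningLogs"],
    "azure.auditlogs": ["AuditLogs"],
}


def reroute_processor(dataset, categories):
    """Build one reroute processor that matches any of the given categories."""
    condition = " || ".join(
        f'ctx.routing_category == "{category}"' for category in categories
    )
    return {"reroute": {"dataset": [dataset], "if": condition}}


def build_processors(routes):
    """Build the reroute processors in the order the routes are listed."""
    return [
        reroute_processor(dataset, categories)
        for dataset, categories in routes.items()
    ]


def pipeline_fragment(routes):
    """Serialize the reroute processors as JSON, ready to paste into the pipeline."""
    return json.dumps(build_processors(routes), indent=2)
```

The output of `pipeline_fragment(ROUTES)` would go after the json, set, and remove processors, which still have to be written by hand.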

nicpenning commented 2 weeks ago

That seems clean to me.

FYI: Your second diagram is showing as missing. image

zmoog commented 2 weeks ago

FYI: Your second diagram is showing as missing.

Ouch, I probably copied and pasted an expiring URL from GitHub. Checking!

zmoog commented 2 weeks ago

It should be fixed now.

nicpenning commented 6 days ago

How does this model work if you wanted more than 1 agent for redundancy and improved performance?

zmoog commented 4 days ago

How does this model work if you wanted more than 1 agent for redundancy and improved performance?

Good question! I should update the note to add this detail.

Here is a diagram showing how the two inputs work together to achieve improved redundancy and performance.

CleanShot 2024-09-09 at 10 57 08

Users set up diagnostic settings that send data to an event hub (1). The two (or more) inputs start and each claims an equal share of the partitions. With a four-partition event hub, two inputs usually get two partitions each. Each input processes messages and sends them to the data stream in Elasticsearch.

The routing (2) happens in Elasticsearch at the data stream level, so it works with one or multiple event hubs.
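
The even split can be illustrated with a small Python sketch. It only models the steady state described above: the real inputs negotiate partition ownership at runtime, and the round-robin assignment, the function name `assign_partitions`, and the input names are assumptions made for the example.

```python
def assign_partitions(partition_ids, input_names):
    """Spread partitions across inputs round-robin (steady-state illustration only)."""
    if not input_names:
        raise ValueError("at least one input is required")
    assignment = {name: [] for name in input_names}
    for index, partition_id in enumerate(partition_ids):
        # Hand out partitions in turn so the shares differ by at most one.
        owner = input_names[index % len(input_names)]
        assignment[owner].append(partition_id)
    return assignment
```

With four partitions and two inputs, each input ends up with two partitions. With a single input, that input owns all four. In both cases every event reaches the same data stream and goes through the same routing rules.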

nicpenning commented 4 days ago

This sounds great. Unfortunately the graphic won't load for me.

nicpenning commented 4 days ago

I can zoom in here, looks awesome!

zmoog commented 3 days ago

I can zoom in here, looks awesome!

Yeah, GitHub image URLs expire quickly. I usually reload the page and click on the image to get the whole picture. Let me know if you have trouble opening it.

nicpenning commented 3 days ago

Works great now.