Bootstrap a test cluster using elastic-package and install the Azure Logs integration:
cd packages/azure
# Using the current dev snapshot; you don't need to pick this exact version, the latest release works too.
elastic-package build && elastic-package stack up -d -v --version 8.13.0-SNAPSHOT
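If you want a quick sanity check that the stack is healthy before moving on, elastic-package has a status subcommand:
elastic-package stack status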
You can use placeholder values for all the integration settings; we are not going to assign this policy to an agent.
We can give the reroute processor a try by adding a custom ingest pipeline.
For this test, we'll use a trivial routing condition: route documents to the prod namespace if ctx.env == "prod".
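The custom pipeline itself isn't shown above, so here is a minimal sketch of one that would produce the results below. The logs-azure.eventhub@custom name is the hook the managed pipeline already calls, and the reroute processor keys on the env test field:
PUT _ingest/pipeline/logs-azure.eventhub@custom
{
  "processors": [
    {
      "reroute": {
        "if": "ctx.env == 'prod'",
        "namespace": "prod"
      }
    }
  ]
}
With this in place, we can simulate the managed pipeline with a dev and a prod document: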
# request
POST _ingest/pipeline/logs-azure.eventhub-1.9.0/_simulate
{
"docs": [
{
"_index": "logs-azure.eventhub-default",
"_source": {
"env": "dev"
}
}
]
}
# response
{
"docs": [
{
"doc": {
"_index": "logs-azure.eventhub-default",
"_version": "-3",
"_id": "_id",
"_source": {
"env": "dev",
"ecs": {
"version": "8.0.0"
},
"event": {
"kind": "event"
}
},
"_ingest": {
"timestamp": "2024-01-09T21:05:03.290911341Z"
}
}
}
]
}
# request
POST _ingest/pipeline/logs-azure.eventhub-1.9.0/_simulate
{
"docs": [
{
"_index": "logs-azure.eventhub-default",
"_source": {
"env": "prod"
}
}
]
}
# response
{
"docs": [
{
"doc": {
"_index": "logs-azure.eventhub-prod",
"_version": "-3",
"_id": "_id",
"_source": {
"env": "prod",
"ecs": {
"version": "8.0.0"
},
"event": {
"kind": "event"
},
"data_stream": {
"namespace": "prod",
"type": "logs",
"dataset": "azure.eventhub"
}
},
"_ingest": {
"timestamp": "2024-01-09T21:05:09.711010386Z"
}
}
}
]
}
# request
POST logs-azure.eventhub-default/_doc
{
"@timestamp": "2024-01-09T21:09:09.000Z",
"env": "prod"
}
# response
{
"_index": ".ds-logs-azure.eventhub-prod-2024.01.09-000001",
"_id": "UfsO8IwBVoLvb9xL8uzc",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1
}
Query the target data stream to see if the document is there:
# request
GET logs-azure.eventhub-prod/_search
# response
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": ".ds-logs-azure.eventhub-prod-2024.01.09-000001",
"_id": "UfsO8IwBVoLvb9xL8uzc",
"_score": 1,
"_source": {
"@timestamp": "2024-01-09T21:09:09.000Z",
"env": "prod",
"ecs": {
"version": "8.0.0"
},
"event": {
"agent_id_status": "missing",
"ingested": "2024-01-09T21:09:20Z",
"kind": "event"
},
"data_stream": {
"namespace": "prod",
"type": "logs",
"dataset": "azure.eventhub"
}
}
}
]
}
}
Add a new routing_rules.yml file at packages/azure/data_stream/eventhub with the following content:
# Route azure logs events to the correct dataset and namespace
# based on document annotations.
- source_dataset: azure.eventhub
rules:
- target_dataset:
- "{{annotations.dataset}}"
- "{{data_stream.dataset}}"
namespace:
- "{{annotations.namespace}}"
- "{{data_stream.namespace}}"
if: "ctx.annotations != null"
Bump the integration version:
diff --git a/packages/azure/changelog.yml b/packages/azure/changelog.yml
index b7e611cb9..8ea80a91a 100644
--- a/packages/azure/changelog.yml
+++ b/packages/azure/changelog.yml
@@ -1,3 +1,8 @@
+- version: "1.9.0"
+ changes:
+ - description: Add routing rules to the eventhub data stream.
+ type: enhancement
+ link: https://github.com/elastic/integrations/pull/8813
- version: "1.8.3"
changes:
- description: Add caller_ip_address field in pipeline for Azure sign-in logs.
diff --git a/packages/azure/data_stream/eventhub/manifest.yml b/packages/azure/data_stream/eventhub/manifest.yml
index 2dc49caa7..ac6ca1fa1 100644
--- a/packages/azure/data_stream/eventhub/manifest.yml
+++ b/packages/azure/data_stream/eventhub/manifest.yml
@@ -1,4 +1,5 @@
type: logs
+dataset: azure.eventhub
title: Azure Event Hub Input
streams:
- input: "azure-eventhub"
diff --git a/packages/azure/manifest.yml b/packages/azure/manifest.yml
index 21257e415..3ea0bfb5a 100644
--- a/packages/azure/manifest.yml
+++ b/packages/azure/manifest.yml
@@ -1,6 +1,6 @@
name: azure
title: Azure Logs
-version: 1.8.3
+version: 1.9.0
description: This Elastic integration collects logs from Azure
type: integration
icons:
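As an aside, newer elastic-package versions can generate the changelog entry for you instead of hand-editing the diff above; the exact flags may vary by version, so check elastic-package changelog add --help:
elastic-package changelog add --next minor --description "Add routing rules to the eventhub data stream." --type enhancement --link https://github.com/elastic/integrations/pull/8813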
Finally, install the new integration version:
elastic-package build && elastic-package stack up -d -v --services package-registry
Then, index a test document carrying an annotations.namespace value:
POST logs-azure.eventhub-default/_doc
{
"@timestamp": "2024-01-09T21:09:09.000Z",
"annotations": {
"namespace": "whatever"
}
}
Search for the document:
GET logs-azure.eventhub-whatever/_search
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": ".ds-logs-azure.eventhub-whatever-2024.01.09-000001",
"_id": "s_RJ8IwBJ1mZT2UtkZNf",
"_score": 1,
"_source": {
"annotations": {
"namespace": "whatever"
},
"@timestamp": "2024-01-09T21:09:09.000Z",
"ecs": {
"version": "8.0.0"
},
"event": {
"agent_id_status": "missing",
"ingested": "2024-01-09T22:13:22Z",
"kind": "event"
},
"data_stream": {
"namespace": "whatever",
"type": "logs",
"dataset": "azure.eventhub"
}
}
}
]
}
}
Let's see the ingest pipeline source:
GET _ingest/pipeline/logs-azure.eventhub-1.9.0
As you can see, Fleet added the reroute processor after the custom pipeline calls, so users can run their own processors before the rerouting happens:
{
"logs-azure.eventhub-1.9.0": {
"description": "Pipeline for parsing azure activity logs.",
"processors": [
{
"set": {
"field": "ecs.version",
"value": "8.0.0"
}
},
{
"rename": {
"field": "azure",
"target_field": "azure-eventhub",
"ignore_missing": true
}
},
{
"set": {
"field": "event.kind",
"value": "event"
}
},
{
"pipeline": {
"if": "ctx?.tags != null && ctx.tags.contains('parse_message')",
"name": "logs-azure.eventhub-1.9.0-parsed-message"
}
},
{
"pipeline": {
"name": "global@custom",
"ignore_missing_pipeline": true
}
},
{
"pipeline": {
"name": "logs@custom",
"ignore_missing_pipeline": true
}
},
{
"pipeline": {
"name": "logs-azure@custom",
"ignore_missing_pipeline": true
}
},
{
"pipeline": {
"name": "logs-azure.eventhub@custom",
"ignore_missing_pipeline": true
}
},
{
"reroute": {
"tag": "azure.eventhub",
"dataset": [
"{{annotations.dataset}}",
"{{data_stream.dataset}}"
],
"namespace": [
"{{annotations.namespace}}",
"{{data_stream.namespace}}"
],
"if": "ctx.annotations != null"
}
}
],
"on_failure": [
{
"set": {
"field": "error.message",
"value": "{{ _ingest.on_failure_message }}"
}
}
],
"_meta": {
"managed_by": "fleet",
"managed": true,
"package": {
"name": "azure"
}
}
}
}
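The rule can also override the dataset. To exercise that branch, you can simulate with an annotations.dataset value (azure.custom is a made-up dataset name here; the _index in the response should become logs-azure.custom-default):
POST _ingest/pipeline/logs-azure.eventhub-1.9.0/_simulate
{
  "docs": [
    {
      "_index": "logs-azure.eventhub-default",
      "_source": {
        "annotations": {
          "dataset": "azure.custom"
        }
      }
    }
  ]
}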
@zmoog Just curious: what happens in the scenario where the routing rule fails? For example, if the if conditional never matches, and annotations is indeed null, where does the document end up?
# Route azure logs events to the correct dataset and namespace
# based on document annotations.
- source_dataset: azure.eventhub
rules:
- target_dataset:
- "{{annotations.dataset}}"
- "{{data_stream.dataset}}"
namespace:
- "{{annotations.namespace}}"
- "{{data_stream.namespace}}"
if: "ctx.annotations != null"
The if condition is required by the routing rules spec, so it's always there.
However, one may check the wrong condition and feed the processor a null value. In this case, the reroute processor falls back to the original dataset or namespace.
Continuing with the previous examples, here's what happens:
#
# routing_rules.yml
#
# Route azure logs events to the correct dataset and namespace
# based on document annotations.
- source_dataset: azure.eventhub
rules:
- target_dataset:
- "{{annotations.dataset}}"
- "{{data_stream.dataset}}"
namespace:
- "{{annotations.namespace}}"
- "{{data_stream.namespace}}"
if: "ctx.annotations != null"
#
# request
#
POST logs-azure.eventhub-default/_doc
{
"@timestamp": "2024-01-09T09:41:00.000Z",
"annotations": {
"namespacez": "hey" // the `namespace` is misspelled as `namespacez`
}
}
#
# response
#
{
"_index": ".ds-logs-azure.eventhub-default-2024.01.11-000001", // falls back to the `default` namespace
"_id": "BPWN-IwBJ1mZT2UtjHyP",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 3,
"_primary_term": 1
}
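And when annotations is missing entirely, the if condition is false, the reroute processor is skipped, and the document is indexed into the original data stream. A quick check (the message field is an arbitrary stand-in; expect the response _index to be a .ds-logs-azure.eventhub-default-* backing index):
POST logs-azure.eventhub-default/_doc
{
  "@timestamp": "2024-01-09T09:42:00.000Z",
  "message": "no annotations here"
}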
I want to prepare a simple, practical example that demonstrates how we can use the reroute processor and routing rules to route documents to a different data stream based on in-document data.