Open zmoog opened 8 months ago
On an 8.8.2 cluster:
elastic-package build && elastic-package stack up -d -v --version 8.8.2
I set up an agent to collect CloudTrail logs from an S3 bucket. The aws-s3 input is configured to poll the bucket.
Here's the agent policy:
- id: aws-s3-cloudtrail-a10aca5c-590b-4643-ad4e-48c6fb65ddbf
  name: aws-14
  revision: 3
  type: aws-s3
  use_output: default
  meta:
    package:
      name: aws
      version: 1.51.1
  data_stream:
    namespace: default
  package_policy_id: a10aca5c-590b-4643-ad4e-48c6fb65ddbf
  streams:
    - id: aws-s3-aws.cloudtrail-a10aca5c-590b-4643-ad4e-48c6fb65ddbf
      data_stream:
        dataset: aws.cloudtrail
        type: logs
      file_selectors:
        - regex: /CloudTrail/
          expand_event_list_from_field: Records
        - regex: /CloudTrail-Digest/
        - regex: /CloudTrail-Insight/
          expand_event_list_from_field: Records
      access_key_id: <REDACTED>
      content_type: application/json
      expand_event_list_from_field: Records
      secret_access_key: <REDACTED>
      max_number_of_messages: 5
      tags:
        - forwarded
        - aws-cloudtrail
      publisher_pipeline.disable_host: true
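To make the effect of the three file_selectors concrete, here is a minimal Python sketch that checks which selector an S3 object key would hit. It assumes the selectors are tried in order and the first match wins, which is not confirmed anywhere in this issue. The object keys are made-up examples, and Python's `re` module stands in for Go's regexp syntax (the two behave the same for these simple patterns).

```python
import re

# Selectors copied from the policy above: (regex, expand_event_list_from_field).
FILE_SELECTORS = [
    (r"/CloudTrail/", "Records"),
    (r"/CloudTrail-Digest/", None),
    (r"/CloudTrail-Insight/", "Records"),
]


def match_selector(key):
    """Return the first selector whose regex is found in the key, or None.

    Assumption: selectors are evaluated in order and the first match wins.
    """
    for pattern, expand_field in FILE_SELECTORS:
        if re.search(pattern, key):
            return pattern, expand_field
    return None


# Hypothetical object keys, for illustration only.
for key in (
    "AWSLogs/123456789012/CloudTrail/eu-west-1/2023/10/06/example.json.gz",
    "AWSLogs/123456789012/CloudTrail-Digest/eu-west-1/2023/10/06/example.json.gz",
    "AWSLogs/123456789012/other/example.json.gz",
):
    print(key, "->", match_selector(key))
```

Because each pattern ends with a slash, a `CloudTrail-Digest` key does not match `/CloudTrail/`, so the order of the selectors does not matter for these three patterns.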
Open a shell in the agent container:
docker exec -u 0 -it elastic-package-stack-elastic-agent-1 /bin/bash
Then install some basic tools:
apt install jq tree
Search and inspect the registry persistence store on the file system:
# search for registry folder
$ find . -iname registry
./state/data/run/filestream-monitoring/registry
./state/data/run/aws-s3-default/registry
./.node/node/lib/node_modules/@elastic/synthetics/node_modules/playwright-core/lib/server/registry
# inspecting ./state/data/run/aws-s3-default/registry
$ tree ./state/data/run/aws-s3-default/registry
./state/data/run/aws-s3-default/registry
`-- filebeat
    |-- 1332300.json
    |-- active.dat
    |-- log.json
    `-- meta.json

1 directory, 4 files
$ du -sh ./state/data/run/aws-s3-default/registry/filebeat/*
22M ./state/data/run/aws-s3-default/registry/filebeat/1332300.json
4.0K ./state/data/run/aws-s3-default/registry/filebeat/active.dat
7.9M ./state/data/run/aws-s3-default/registry/filebeat/log.json
4.0K ./state/data/run/aws-s3-default/registry/filebeat/meta.json
# what's inside the bigger file?
$ cat ./state/data/run/aws-s3-default/registry/filebeat/1332300.json | jq | more
[
  {
    "_key": "filebeat::aws-s3::state::<REDACTED>",
    "id": "<REDACTED>",
    "bucket": "<REDACTED>2",
    "key": "AWSLogs/<REDACTED>/CloudTrail/<REDACTED>",
    "etag": "\"b8570636942919ae3b7c0c693c78ceee\"",
    "last_modified": [
      281470681743360,
      1696581008
    ],
    "list_prefix": "",
    "stored": true,
    "error": false
  },
# How many elements are there in the list?
$ cat ./state/data/run/aws-s3-default/registry/filebeat/1332300.json | jq '. | length'
25092
# So we have 25092 keys in the registry, probably one entry for each S3 object processed by Filebeat
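Beyond counting entries, it can help to summarize the snapshot by its `stored` and `error` flags and to read the `last_modified` arrays as dates. The following is a minimal Python sketch, not a description of how Filebeat itself reads the file. It assumes the second element of the two-element time array is a Unix timestamp in seconds (UTC) and ignores the first element; the function names are hypothetical.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def decode_time(value):
    """Decode a two-element time array such as [281470681743360, 1696581008].

    Assumption: the second element is Unix seconds (UTC); the first element
    is ignored here.
    """
    return datetime.fromtimestamp(value[1], tz=timezone.utc)


def summarize(entries):
    """Count entries by (stored, error) flags and find the newest last_modified."""
    flags = Counter((e.get("stored"), e.get("error")) for e in entries)
    times = [decode_time(e["last_modified"]) for e in entries if "last_modified" in e]
    return flags, max(times, default=None)


def summarize_file(path):
    """Load a snapshot file (a JSON array of entries) and summarize it."""
    with open(path, encoding="utf-8") as f:
        return summarize(json.load(f))


# Example entry shaped like the one shown above (redacted values kept as-is).
sample = [{
    "_key": "filebeat::aws-s3::state::<REDACTED>",
    "last_modified": [281470681743360, 1696581008],
    "stored": True,
    "error": False,
}]
print(summarize(sample))
```

Under that assumption, the sample entry's `last_modified` decodes to 2023-10-06 08:30:08 UTC, which is plausible given the file dates in this issue. On the real file, `summarize_file("./state/data/run/aws-s3-default/registry/filebeat/1332300.json")` would return the same kind of summary.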
Annotations:
The aws-s3-default/registry/filebeat/log.json file is updated regularly.
If I run:
$ tail -f ./state/data/run/aws-s3-default/registry/filebeat/log.json
{"op":"set","id":1344226}
{"k":"filebeat::aws-s3::writeCommit::<REDACTED>-aws-cloudtrail-logs-<REDACTED>","v":{"time":[281470681743360,1697011044]}}
I can see new content added regularly.
The file 1332300.json was last updated three hours ago:
$ ls -ltr ./state/data/run/aws-s3-default/registry/filebeat/
total 31896
-rw------- 1 elastic-agent elastic-agent 15 Oct 10 23:25 meta.json
-rw------- 1 elastic-agent elastic-agent 22393719 Oct 11 05:13 1332300.json
-rw------- 1 elastic-agent elastic-agent 85 Oct 11 05:13 active.dat
-rw------- 1 elastic-agent elastic-agent 10244757 Oct 11 08:02 log.json
And here's the content of active.dat:
$ cat ./state/data/run/aws-s3-default/registry/filebeat/active.dat
/usr/share/elastic-agent/state/data/run/aws-s3-default/registry/filebeat/1332300.json
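The files above suggest a snapshot-plus-log layout: active.dat names the current snapshot (1332300.json), while log.json collects operations with higher ids (such as 1344226) written since that snapshot. A minimal Python sketch of rebuilding the current state under that reading follows. It assumes that log.json alternates an operation line with a data line and that a "remove" operation exists with a data line carrying only the key. Only "set" appears in the output above, so the rest is a guess, and the keys ending in `a` and `b` are hypothetical.

```python
import json


def load_snapshot(text):
    """Parse a snapshot (JSON array) into a dict keyed by each entry's "_key"."""
    state = {}
    for entry in json.loads(text):
        entry = dict(entry)
        state[entry.pop("_key")] = entry
    return state


def replay_log(state, lines):
    """Apply log operations on top of a snapshot-derived state dict.

    Assumption: the log alternates an operation line ({"op": ..., "id": ...})
    with a data line ({"k": key, "v": value}). Only "set" appears in the
    output above, so the "remove" handling is a guess.
    """
    lines = [line for line in lines if line.strip()]
    last_id = None
    # Pair each operation line with the data line that follows it; an
    # unpaired trailing line (for example a partial write) is skipped.
    for op_line, data_line in zip(lines[0::2], lines[1::2]):
        op, data = json.loads(op_line), json.loads(data_line)
        if op["op"] == "set":
            state[data["k"]] = data["v"]
        elif op["op"] == "remove":
            state.pop(data["k"], None)
        last_id = op["id"]
    return state, last_id


# Tiny in-memory example using the line shapes shown above.
snapshot = '[{"_key": "filebeat::aws-s3::state::a", "stored": true, "error": false}]'
log = [
    '{"op":"set","id":1344226}',
    '{"k":"filebeat::aws-s3::writeCommit::b","v":{"time":[281470681743360,1697011044]}}',
]
state, last_id = replay_log(load_snapshot(snapshot), log)
print(len(state), last_id)
```

To try this on the real files, read the snapshot path from active.dat, pass the snapshot's contents to `load_snapshot`, and pass the lines of log.json to `replay_log`.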
I am investigating a user problem where Filebeat re-processes the same S3 objects after a restart.
I suspect this happens because Filebeat does not properly track the state of each S3 object, so I want to learn how and where the registry stores its content.