grafana / alloy

OpenTelemetry Collector distribution with programmable pipelines
https://grafana.com/oss/alloy
Apache License 2.0

Alloy Creates Multiple Bookmark Files #1884

Open davideverall opened 1 month ago

davideverall commented 1 month ago

What's wrong?

Grafana Alloy is creating multiple bookmark files in C:\ProgramData\GrafanaLabs\Alloy\data\loki.source.windowsevent.<NAME>

I would expect these files to merge into one bookmark.xml file, but in some cases they do not merge and an old temporary bookmark file remains on disk.

This is possibly the cause of the "entry too far behind" errors we are seeing: logs with old timestamps are shipped to Loki but rejected as out of order or too far behind.

Steps to reproduce

  1. Run Grafana Alloy 1.4.2 on Windows Server 2019 with the specified config.
  2. Observe the bookmarks directory for stale temporary bookmark files.
  3. Observe the Windows Application Events for "entry too far behind" errors.
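Step 2 can be scripted rather than checked by eye. The following is a minimal Python sketch, not part of Alloy: the function name `find_stale_bookmarks` is hypothetical, and it assumes that any file in a component's data directory other than `bookmark.xml` is a leftover temporary bookmark. The naming scheme of the temporary files is not stated in this issue, so the check is deliberately broad.

```python
from pathlib import Path


def find_stale_bookmarks(component_dir, expected_name="bookmark.xml"):
    """Return the names of files in component_dir other than the expected bookmark.

    Assumption: anything that is not `expected_name` is a leftover temporary
    bookmark file. Adjust the filter if the directory holds other files.
    """
    directory = Path(component_dir)
    return sorted(
        entry.name
        for entry in directory.iterdir()
        if entry.is_file() and entry.name != expected_name
    )
```

Calling it with the data directory of one `loki.source.windowsevent` component (the path described under "What's wrong?") returns an empty list when only `bookmark.xml` is present.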

System information

Windows Server 2019

Software version

Grafana Alloy 1.4.2

Configuration

declare "tnpvariables" {
  export "shortname" {
    value = "event-collector.corp"
  }
}

tnpvariables {}

loki.source.windowsevent "windows_application" {
  eventlog_name          = "Application"
  xpath_query            = ""
  poll_interval          = "0s"
  use_incoming_timestamp = true
  forward_to             = [loki.write.default.receiver]
  labels = {
    job       = "windows-events",
    shortname = tnpvariables.shortname,
  }
  legacy_bookmark_path = "./bookmarks/bookmark-application.xml"
}

loki.source.windowsevent "windows_system" {
  eventlog_name          = "System"
  xpath_query            = ""
  poll_interval          = "0s"
  use_incoming_timestamp = true
  forward_to             = [loki.write.default.receiver]
  labels = {
    job       = "windows-events",
    shortname = tnpvariables.shortname,
  }
  legacy_bookmark_path = "./bookmarks/bookmark-system.xml"
}

loki.source.windowsevent "windows_security" {
  eventlog_name          = "Security"
  xpath_query            = ""
  poll_interval          = "0s"
  use_incoming_timestamp = true
  forward_to             = [loki.write.default.receiver]
  labels = {
    job       = "windows-events",
    shortname = tnpvariables.shortname,
  }
  legacy_bookmark_path = "./bookmarks/bookmark-security.xml"
}

loki.source.windowsevent "windows_scheduledtasks" {
  eventlog_name          = "Microsoft-Windows-TaskScheduler/Operational"
  xpath_query            = ""
  poll_interval          = "0s"
  use_incoming_timestamp = true
  forward_to             = [loki.write.default.receiver]
  labels = {
    job       = "windows-events",
    shortname = tnpvariables.shortname,
  }
  legacy_bookmark_path = "./bookmarks/bookmark-scheduledtasks.xml"
}

loki.source.windowsevent "windows_scriptlog" {
  eventlog_name          = "ScriptLog"
  xpath_query            = ""
  poll_interval          = "0s"
  use_incoming_timestamp = true
  forward_to             = [loki.write.default.receiver]
  labels = {
    job       = "windows-events",
    shortname = tnpvariables.shortname,
  }
  legacy_bookmark_path = "./bookmarks/bookmark-scriptlog.xml"
}

loki.write "default" {
  endpoint {
    url                 = "https:///loki/api/v1/push"
    max_backoff_retries = 60
  }
  external_labels = {
    type = "server",
  }
}

loki.source.windowsevent "windows_nontech" {
  eventlog_name          = "Wec-Clients/Nontech"
  xpath_query            = ""
  poll_interval          = "0s"
  use_incoming_timestamp = true
  forward_to             = [loki.write.default.receiver]
  labels = {
    job  = "windows-events",
    type = "workstation-nontech",
  }
  legacy_bookmark_path = "./bookmarks/bookmark-windows_nontech.xml"
}

loki.source.windowsevent "windows_tech" {
  eventlog_name          = "Wec-Clients/Tech"
  xpath_query            = ""
  poll_interval          = "0s"
  use_incoming_timestamp = true
  forward_to             = [loki.write.default.receiver]
  labels = {
    job  = "windows-events",
    type = "workstation-tech",
  }
  legacy_bookmark_path = "./bookmarks/bookmark-windows_tech.xml"
}

Logs

ts=2024-10-14T13:07:33.3994426Z level=error msg="final error sending batch" component_path=/ component_id=loki.write.default component=client host= status=400 tenant="" error="server returned HTTP status 400 Bad Request (400): entry with timestamp 2024-10-11 15:44:53.4083276 +0000 UTC ignored, reason: 'entry too far behind, entry timestamp is: 2024-10-11T15:44:53Z, oldest acceptable timestamp is: 2024-10-14T11:03:01Z',"
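To gauge how stale the rejected entries are, the two timestamps in the error message can be compared. This is a rough Python sketch. The helper names `parse_utc` and `lag_behind` are hypothetical, and the regular expression is based only on the single log line above, so other Loki versions may word the message differently.

```python
import re
from datetime import datetime, timezone

# Matches the two RFC 3339 timestamps in the "entry too far behind" message.
PATTERN = re.compile(
    r"entry timestamp is: (?P<entry>\S+?), "
    r"oldest acceptable timestamp is: (?P<oldest>\S+?)'"
)


def parse_utc(value):
    """Parse a second-resolution UTC timestamp such as 2024-10-11T15:44:53Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def lag_behind(message):
    """Return how far the entry is behind the oldest acceptable timestamp.

    Returns None when the message does not contain both timestamps.
    """
    match = PATTERN.search(message)
    if match is None:
        return None
    return parse_utc(match["oldest"]) - parse_utc(match["entry"])
```

For the log line above, `lag_behind` returns a `timedelta` of 2 days, 19 hours, 18 minutes and 8 seconds. A gap of that size is what one would expect if event reading resumed from an older bookmark position.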

github-actions[bot] commented 2 days ago

This issue has not had any activity in the past 30 days, so the needs-attention label has been added to it. If the opened issue is a bug, check to see if a newer release fixed your issue. If it is no longer relevant, please feel free to close this issue. The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your issue will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity. Thank you for your contributions!