Open RamblingCookieMonster opened 1 year ago
Oh dear, I see the example I linked actually turned into a parse-the-message-field implementation. While that is something, and took time and effort, I want to emphasize that for Windows, that is absolutely not the approach to take, though I totally understand that perhaps not everyone using promtail/Loki has Windows experience.
There are other references, but take this Microsoft-provided spreadsheet that focuses solely on the Security log event IDs (and presumably a subset, as things have changed). Note the Complete Event Messages sheet. It illustrates how Windows Events work (there are far better / deeper references, but this is a simple way to illustrate it). For example:
Event ID 4713:
Kerberos policy was changed.
Subject:
Security ID: %1
Account Name: %2
Account Domain: %3
Logon ID: %4
Changes Made:
('--' means no changes, otherwise each change is shown as:
(Parameter Name): (new value) (old value))
%5
%5 would not be captured in this case.
Event ID 4899:
A Certificate Services template was updated.
%1 v%2 (Schema V%3)
%4
%5
Template Change Information:
Old Template Content: %8
New Template Content: %7
Additional Information:
Domain Controller: %6
More data that would not be parsed
Event ID 4624:
An account was successfully logged on.
Subject:
Security ID: %1
Account Name: %2
Account Domain: %3
Logon ID: %4
Logon Type: %9
New Logon:
Security ID: %5
Account Name: %6
Account Domain: %7
Logon ID: %8
Logon GUID: %13
Process Information:
Process ID: %17
Process Name: %18
Network Information:
Workstation Name: %12
Source Network Address: %19
Source Port: %20
Detailed Authentication Information:
Logon Process: %10
Authentication Package: %11
Transited Services: %14
Package Name (NTLM only): %15
Key Length: %16
This event is generated when a logon session is created. It is generated on the computer that was
accessed.
The subject fields indicate the account on the local system which requested the logon. This is most
commonly a service such as the Server service, or a local process such as Winlogon.exe or Services.exe.
The logon type field indicates the kind of logon that occurred. The most common types are 2
(interactive) and 3 (network).
The New Logon fields indicate the account for whom the new logon was created, i.e. the account that was
logged on.
The network fields indicate where a remote logon request originated. Workstation name is not always
available and may be left blank in some cases.
The impersonation level field indicates the extent to which a process in the logon session can
impersonate.
The authentication information fields provide detailed information about this specific logon request.
- Logon GUID is a unique identifier that can be used to correlate this event with a KDC event.
- Transited services indicate which intermediate services have participated in this logon request.
- Package name indicates which sub-protocol was used among the NTLM protocols.
- Key length indicates the length of the generated session key. This will be 0 if no session key was
requested.
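To make the template idea concrete, here is a minimal Python sketch of how a Message string could be produced from a template and an ordered list of values. The shortened template, the `render_message` helper, and the sample values are all hypothetical and only modelled on the 4624 text above; real Windows templates use more formatting rules than this handles.

```python
import re

# Hypothetical, shortened template modelled on the Event ID 4624 text above.
TEMPLATE_4624 = (
    "An account was successfully logged on.\n"
    "Subject:\n"
    "  Security ID: %1\n"
    "  Account Name: %2\n"
    "New Logon:\n"
    "  Security ID: %5\n"
    "  Account Name: %6\n"
)


def render_message(template, values):
    """Fill %N placeholders (1-based) from an ordered list of values."""

    def substitute(match):
        index = int(match.group(1)) - 1
        # Leave the placeholder untouched if the event carries no such value.
        return values[index] if 0 <= index < len(values) else match.group(0)

    return re.sub(r"%(\d+)", substitute, template)


# Made-up values; positions 3, 4, 7 and 8 are unused by the short template.
values = ["S-1-5-18", "HOST$", "CORP", "0x3e7",
          "S-1-5-21-1111", "alice", "CORP", "0x1a2b"]
message = render_message(TEMPLATE_4624, values)
print(message)
```

The point of the sketch is that the ordered values are the actual event data; the Message is only a human-readable rendering built from them.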
So... Maybe the parsing accounted for this, but how would this parse Security ID and differentiate the Subject from the New Logon? Also, do you see how long that field is with all that text? So in addition to the real event_data, this massive string is sent for an event ID that is quite, quite common in busy environments.
That data should be in a much more compact set of fields that Windows provides, but which is not currently parsed. Here's an example from winlogbeat, which, among other agents, parses this data without relying on the Message field:
"event_data": {
"ProcessName": "C:\\Windows\\System32\\lsass.exe",
"LogonGuid": "{00000000-0000-0000-0000-000000000000}",
"TargetOutboundDomainName": "-",
"VirtualAccount": "%%1843",
"IpPort": "52024",
"TransmittedServices": "-",
"LmPackageName": "-",
"RestrictedAdminMode": "-",
"ElevatedToken": "%%1842",
"WorkstationName": "REDACTED",
"SubjectDomainName": "REDACTED",
"TargetDomainName": "REDACTED",
"LogonProcessName": "Advapi ",
"LogonType": "3",
"SubjectLogonId": "0x3e7",
"KeyLength": "0",
"TargetOutboundUserName": "-",
"TargetLogonId": "0x1a2497c9f",
"TargetLinkedLogonId": "0x0",
"SubjectUserName": "REDACTED$",
"IpAddress": "REDACTED",
"ImpersonationLevel": "%%1833",
"ProcessId": "0x530",
"TargetUserName": "REDACTED",
"SubjectUserSid": "S-1-5-18",
"TargetUserSid": "S-1-5-21-REDACTED",
"AuthenticationPackageName": "MICROSOFT_AUTHENTICATION_PACKAGE_V1_0"
},
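As a rough illustration of why the named fields matter, the following Python sketch compares pulling the SIDs out of a rendered Message with looking them up by name. The Message text is shortened and hand-written for this example, and the dictionary holds only three of the fields shown above.

```python
import re

# Rendered Message text, shortened and hand-written for illustration.
message = (
    "An account was successfully logged on.\n"
    "Subject:\n"
    "\tSecurity ID:\t\tS-1-5-18\n"
    "New Logon:\n"
    "\tSecurity ID:\t\tS-1-5-21-REDACTED\n"
)

# The same event as named fields, using three keys from the output above.
event_data = {
    "SubjectUserSid": "S-1-5-18",
    "TargetUserSid": "S-1-5-21-REDACTED",
    "LogonType": "3",
}

# Regex on the Message: two hits, and only their order tells them apart.
sids_from_message = re.findall(r"Security ID:\s+(\S+)", message)
print(sids_from_message)

# Named fields: each value is addressed directly, whatever the template says.
print(event_data["SubjectUserSid"], event_data["TargetUserSid"])
```

With the regex approach, a query has to know that the first match is the Subject and the second is the New Logon, and that knowledge differs per event ID.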
Cheers!
Hello, thanks for reporting this.
We're currently reevaluating Promtail's position as a project within Grafana Labs. Internally we're actually using the Agent for both metrics and logs collection at this point. Additionally, the Agent team is more likely to have time to dedicate to this. It's likely a fix would only go into the Agent, but if there's an argument for adding a change here in Promtail as well, that can be discussed.
At the very least, the Agent team is actually going to have people who would have context about Windows in general.
@RamblingCookieMonster Would you consider opening this issue with the Grafana Agent team? I am running into the same issue using the Grafana Agent. You've spent the time creating a well-crafted issue / feature request, and it would be great if the appropriate team was notified. I could try creating the request there, but it wouldn't be as thorough a post as you have here.
It seems that winlogbeat parses the event_data into separate fields (see https://www.elastic.co/guide/en/beats/winlogbeat/current/exported-fields-winlog.html#_event_data).
My workaround may be to have winlogbeat write the Windows security events to a text file and then have Grafana Agent read this file and push it to Loki. This should work, but it greatly complicates the setup.
@mennotech - feel free to borrow from this and/or copy it over! Yeah, we ended up avoiding the write back thing for this and a few other spots it would have been handy (it also puts a bit more pressure on IO/storage, but it does work, good find!). Ultimately, we're going to likely end up using Splunk for this sort of data, so while this is something I would encourage Grafana Labs to implement, it's not something I'll have time to push for. Cheers!
Is your feature request related to a problem? Please describe.
The current implementation of the windows_events scraper for promtail does not fully parse Windows events into parseable structured data. There are many negative outcomes across various stakeholders; for example:
- Without event_data -> prefix_attributename unrolling (e.g. Data_attributename for telegraf), community efforts rely on data intended for human eyes that should be droppable. See this Sigma parser that (1) relies on regex parsing of a templated field designed for human eyes (Message), (2) could break if event schemas change, and (3) means folks must decide whether to drop that field to reduce their resource consumption (the Message field is not critical; it is a template with the actual event data filled in, and should not be relied on) or keep the field to enable those Sigma queries.
You can likely imagine other reasons. Parseable structured data is sort of critical in the world of logs, and to the systems that use their data.
Here's an example of what you produce today via Promtail's windows_events:
Notice the event_data. It is not parsed into named fields (in this case, TaskName, TaskInstanceId, etc., or preferably with a prefix like Data_TaskName to avoid collisions, as used by Telegraf); it's an XML-ish string bunched into a single field.
This results in folks relying on one-off (not scalable/generalizable) "solutions" using that XML-y field, or relying on the Message field (again, which should not be relied on) with rather absurd queries like this, from the previously referenced Sigma post:
Describe the solution you'd like
Parse EventData and UserData please. You likely should do this on the Windows/promtail side of the house. I cannot help you here, but I can at least point out that Telegraf, Winlogbeat, Splunk, and presumably other agents can do this (IMHO) bare-minimum Windows event parsing.
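A minimal Python sketch of the kind of expansion being asked for, assuming the input is a run of `<Data Name='...'>` fragments like the event_data string Promtail emits today. The `expand_event_data` helper and its `prefix` parameter are hypothetical names for illustration, not anything Promtail or the Agent provides, and the sketch assumes the values are already valid XML text.

```python
import json
import xml.etree.ElementTree as ET


def expand_event_data(raw, prefix="Data_"):
    """Turn "<Data Name='K'>V</Data>..." into {"Data_K": "V", ...}."""
    # The fragments have no single root element, so wrap them in one.
    root = ET.fromstring("<EventData>" + raw + "</EventData>")
    fields = {}
    for position, node in enumerate(root.iter("Data")):
        # Some events carry unnamed <Data> elements; fall back to the position.
        name = node.get("Name") or str(position)
        fields[prefix + name] = node.text or ""
    return fields


raw = (
    "<Data Name='TaskName'>\\REDACTED</Data>"
    "<Data Name='TaskInstanceId'>{6191c9fe-4655-4af1-bfbe-8d48d51ee41e}</Data>"
    "<Data Name='ActionName'>C:\\Windows\\SYSTEM32\\cmd.exe</Data>"
    "<Data Name='ResultCode'>0</Data>"
    "<Data Name='EnginePID'>4628</Data>"
)
fields = expand_event_data(raw)
# json.dumps takes care of escaping quotes and backslashes inside values.
print(json.dumps(fields, indent=2))
```

Serializing through a JSON library rather than string concatenation is one way to sidestep the escaping concerns for quotes and backslashes; an agent written in Go would presumably do the equivalent with its own XML and JSON packages.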
for example,
"event_data": "<Data Name='TaskName'>\\REDACTED</Data><Data Name='TaskInstanceId'>{6191c9fe-4655-4af1-bfbe-8d48d51ee41e}</Data><Data Name='ActionName'>C:\\Windows\\SYSTEM32\\cmd.exe</Data><Data Name='ResultCode'>0</Data><Data Name='EnginePID'>4628</Data>"
might expand to:
Considerations would need to be made as to escaping " and \ within values; I've just written the above by hand, so it's not going to be perfect. You might also prefix the keys (e.g. Data_TaskName, Data_ResultCode) and move them to the root level (or make this an option), particularly if you want to help the community, who might be relying on Telegraf, which uses that convention (Data_ prefix).
This should cover UserData as well, on the subset of events with this.
Describe alternatives you've considered
- promtail with the current windows_events scraper is a non-starter given that it does not parse event data (e.g. EventData or UserData). This includes Grafana Agent, as that embeds promtail.
- fluent-bit makes a start. It parses event data, but it does so into an array, where you would need to reference the schema (e.g. index 0 of the event data array for this specific event ID and source might be TaskName). This is also a non-starter: if I want to query multiple events for SubjectUserSid, I want to reference that field, not look up what index it is, possibly across multiple event IDs, resulting in an unreadable query. Please do not go this route.
- telegraf works! Albeit with some telegraf processors to ensure the data it sends results in a format that logfmt can process. Example processors here. With that said, I am hoping promtail will implement similar functionality (producing structured data with named fields parseable by json or logfmt on the Loki side).
Additional context
Not much. A few references:
If this is just me holding it wrong, please let me know, but after a few days of reading and testing, I'm pretty confident this is indeed not in place. I include it as a "feature", but to me, for a logging solution, this is more a "bug". Thanks!