grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Windows EventLog support #1395

Closed - tdabasinskas closed this issue 3 years ago

tdabasinskas commented 4 years ago

Is your feature request related to a problem? Please describe. Windows logs are stored in the Event Log (.evtx files), which is currently not possible to scrape with the available promtail methods.

Describe the solution you'd like Since we do have systemd journal support for Linux, it would be nice to have support for the Event Log on Windows in a similar manner.

Describe alternatives you've considered A key part of the solution is being able to parse the logs. If I haven't missed anything, there are currently two Golang modules that can do that: github.com/0xrawsec/golang-evtx and github.com/elastic/beats/winlogbeat/eventlog.

randomchance commented 4 years ago

This would instantly make Loki viable in my environment - fluentd requires Ruby, which is a no-go, but having a single Go executable would be perfect.

Alternatively, being able to accept data from winlogbeat (or beats in general!) and run it through the pipeline would be amazing!

PWSys commented 4 years ago

+1 I have many Windows systems in my environment.
randomchance's suggestion of leveraging winlogbeat would work as well!

steenstra commented 4 years ago

This would be great!

I'm currently using InfluxDB/Telegraf as a Syslog receiver, with NXlog (https://nxlog.co/) to convert Windows Event logs to Syslog using the im_msvistalog module.

I see that Promtail can be used as a Syslog target (https://github.com/grafana/loki/blob/master/docs/clients/promtail/scraping.md#syslog-target), so maybe something like that would be a temporary solution until this is implemented?
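
For anyone wanting to try that interim route, here is a minimal sketch of a promtail syslog scrape config based on the syslog-target documentation linked above; the listen address, port, and labels are illustrative, not prescriptive:

  scrape_configs:
    - job_name: syslog
      syslog:
        # Promtail expects RFC5424 syslog messages over TCP on this address.
        listen_address: 0.0.0.0:1514
        labels:
          job: syslog
      relabel_configs:
        # Promote the syslog hostname into a queryable "host" label.
        - source_labels: ['__syslog_message_hostname']
          target_label: 'host'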

cosmo0920 commented 4 years ago

Another alternative is using Fluentd's Windows EventLog plugin.

The Fluentd ecosystem has the in_windows_eventlog2 plugin (from fluent-plugin-windows-eventlog), which can consume the .evtx Windows EventLog format. But, as randomchance noted, this workaround requires Ruby.

pomazanbohdan commented 4 years ago

receiver with NXlog (https://nxlog.co/) to convert Windows Event logs to Syslog

If you convert Windows events to a log file, then they can already be sent to Loki via promtail.

randomchance commented 4 years ago

Just wanted to let interested parties know - winlogbeat (and all the elastic beats) can be configured to output to rolling files instead of logstash, so you can scrape them with promtail!
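
For reference, a rough winlogbeat.yml sketch of that approach: subscribing to a few event logs and writing to rotating local files that promtail can then tail (the paths and rotation settings are illustrative):

  winlogbeat.event_logs:
    - name: Application
    - name: System
    - name: Security

  # Write to local rolling files instead of Logstash/Elasticsearch so promtail can scrape them.
  output.file:
    path: "C:/winlogbeat-out"
    filename: winlogbeat
    rotate_every_kb: 10240   # rotate at roughly 10 MB
    number_of_files: 7       # keep a bounded number of rotated files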

cyriltovena commented 4 years ago

We're going to add Logstash support soon; this is nice to know for Windows users.

azawawi commented 4 years ago

I am currently working on this one (i.e. a Golang prototype to get Windows event logs directly into promtail). So far the executable size is ~2.5 MB. Feedback is appreciated on the following:

Kindly provide example production workload numbers. :smile:

azawawi commented 4 years ago

And here is the result so far (Promtail / Loki / Windows Event Log prototype on Windows 10):

Please note there is a usability bug in Grafana's Explore query. To get past the error "parse error at line 1, col 11: invalid char escape", you need to escape \ with \\.

[Screenshot 2020-04-25 125415]

randomchance commented 4 years ago

That's awesome, thanks so much for working on this!

As for feedback:

  • It supports the following log names:

I think this will need to be able to support arbitrary log names - just try all the configured ones and ignore ones that fail. It's pretty standard practice for enterprise applications to create their own logs that need to be monitored, even ignoring all of the other Microsoft ones. Here are just a few of the ones I would need to monitor:

You can see the list of available logs in PowerShell by running:

Get-WinEvent -ListLog *

My desktop has 510 logs registered!

Should we take Source + EventID + Timestamp to provide tailing?

As for tailing, I would think that using logname + source + timestamp, or even just logname + timestamp, would be ideal; Event IDs are often repeated or reused, especially by non-Microsoft sources.

it is writing to win-event.log

This would need to be rotated and cleaned up. My personal favorite method for rotating logs that are being consumed is to include the date in the file name, e.g. always writing to logname.{day}{month}{year}.log, with a configured limit on the number of files retained. This way you are not changing the names of files promtail should read, and you can easily say "keep 30 days of data". Also, if you don't want to handle cleaning up the old files, it's easy for me to have a script that just deletes files matching a pattern with a LastWriteTime older than X days.

in the current directory

This would need to be configurable. The executable will generally be in the Program Files directory, which requires admin access to write to, and for some of our servers I would want to move it to another drive for space considerations - the logs easily reach a couple of gigabytes a month, and we need to retain them for regulatory reasons.
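
For what it's worth, promtail's existing file scraping already handles that kind of dated, rotated output via a glob path, so the shipper only needs a configurable output directory. A minimal sketch (the paths, labels, and file-name pattern are hypothetical):

  scrape_configs:
    - job_name: windows-eventlog-files
      static_configs:
        - targets: [localhost]
          labels:
            job: wineventlog
            # Glob matches dated files such as win-event.25042020.log
            __path__: D:/wineventlog/win-event.*.log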

Thanks again, and I'm happy to provide feedback!

cyriltovena commented 4 years ago

From what I understand, you're listening to events and writing them to a temporary file. This is nice, although alternatively promtail could later read the events directly with a new Windows target.

Really, thank you for contributing, this is awesome! @azawawi, join #loki-dev if you need anything.

azawawi commented 4 years ago

@randomchance Thanks for the useful info and feedback. I really appreciate it.

@cyriltovena Yes, that sums it up. So from what I understand, I need to add a new Windows-only build target (i.e. wineventtarget.go, wineventmanager.go and winevent_test.go) inside the targets folder to implement it. Anything else I missed?

randomchance commented 4 years ago

@azawawi If it's possible, I agree with @cyriltovena that it would be better to send directly to promtail; technically you can already use Winlogbeat to write event logs directly to files.

Thanks again!

randomchance commented 4 years ago

@azawawi You asked for example production workload numbers, so I got some for you!

I just checked one of our installations where we batch logs for retention.

Interval        Entries   Size
1 Hour (Busy)   140966    69.1 Mb
1 Hour (Calm)   9236      6.5 Mb

... and I just realized this is an aggregate number so not super applicable, but maybe if you divide it by the 9 servers?

cyriltovena commented 4 years ago

@cyriltovena Yes, that sums it up. So from what I understand, I need to add a new Windows-only build target (i.e. wineventtarget.go, wineventmanager.go and winevent_test.go) inside the targets folder to implement it. Anything else I missed?

No, that's the idea, you got it right; we can of course help you along the way.

randomchance commented 4 years ago

Since the Event Log API supports XPath queries, I think that would be good low-hanging fruit for any solution.

@azawawi I did some digging into how DotNet handles persisting the last-read location in the EventLog stream. DotNet has an EventBookmark class for this, but under the hood it is just storing the channel and RecordID.

This means that if you use an XPath query, you can filter it with something like: Event[System[EventRecordID > 83005]]

Where 83005 is the RecordID of the last record stored.

This query gets records after record 83005 that were created within the last 86400000 milliseconds (24 hours): Event[System[EventRecordID > 83005 and TimeCreated[timediff(@SystemTime) <= 86400000]]]

Or if you want to get the specific event you left off on: Event[System[EventRecordID = 83005]]

I think it would be smart to look for the single event first, and if it's no longer there you can assume the log has dropped it and just start at the beginning, or also store the timestamp and fall back to it.

I don't know if that's helpful, but I was looking into writing something similar in DotNet Core and this was stumping me for a while.

You can test the filters/queries in PowerShell pretty easily:

$query = "Event[System[EventRecordID > 83005  and TimeCreated[timediff(@SystemTime) <= 86400000]]]"
Get-WinEvent -FilterXPath $query -LogName System

Jacq commented 4 years ago

I hope to hear more about this. I am currently using winlogbeat with Elasticsearch, but if using Loki for these event logs is possible, I could remove the Elasticsearch instance and save CPU resources. I could not find a similar solution with Telegraf; it might be in the same situation as Loki: https://github.com/influxdata/telegraf/issues/4525 Cheers, Jacq

Jacq commented 4 years ago

This would instantly make Loki viable in my environment - fluentd requires Ruby, which is a no-go, but having a single Go executable would be perfect.

Alternatively, being able to accept data from winlogbeat (or beats in general!) and run it through the pipeline would be amazing!

Why not use the fluent-bit Windows Event Log input plugin (https://docs.fluentbit.io/manual/pipeline/inputs/windows-event-log) and the Grafana Loki output plugin (https://github.com/grafana/loki/tree/master/cmd/fluent-bit)? No Ruby is required for fluent-bit, which relies on C (and Go for the Loki plugin); both combined are below 30 MB.

I have both winlogbeat (which is amazing) and the above setup with fluent-bit, and it seems to work; the only caveat is that I have yet to include event log level information (info, error, ...) in the Loki labels.

cosmo0920 commented 4 years ago

The fluent-bit Windows Event Log plugin does not support retrieving the event log's description, which would need the <winevt.h> API; the fluent-bit plugin does not use the new Windows EventLog API.

simnv commented 4 years ago

I could not find a similar solution with telegraf, it might be in the same situation as loki: influxdata/telegraf#4525

@Jacq Check influxdata/telegraf#8000

secustor commented 4 years ago

@Jacq Can you share how you have built the Loki fluent-bit plugin?
I get errors in the syscall package when I'm trying to compile with 'windows/amd64' as the target platform. Thanks!

Ulfy commented 4 years ago

I'm also in need of a Loki-compliant log shipper for Windows Event Logs. I'm currently trying to get fluent-bit working, since it has a winlog plugin. I'm not sure if I can build a Loki plugin using /cmd/fluent-bit. It just creates a file for *nix usage, not a Windows DLL/library. Is there a way to target a different arch for the file output?

JacoboDominguez commented 4 years ago

@Jacq Can you share how you have built the Loki fluent-bit plugin? I get errors in the syscall package when I'm trying to compile with 'windows/amd64' as the target platform. Thanks!

I have to check; I think I also tried to build it, but finally grabbed the binary from the online repo.

randomchance commented 4 years ago

As far as I can tell, all of the fluent options only support the older style logs, not the newer "channels" such as Microsoft-Windows-DiskDiagnostic/Operational which ended up being a deal breaker for me.

Right now I'm using winlogbeat => logstash => loki and while I like winlogbeat, I really dislike running logstash on windows.

cosmo0920 commented 4 years ago

As far as I can tell, all of the fluent options only support the older style logs, not the newer "channels" such as Microsoft-Windows-DiskDiagnostic/Operational which ended up being a deal breaker for me.

AFAIK, the newer Windows EventLog should be retrieved with <winevt.h> API.

secustor commented 4 years ago

I have found the online repo @JacoboDominguez is referencing.
It is the repo that was used before the plugin was adopted by the Grafana team: https://github.com/cosmo0920/fluent-bit-go-loki

Further, I have opened an issue to supply the fluent-bit Loki plugin for Windows as a binary, or to add a make command for this: #2563

carwyn commented 4 years ago

@azawawi are you able to share the code for the work you are doing? We have many Windows servers I can potentially test this on.

Ulfy commented 4 years ago

Right now I'm using winlogbeat => logstash => loki and while I like winlogbeat, I really dislike running logstash on windows.

@randomchance what kind of mutations are you doing to add labels for Loki and drop high-cardinality ones? I'm configuring this right now, but the output from winlogbeat is massive...

randomchance commented 4 years ago

@Ulfy sorry for the delay - I'm really only doing a couple of things, and taking advantage of the fact that top level fields with multiple values are dropped.

A quick summary:

  1. I only add three labels right now (but that may change): Instance, Level and Weight. Weight {low|audit|normal|high} is just a measure of the importance of the log or source the entry is from, and makes surfacing problems faster. I plan to have alerts triggered by high-weight, error-level entries.
  2. I create a header of name=value pairs such as event_id=1753 to make filtering easier and add it as the first line of the message.
  3. Metadata about the entry that I want to save is serialized to JSON and added as the last line of the message.
  4. Other top level fields I rename to something like [loki][temp][field_name] so they are discarded when sent to Loki, but still show up if I send them to a file while debugging.

The guidelines for Loki stress how important it is to not add a ton of labels, and point out that the regex filtering is very powerful. After creating a header to filter on, I can say that the search time and low storage cost are impressive; however, there are serious pain points.

It's possible that sending logs to promtail first and generating metrics there will help address some of the issues I've met, but I'm probably going to need to add some indication of the log channel as a label, though I'm not sure what that will look like yet.
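
For the "generate metrics in promtail" idea: promtail's pipeline stages can pull values out of a synthetic header like the one described above and count them. A rough sketch, assuming a header of the form "level=... category=..." at the start of each line; the regex, label, and metric names are illustrative:

  # These stages go under a scrape_configs entry in promtail's config.
  pipeline_stages:
    - regex:
        # Extract fields from the synthetic header at the start of each line.
        expression: 'level=(?P<level>\S+) category=(?P<category>\S+)'
    - labels:
        # Promote the extracted level to a Loki label.
        level:
    - metrics:
        windows_error_events_total:
          type: Counter
          description: "count of Windows events at error level"
          source: level
          config:
            value: error
            action: inc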

calebcoverdale commented 4 years ago

@randomchance would you be willing to share your config file for Logstash to Loki? I got winlogbeat to talk to Logstash; now I am a bit lost on actually formatting the data to get it into Loki.

randomchance commented 4 years ago

@calebcoverdale I can't share my full config, but I have mocked up something similar. One thing I can't stress enough is to double-check your string quoting - the Logstash config language is frankly horrible when it comes to rules on quoting identifiers.

Here is an excerpt from a lessons-learned KB I put together for my team:

See Accessing Event Data and Fields in the Configuration and Field References Deep Dive

Field names (object properties) MUST NOT be quoted in conditionals, but MUST be in other situations - I'm explicitly not going to specify where, because you should go re-read the documentation. Again. Every time something does not work as expected.

The string handling in configuration files alone will, eventually, go a long way towards either convincing you to develop test input that fully exercises any conditionals in your configuration, or convincing you to find another log processing solution, if you can.

Here is a full example config; I tried to add explanatory comments:


input{
    beats { 
        port => 5044 
        add_field => {       
            "[loki][header]"  => " "      
            "[weight]"  => "unknown" 
            "[loki][save][agent-version]" => "%{[agent][version]}"
        }
    }
}

filter{
# a lot of the channels have events that just say this :( 
    if [message] =~ "^For internal use" {
        drop{}
    }

  if [agent][type] == "winlogbeat" {

      # create the top level field [level], which becomes a label.
      # create the field [loki] and the nested field [level] in case I want to use it later
    mutate {
      add_field => {
        "[loki][level]" => "%{[log][level]}"
        "[level]" => "%{[log][level]}"
        }
    }

    # If the event is from the "classic" event logs
    # I use the provider name as the "source" for the event and store it in [loki][identifier]
    if  [winlog][channel] =~ "(System|Application|Security|Setup)" {
      mutate {
        add_field => {
          "[loki][identifier]" => "%{[winlog][provider_name]}"
        }
        add_tag => ["classic_log","%{[winlog][provider_name]}"]
      }
    }else {
      mutate {
        add_field => {
        # The newer log channels are generally single application/system so I use the channel name for those
          "[loki][identifier]" => "%{[winlog][channel]}"
        }
        add_tag => ["channel","%{[winlog][channel]}"]
      }
    }

    #################################################################
    # now we can set the weight 
    #################################################################

    # this uses the translate function to pull a weight value from an external json file
    # the if none of the keys match fallback to "normal"
    translate {
        field => "[loki][identifier]"
        destination => "[loki][weight]"
        regex => true
        dictionary_path => "D:/logstash/config/log-weights.json"
        fallback => "normal"    
        # I add tags everywhere so I can follow the event progression while debugging
        add_tag => [ "sys_translate_provider_weight" ] 
      }

    #################################################################
    # Now set the category - audits get special treatment first
    #################################################################

    # add an Audit line to security audits and the audit header

    if [winlog][provider_name] =~ "Security-Auditing" {
      mutate{
      replace => { 
        "[loki][header]" => " \n\t type=%{[event][type]} category=%{[event][category]} action=%{[event][action]} outcome=%{[event][outcome]} %{[loki][header]}"  
        "[loki][weight]" => "audit"
        "[loki][category]" => "%{[event][category]}-audit"
        }
        # these fields only exists on audit events
        add_field  => {
          "[loki][event_data][logon]" => "%{[winlog][logon]}"
          "[loki][event_data][keywords]" => "%{[winlog][keywords]}"
          }
          add_tag => [ "sec-audit" ]
      }
    }  else { 
      # having the same field and source means a replace is done, otherwise it only seems to work on new fields
      # correction, replace does not seem to work, contrary to the docs (at least on windows)
      translate {
        field => "[loki][identifier]"
        destination => "[loki][category]"
        dictionary_path => "D:/logstash/config/category-mapping.json"
        fallback => "windows"
        add_tag => [ "channel_translate" ]
      }
    }

    # clean up the channel and provider names
    mutate {
      # provider can have spaces, so replace them
            gsub => [
            # replace all spaces with minus
          "[winlog][provider_name]", "\s", "-"
          "[winlog][channel]", "\s", "-"
          ]
    }

# I'm mostly building a header string here

    # Add the channel and provider to the front of the [loki][header], creating it if it's not there.
    mutate {
      replace => {
        "[loki][header]" => "channel=%{[winlog][channel]} provider=%{[winlog][provider_name]} %{[loki][header]}"
        }
    }

    if [host][ip] {
        mutate {   
            rename => {"[host][ip]" => "[loki][event_data][host_ip]"}    
        }
    }

    if [host][mac] {
        mutate {   
            rename => {"[host][mac]" => "[loki][event_data][host_mac]"}    
        }
    }

    # We don't want a User label, so rename it, I'm storing it in a subfield of [loki] called [event_data]
    # I'm going to save [event_data] later
    if [user] {
        mutate {
            rename => {"[user]" => "[loki][event_data][user]"}
        }
    }

    # reboot events get flagged as critical
    if "[winlog][event_id]" {
        mutate {   
            replace => {
              #"message" => "event_id=%{[winlog][event_id]} %{[message]}"
              "[loki][header]" => "event_id=%{[winlog][event_id]} %{[loki][header]}"
              }
            add_field  => {"[loki][event_data][event_id]" => "%{[winlog][event_id]}"}    
        }

         if [winlog][provider_name] =~  "Kernel-(General|Power|Boot)" and  [winlog][event_id] =~ "(12|13|109)" {
          mutate {
            replace => {
            "[loki][category] "=> "boot"
            "[loki][weight]"=> "high"
            }

          }
         }
    }
  }  
  # end if-winlogbeat

# this is where I populate the [event_data] with any info I might want later

  if [host][ip] {
      mutate {   
          rename => {"[host][ip]" => "[loki][event_data][host_ip]"}    
      }
  }

  if [host][mac] {
      mutate {   
          rename => {"[host][mac]" => "[loki][event_data][host_mac]"}    
      }
  }
  if [agent][hostname] {
      mutate {   
          rename => {"[agent][hostname]" => "[loki][event_data][computer_name]"}    
      }
  }

# I use the [instance] (which is the computer name) as a label, 
# and it's nice to have consistent casing.
    if [instance] {
      mutate {   
            capitalize => ["[instance]"]  
      }
  }

# might as well keep the original event data

  if [winlog][event_data] {
      mutate {   
          rename => {"[winlog][event_data]" => "[loki][event_data][event_data]"}    
      }
  }

# Add the category to the header

 mutate {
    replace => {
      #"message" => "level=%{[log][level]} %{[message]}"
     "[loki][header]" => "category=%{[loki][category]} %{[loki][header]}"
      }
  }

# Add the level to the header
# this is last so it's first on the log line.

  if [log][level] {
      mutate {   
        #add_field  => {[loki][event_data][level] => "%{[log][level]}"}              
        replace => {
          #"message" => "level=%{[log][level]} %{[message]}"
          "[loki][header]" => "level=%{[log][level]} %{[loki][header]}"
          }    
      }
  }

  mutate {   
    # set the weight to the top level field so it will be a label
    # this replaces / creates [weight] with the content of [loki][weight]
    replace => {
      "weight" =>"%{[loki][weight]}"
    }
  }

  # this serializes the [loki][event_data] object and overwrites it with the json string
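  # NOTE: json_encode is provided by the logstash-filter-json_encode plugin, which is not
  # bundled with Logstash by default and may need to be installed separately
  # (bin/logstash-plugin install logstash-filter-json_encode).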
  json_encode {
    source => "[loki][event_data]"
  }

   mutate {   
    # add the built header to the message line and push the message to the next line
    # add the event data json string to the end on a new line
    replace => {
      "message" => "%{[loki][header]} \n%{[message]} \n\tEVENTDATA= %{[loki][event_data]}"
    }
    # move the tags array to a sub field so it won't be a label
    # it's not supposed to become one, but I've had some weird things happen
    rename => { "[tags]" => "[temp][tags]"  } 
  }

}
output {

    # I do this to check that the events look like I expect, then swap to the loki output
  file {
    path => "D:\logstash.log"
  }
  # loki {
  #   url => "http://sweet.loki.goodness/loki/api/v1/push"
  # }
}
fifofonix commented 3 years ago

This thread was useful for getting a working fluentd -> loki setup going for Windows EventLog, using in_windows_eventlog2 as @cosmo0920 suggested. This supports the "new" channels.

This is a Ruby-based solution, but @randomchance, it does allow you to specify channels like Microsoft-Windows-DiskDiagnostic/Operational, or even include all channels with a separate option.

When specifying channels in the fluentd config, the key is no quoting or escaping, which tripped me up initially:

<source>
  @type windows_eventlog2
  @id windows_eventlog2
  # Do not quote "" or escape \ characters in channel names...
  channels application,system,security,HardwareEvents,Windows PowerShell, Microsoft-Windows-Diagnosis-PCW/Operational 
</source>

randomchance commented 3 years ago

@fifofonix That's awesome! The docs don't reflect that, so I opened an issue to get them updated.

For anyone curious, the documentation says/implies that the standard four logs are the entire set of possible options.

One or more of {'application', 'system', 'setup', 'security'}.

I'm already pretty invested in my current config, but I'll definitely try out a fluentd configuration if I get a chance. Having more options is definitely better.

Now that promtail supports syslog input, using a log shipper that outputs syslog is also an option. If anyone tries that, be aware that promtail is geared towards getting logs into Loki; it's not nearly as flexible and does not allow the crazy level of processing/editing that you can do in Logstash, which is probably a good thing.

danfoxley commented 3 years ago

...and Telegraf now has an input plugin for Windows Event Log

https://github.com/influxdata/telegraf/blob/v1.16.0/plugins/inputs/win_eventlog/README.md

AMoghrabi commented 3 years ago

Hey @randomchance, thanks a lot for your logstash example, I appreciate it. I'm currently setting up something similar for my company and I had a question about the [loki][header] portion -- what does that provide you exactly?

My understanding is Loki tags must be key, value pairs, and the values cannot be nested. When you add a new field such as "[loki][level]", I'm not sure where that gets shown or used in Loki. I'm testing it out right now and when viewing the logs in Loki, I can't see it as a label nor in the logs that are coming from logstash.

I'm fairly new to Logstash so I'm probably misinterpreting this entirely. I'd appreciate your guidance. Thanks!

Edit: Nevermind -- I had some time today to go through the entire config, and I see it gets added to the message.

Ulfy commented 3 years ago

...and Telegraf now has an input plugin for Windows Event Log

https://github.com/influxdata/telegraf/blob/v1.16.0/plugins/inputs/win_eventlog/README.md

@danfoxley Does Telegraf have a Loki plugin to output w/ labels? I'm still looking for something to ship windows event logs to Loki...

chancez commented 3 years ago

Having the logs be represented in JSON would potentially be better than XML as Loki 2.0 has the ability to do parsing at query time for high cardinality data, using JSON and Regex, but not XML.

cosmo0920 commented 2 years ago

:bell: Hear ye, hear ye! :bell:

Sorry for commenting on this graveyard. We planned, and have already implemented, use of the new Windows EventLog subscription API in Fluent Bit via this PR: https://github.com/fluent/fluent-bit/pull/4179

This does not require a Ruby setup; it can be used by deploying only the Fluent Bit executable. Also, our Fluent Bit implementation supports HashMap-style consuming.

RamblingCookieMonster commented 1 year ago

Having the logs be represented in JSON would potentially be better than XML as Loki 2.0 has the ability to do parsing at query time for high cardinality data, using JSON and Regex, but not XML.

Hiyo!

I might have missed it, but am I correct that Promtail will not convert the XML to JSON, nor has an XML processor been added to Loki, meaning folks essentially just struggle along and parse the message field with regex when working with Windows event logs?

Cheers!

danfoxley commented 1 year ago

https://grafana.com/docs/agent/latest/static/set-up/install/install-agent-on-windows/

The Grafana Agent has config for Windows events.
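
For context, promtail gained a native windows_events scrape target (which the Grafana Agent embeds, as noted in the next comment), so the relevant logs config looks roughly like this. This is a sketch based on the documented options; the job name, channel, bookmark path, and labels are illustrative:

  scrape_configs:
    - job_name: windows-events
      windows_events:
        eventlog_name: "Application"
        # Persist the last-read position so restarts resume where they left off.
        bookmark_path: "./bookmark.xml"
        use_incoming_timestamp: false
        # Optional XPath filter over the channel, as discussed earlier in this thread.
        xpath_query: '*'
        labels:
          job: windows_events
      relabel_configs:
        # The target exposes the originating machine name; promote it to a "host" label.
        - source_labels: ['computer']
          target_label: 'host'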

RamblingCookieMonster commented 1 year ago

To clarify, I have no issues at all processing and sending Windows events. This question is about the format of the data. Promtail (which I am testing already, and which the Grafana Agent embeds, AFAIK, so using it would not change this) does not process the actual event data, and simply sends a string of XML for the event_data. This is... not helpful. I will open another issue if this is the case, but I'm looking to confirm I am not missing anything, as I've only spent an afternoon looking at this.

An example of the data in Grafana Cloud:

{
  "source": "Service Control Manager",
  "channel": "System",
  "computer": "REDACTED.REDACTED",
  "event_id": 7036,
  "level": 4,
  "levelText": "Information",
  "keywords": "Classic",
  "timeCreated": "2023-09-07T09:22:18.282632000Z",
  "eventRecordID": 651039,
  "execution": {
    "processId": 812,
    "threadId": 4156,
    "processName": "services.exe"
  },
  "event_data": "<Data Name='param1'>Windows Update</Data><Data Name='param2'>running</Data><Binary>PROBABLY-NO-NEED-TO-REDACT</Binary>",
  "message": "The Windows Update service entered the running state."

So! Windows events derive significant value from the structured data, in this case, under event_data. For example, if these were Active Directory logs, that's where you would find who did what to what principal, among other essential data. That this data is (1) not processed into JSON, and (2) not processable via some XML processor in Grafana, is a significant gap in functionality. I could absolutely be mistaken, but it appears that folks are resorting to regex parsing the meant-for-human-eyes message field, rather than working with structured data, which... seems like going backwards, and may preclude some folks, including me, from considering promtail/loki for Windows.

With that said, before I open an issue that boils down to "this is a terrible experience and for windows event log users, please consider these alternatives," I want to make sure I'm not missing something obvious.

Cheers!

danfoxley commented 1 year ago

@RamblingCookieMonster Does this blog post and related Youtube video add any new insights for this?

https://grafana.com/blog/2021/08/09/new-in-loki-2.3-logql-pattern-parser-makes-it-easier-to-extract-data-from-unstructured-logs/

https://www.youtube.com/watch?v=zIdEVNA6YTI

RamblingCookieMonster commented 1 year ago

That probably would help! To be honest, though, there's such a wide variety in what could be included in a field that it's probably not viable, outside of one-off solutions, to use a parser like that post-ingest, at least IMHO. Here's a synopsis of what I've found, keeping in mind this is mostly a superficial level of time/effort, so take it with a grain of salt:

What I want: (1) structured windows event data, (2) with parsed event data field names, (3) in a format loki can process

Ultimately, this bit of processing, at least superficially, gets telegraf output into a format Loki / logfmt will be happy with. Haven't tested it much, there might be other characters/sequences that break logfmt, but so far so good:

  [[processors.strings]]
    # Duct taping OSS that isn't designed for Windows, so... escape something that
    # will later become an escape character and confuse logfmt (for Loki queries)
    [[processors.strings.replace]]
      field = "*"
      old = '\'
      new = '\\'
    # Duct taping OSS that isn't designed for Windows, so... handle the many cases
    # where a field will have a double quote (command line, script content, cron/task definition, etc.)
    # Telegraf sends key="value", and key="value with "quotes"" is not valid for logfmt (for Loki queries)
    [[processors.strings.replace]]
      field = "*"
      old = '"'
      new = '\"'
    # Duct taping OSS that isn't designed for Windows, so... handle the few event data field names
    # that will have spaces in them, as logfmt (for Loki queries) will be quite confused without this.
    [[processors.strings.replace]]
      field_key = "*"
      old = ' '
      new = '_'
    [[processors.strings.tagpass]]
      __name = 'win_eventlog'

It's not as batteries-included as something like the Elastic or Splunk agents, but it appears that this will be viable. I do think it would be valuable for promtail to be able to meet the needs I mentioned; IMHO it's absolute-bare-minimum functionality for logging in a Windows environment, but I can see why it's not a thing yet (if ever).

Cheers!

danfoxley commented 1 year ago

@wardbekker please, for Windows Event Log where EventData comes in as XML

0x80073d02 9NMPJ99VJBWV-Microsoft.YourPhone {aa7e4763-ca28-461c-a259-334fb85492b9} 1 {855e8a7c-ecb4-4ca3-b045-1dfa50104289}

Is there consideration of using the Go encoding/xml package to parse XML?

Considering that parsing the ingested XML in Loki is not available today, what are your thoughts/comments? Use the pattern parser? Something else?

danfoxley commented 1 year ago

@RamblingCookieMonster Not to drag this on, but:

Filebeat, I guess, can't go straight to Loki. How about using Filebeat to ?? (file, logstash...) then Loki? https://www.elastic.co/guide/en/beats/filebeat/current/decode-xml-wineventlog.html