grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

CEF (Common Event Format) parser in logql #5648

Open james-callahan opened 2 years ago

james-callahan commented 2 years ago

Is your feature request related to a problem? Please describe.

CEF (Common Event Format) is a logging format used by some tools. Loki and Promtail should have a built-in parser for it so that I can extract CEF fields as labels. CEF messages look something like:

CEF:0|Palo Alto Networks|Cortex XDR|Cortex XDR 2.4|XDR Analytics|High Connection Rate|6|end=1601792870694 shost=WGHRAMG deviceFacility=None cat=Discovery externalId=98106342 request=https:\/\/iga-bh.xdr.eu.paloaltonetworks.com\/alerts\/98106342 cs1=iexplore.exe cs1Label=Initiated by cs2=\“C:\\\\Program Files (x86)\\\\Internet Explorer\\\\IEXPLORE.EXE\” SCODEF:11844 CREDAT:82946 \/prefetch:2 cs2Label=Initiator CMD cs3=Microsoft CorporationSIGNATURE_SIGNED- cs3Label=Signature cs4=iexplore.exe cs4Label=CGO name cs5=\“C:\\\\Program Files (x86)\\\\Internet Explorer\\\\IEXPLORE.EXE\” SCODEF:11844 CREDAT:82946 \/prefetch:2 cs5Label=CGO CMD cs6=Microsoft CorporationSIGNATURE_SIGNED- cs6Label=CGO Signature dst=10.12.4.37 dpt=8000 src=10.10.28.140 spt=58003 fileHash=e582676ec900249b408ab4e37976ae8c443635a7da77755daf6f896a172856a3 filePath=C:\\\\Program Files (x86)\\\\Internet Explorer\\\\iexplore.exe targetprocesssignature=NoneSIGNATURE_UNAVAILABLE- tenantname=iGA tenantCDLid=1021319191 CSPaccountname=Information & eGovernment Authority initiatorSha256=e582676ec900249b408ab4e37976ae8c443635a7da77755daf6f896a172856a3 initiatorPath=C:\\\\Program Files (x86)\\\\Internet Explorer\\\\iexplore.exe cgoSha256=e582676ec900249b408ab4e37976ae8c443635a7da77755daf6f896a172856a3 osParentName=iexplore.exe osParentCmd=\“C:\\\\Program Files (x86)\\\\Internet Explorer\\\\IEXPLORE.EXE\” SCODEF:11844 CREDAT:82946 \/prefetch:2 osParentSha256=e582676ec900249b408ab4e37976ae8c443635a7da77755daf6f896a172856a3 osParentSignature=SIGNATURE_SIGNED osParentSigner=Microsoft Corporation incident=118719 act=Detected suser=['root']

i.e. a pipe-delimited format, where the last field is a series of key/value pairs with certain escaping rules.
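
To make the structure described above concrete, here is a minimal Python sketch of splitting a CEF line into its seven header fields and the raw extension string. It assumes the commonly documented header escaping (`\|` for a literal pipe and `\\` for a literal backslash); the function name `split_cef_header` is hypothetical and this is not how Loki or Promtail handle CEF.

```python
def split_cef_header(line: str):
    """Split a CEF line into (header_fields, raw_extension).

    Assumption: inside the header, a literal pipe is written as "\\|" and a
    literal backslash as "\\\\". The extension is returned untouched.
    """
    prefix = "CEF:"
    if not line.startswith(prefix):
        raise ValueError("not a CEF line")

    fields = []   # completed header fields
    current = []  # characters of the field being read
    i = len(prefix)
    # The header consists of 7 fields, each terminated by an unescaped pipe.
    while i < len(line) and len(fields) < 7:
        ch = line[i]
        if ch == "\\" and i + 1 < len(line) and line[i + 1] in "|\\":
            current.append(line[i + 1])  # keep the escaped character
            i += 2
            continue
        if ch == "|":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1

    # Tolerate a line that ends right after the severity field.
    if len(fields) == 6 and i >= len(line):
        fields.append("".join(current))
    if len(fields) != 7:
        raise ValueError("expected 7 header fields")
    return fields, line[i:]
```

Pipes that appear after the seventh delimiter are left alone, since they belong to the extension rather than the header.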

Describe the solution you'd like

Similar to the LogQL parsers json and logfmt, we should have a cef parser.

Additional context

Documentation of the format

My first use-case is to parse logs from Cortex XDR which are sent as CEF payloads over syslog (see https://docs.paloaltonetworks.com/cortex/cortex-xdr/cortex-xdr-pro-admin/logs/cortex-xdr-log-notification-formats/alert-notification-format.html)

stale[bot] commented 2 years ago

Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project. A stalebot can be very useful in closing issues in a number of cases; the most common is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly sort for closed issues which have a stale label sorted by thumbs up.

We are doing our best to respond, organize, and prioritize all issues but it can be a challenging task, our sincere apologies if you find yourself at the mercy of the stalebot.

james-callahan commented 2 years ago

Please label revivable?

chaudum commented 2 years ago

still valid

chaudum commented 2 years ago

The basic structure of the CEF format can be parsed using the pattern parser, like this: pattern `CEF:<v>|<vendor>|<product>|<version>|<signature>|<name>|<severity>|<extension>`

Example:

{filename="/var/log/cef.log"} |= "cat=Discovery" | pattern `CEF:<v>|<vendor>|<product>|<version>|<signature>|<name>|<severity>|<extension>`

However, if you want to parse also the key value pairs of the extension it can get tricky. Even though it looks very much like logfmt, it's not really the case. However, you can still use regular line filters, as in the example above.

james-callahan commented 2 years ago

> However, if you want to parse also the key value pairs of the extension it can get tricky. Even though it looks very much like logfmt, it's not really the case. However, you can still use regular line filters, as in the example above.

At least with the logs coming from Cortex XDR, all the interesting things to alert on are in the extension section. We'd like to extract them to add to alerts generated from loki ruler.

chaudum commented 2 years ago

> > However, if you want to parse also the key value pairs of the extension it can get tricky. Even though it looks very much like logfmt, it's not really the case. However, you can still use regular line filters, as in the example above.
>
> At least with the logs coming from Cortex XDR, all the interesting things to alert on are in the extension section. We'd like to extract them to add to alerts generated from loki ruler.

That's what I thought. I guess a regex guru may be able to write an expression for parsing the key value pairs, but it would be very inefficient. :(
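
For illustration only, here is a rough Python sketch of the kind of expression meant here. It assumes that a key is a run of word characters directly followed by `=`, located at the start of the extension or after whitespace, and that a value runs until the next such key, which is what lets values contain spaces. The name `parse_extension` is hypothetical, values are returned still escaped, and a value that itself contains something shaped like ` word=` would be split incorrectly.

```python
import re

# Assumption: keys consist of word characters only. An escaped equals sign
# ("\=") inside a value is not mistaken for a key, because the backslash
# sits between the word characters and the "=".
_KEY = re.compile(r"(?:^|\s)(\w+)=")


def parse_extension(ext: str) -> dict:
    """Parse a CEF extension string into a dict of raw (still escaped) values."""
    matches = list(_KEY.finditer(ext))
    result = {}
    for idx, match in enumerate(matches):
        value_start = match.end()
        # The value ends where the next key (including its leading
        # whitespace) begins, or at the end of the string.
        if idx + 1 < len(matches):
            value_end = matches[idx + 1].start()
        else:
            value_end = len(ext)
        result[match.group(1)] = ext[value_start:value_end]
    return result
```

Applied to a fragment of the message from the original report, `parse_extension("cs1=iexplore.exe cs1Label=Initiated by dst=10.12.4.37")` keeps `Initiated by` together as the value of `cs1Label`, which a logfmt-style parser would not do.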

chaudum commented 2 years ago

If you don't need to parse all key value pairs of the extension field, it may be relatively easy:

{filename="/var/log/cef.log"}
|= "cat=Discovery"
| pattern `CEF:<v>|<vendor>|<product>|<version>|<signature>|<name>|<severity>|<extension>`
| label_format original=`CEF:{{.v}}|{{.vendor}}|{{.product}}|{{.version}}|{{.signature}}|{{.name}}|{{.severity}}|{{.extension}}`
| line_format `{{.extension}}` 
| regexp `end=(?P<end>[^\s]+)` 
| end > 1600000000000
| line_format `{{.original}}`

And since you want to alert on certain things, I assume you don't even need the last line_format stage, because you're doing a metrics query:

sum by (shost) (
  count_over_time(
    {filename="/var/log/cef.log"}
    |= "cat=Discovery"
    | pattern `CEF:<v>|<vendor>|<product>|<version>|<signature>|<name>|<severity>|<extension>`
    | line_format `{{.extension}}` 
    | regexp `end=(?P<end>[^\s]+)` 
    | regexp `shost=(?P<shost>[^\s]+)` 
    | end > 1600000000000
    [$__interval]
  )
)

Don't blame me for the performance ;-)

james-callahan commented 2 years ago

Even that doesn't help with the un-escaping process.

Though I think it does show why a dedicated cef parser would be useful :)
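
As a sketch of the un-escaping step mentioned above, the following Python snippet assumes the commonly documented extension escapes: `\\` for a backslash, `\=` for an equals sign, and `\n` / `\r` for line breaks. The name `unescape_cef_value` is hypothetical. Any other backslash sequence, such as the `\/` seen in the example message, is left unchanged because it is not covered by that assumption.

```python
import re

# Assumed escape sequences inside CEF extension values.
_ESCAPES = {"\\": "\\", "=": "=", "n": "\n", "r": "\r"}
_ESCAPE_RE = re.compile(r"\\([\\=nr])")


def unescape_cef_value(value: str) -> str:
    """Replace the assumed CEF escape sequences in one extension value.

    re.sub scans left to right without overlapping matches, so an escaped
    backslash followed by "n" is decoded as a backslash and the letter n,
    not as a newline.
    """
    return _ESCAPE_RE.sub(lambda m: _ESCAPES[m.group(1)], value)
```

Doing this in a single pass avoids the double-decoding that chained string replacements would cause.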

chaudum commented 2 years ago

> Though I think it does show why a dedicated cef parser would be useful :)

Agreed, a dedicated parser would be more useful. If we want to do this, we'll need to think about whether to integrate the parser into LogQL or into Promtail, so it can be transformed into a different format.

stale[bot] commented 2 years ago

Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.

james-callahan commented 2 years ago

> we'll need to think about whether to integrate the parser into LogQL or into Promtail, so it can be transformed into a different format.

@chaudum any further thoughts on this?

lg-d commented 1 year ago

Is there any progress in making a CEF parser?

fourstepper commented 8 months ago

I tried searching around the internet for some CEF specification, but I soft-failed... There are some vendors that kind of explain the formatting, but some actual specification/RFC would help a lot