vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Add parser function for VMWare dfwpktlogs #16014

Open Anonymodesu opened 1 year ago

Anonymodesu commented 1 year ago

Use Cases

Hi all, can we add a parser function for VMware firewall rule message logs (dfwpktlogs)?

Examples and format description here: https://docs.vmware.com/en/VMware-NSX-Data-Center-for-vSphere/6.4/com.vmware.nsx.admin.doc/GUID-6F9DC53E-222D-464B-8613-AB2D517CE5E3.html

Note that in the examples at that link, the timestamp at the beginning is part of the log header and should be ignored (the header varies depending on the syslog format). For instance, given this example:

2015-03-10T03:20:31.274Z INET match DROP domain-c7/1002 IN 242 UDP 192.168.110.10/138->192.168.110.255/138

the prospective function only needs to parse INET match DROP domain-c7/1002 IN 242 UDP 192.168.110.10/138->192.168.110.255/138.
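To make the header/payload split concrete, here is a minimal Python sketch. It assumes the header is a single leading ISO 8601 timestamp, as in the example above; `strip_timestamp_header` is a hypothetical helper name, and real syslog headers would need a proper syslog parser rather than this regex.

```python
import re

# Assumption: the header is one leading ISO 8601 timestamp, as in the docs
# example. Real RFC 3164 / RFC 5424 headers carry more fields than this.
ISO_TIMESTAMP = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})\s+"
)

def strip_timestamp_header(line: str) -> str:
    """Return the line without a leading ISO 8601 timestamp, if one is present."""
    return ISO_TIMESTAMP.sub("", line, count=1)

line = (
    "2015-03-10T03:20:31.274Z INET match DROP domain-c7/1002 IN 242 UDP "
    "192.168.110.10/138->192.168.110.255/138"
)
payload = strip_timestamp_header(line)
print(payload)
```

A line that has no leading timestamp is returned unchanged, so the helper can be applied to payloads that have already been stripped.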

Attempted Solutions

Currently I'm using parse_regex with the following regex:

        (?P<filterHash>\\w+) \
        (?P<afValue>(INET|INET6)) \
        (?P<reason>(\\w|-)+) \
        ((?P<action>[A-Z]+) )??\
        (?P<ruleSet>(\\w|-)+)/(?P<ruleID>\\d+) \
        (?P<direction>(IN|OUT)) \
        ((?P<packetLen>\\d+) )??\
        (?P<protocol>(PROTO \\d+|[A-Z]+)) \
        ((?P<protocolExtraAll>(\\d \\d|\\w+)) )?\
        (?P<ip1>((\\d{1,3}\\.){3}\\d{1,3})|(:{0,2}?[a-z0-9]{1,4}:{0,2}?)+)\
        (/(?P<port1>\\d+))?\
        ->\
        (?P<ip2>((\\d{1,3}\\.){3}\\d{1,3})|(:{0,2}?[a-z0-9]{1,4}:{0,2}?)+)\
        (/(?P<port2>\\d+))?\
        ( (?P<tcpFlags>[AEFPRSUW]+))?\
        ( (?P<packetsIn>\\d+)/(?P<packetsOut>\\d+) (?P<bytesIn>\\d+)/(?P<bytesOut>\\d+))?\
        ( (?P<ruleTag>.*))?
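For anyone who wants to experiment with this pattern outside Vector, below is a rough Python sketch. It assumes the doubled backslashes and trailing backslashes above are config-file string escaping (for example a TOML multi-line basic string), so `\\w` becomes `\w` and each continued line is joined without extra whitespace. The sample line is the documentation example with a made-up filter hash (`1234`) prepended, and `parse_payload` is a hypothetical helper name. Python's `re` is not the engine Vector uses, so behavior may differ in edge cases.

```python
import re

# The regex from the post with the assumed config escaping undone.
DFWPKTLOG = re.compile(
    r"(?P<filterHash>\w+) "
    r"(?P<afValue>(INET|INET6)) "
    r"(?P<reason>(\w|-)+) "
    r"((?P<action>[A-Z]+) )??"
    r"(?P<ruleSet>(\w|-)+)/(?P<ruleID>\d+) "
    r"(?P<direction>(IN|OUT)) "
    r"((?P<packetLen>\d+) )??"
    r"(?P<protocol>(PROTO \d+|[A-Z]+)) "
    r"((?P<protocolExtraAll>(\d \d|\w+)) )?"
    r"(?P<ip1>((\d{1,3}\.){3}\d{1,3})|(:{0,2}?[a-z0-9]{1,4}:{0,2}?)+)"
    r"(/(?P<port1>\d+))?"
    r"->"
    r"(?P<ip2>((\d{1,3}\.){3}\d{1,3})|(:{0,2}?[a-z0-9]{1,4}:{0,2}?)+)"
    r"(/(?P<port2>\d+))?"
    r"( (?P<tcpFlags>[AEFPRSUW]+))?"
    r"( (?P<packetsIn>\d+)/(?P<packetsOut>\d+) (?P<bytesIn>\d+)/(?P<bytesOut>\d+))?"
    r"( (?P<ruleTag>.*))?"
)

def parse_payload(payload):
    """Return the named groups that matched, or None if the regex does not match."""
    m = DFWPKTLOG.search(payload)
    if m is None:
        return None
    return {name: value for name, value in m.groupdict().items() if value is not None}

# Hypothetical sample: the docs example with an invented filter hash in front.
sample = "1234 INET match DROP domain-c7/1002 IN 242 UDP 192.168.110.10/138->192.168.110.255/138"
fields = parse_payload(sample)
print(fields)
```

One thing this sketch surfaces: as transcribed here, the pattern requires a leading `filterHash` token, so the header-less example line from the docs (which starts directly with `INET`) is not matched by it.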

Proposal

No response

References

No response

Version

0.26.0

spencergilbert commented 1 year ago

Hey @Anonymodesu confirming the proper link to the docs is: https://docs.vmware.com/en/VMware-NSX-Data-Center-for-vSphere/6.4/com.vmware.nsx.admin.doc/GUID-6F9DC53E-222D-464B-8613-AB2D517CE5E3.html

It looks like vec got appended to the link which 404'd it.

> header varies depending on syslog format

Does this mean the start doesn't always include a timestamp, or that the format of the timestamp changes depending?

Anonymodesu commented 1 year ago

> Hey @Anonymodesu confirming the proper link to the docs is: https://docs.vmware.com/en/VMware-NSX-Data-Center-for-vSphere/6.4/com.vmware.nsx.admin.doc/GUID-6F9DC53E-222D-464B-8613-AB2D517CE5E3.html
>
> It looks like vec got appended to the link which 404'd it.

Yes, thanks for catching that. I've fixed the link.

> header varies depending on syslog format

From the logs I've observed, the dfwpktlog message can be wrapped in either RFC 3164 or RFC 5424 headers (and the header may differ for other VMware stacks). My point is that the dfwpktlog parser shouldn't take responsibility for parsing the header; it should delegate that to other parsers, e.g. parse_syslog.

> Does this mean the start doesn't always include a timestamp, or that the format of the timestamp changes depending?

The timestamp at the beginning of this log (provided by the docs) is an example of what a header may look like.

2015-03-10T03:20:31.274Z INET match DROP domain-c7/1002 IN 242 UDP 192.168.110.10/138->192.168.110.255/138

The timestamp should be removed by another parser (like parse_syslog), as I explained above, and the dfwpktlog parser only needs to parse the payload: INET match DROP domain-c7/1002 IN 242 UDP 192.168.110.10/138->192.168.110.255/138

spencergilbert commented 1 year ago

Thanks for the clarification!