influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

[inputs.tail] Grok regex patterns not matching #15315

Closed: bloodmc closed this issue 4 months ago

bloodmc commented 4 months ago

Relevant telegraf.conf

[[inputs.tail]]
  files = ["\\\\servername\\logs\\powershell\\**\\*.txt"]
  from_beginning = true
  watch_method = "poll"
  data_format = "grok"
  grok_custom_patterns = '''
    BASE64_ENCODED (?i)[A-Za-z0-9+/]{44,}(?:[A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)
    SUSPICIOUS_COMMANDS (?i)reflection|socket|download|internetexplorer.application|xmlhttp|assemblybuilder|gzipstream|decompress|io.compression|write-zip|(expand|compress)-archive|-bxor|security.cryptography|getdelegateforfunctionpointer
    POWERSHELL_USAGE (?i)powershell -version|invoke-command|invoke-expression|start-process|set-executionpolicy
    NETWORK_ACTIVITY (?i)socket|webclient|wget|curl|net.webclient|downloadstring|downloadfile|uploadfile
    ENCODING_METHODS (?i)frombase64string|base64|utf8|unicode|encode|decode|compress|expand
    MALICIOUS_TOOLS (?i)mimikatz|nishang|metasploit|shellcode|exploit|amsibypass
    SUSPICIOUS_BEHAVIOR (?i)disable-realtime|bypass|enable-psremoting|brute.*force|port.*scan|reverse.*shell|credential.*dump
  '''

  grok_patterns = [
    "%{BASE64_ENCODED}",
    "%{SUSPICIOUS_COMMANDS}",
    "%{POWERSHELL_USAGE}",
    "%{NETWORK_ACTIVITY}",
    "%{ENCODING_METHODS}",
    "%{MALICIOUS_TOOLS}",
    "%{SUSPICIOUS_BEHAVIOR}"
  ]

Logs from Telegraf

2024-05-07T14:03:53Z D! [parsers.grok::tail] Grok no match found for: "$asm = [System.Reflection.Assembly]::Load($data)"

System info

Telegraf 1.30.1 on Windows Server 2022 Standard, pushing logs to Loki

Docker

Manual setup (Docker not used).

Steps to reproduce

  1. Run telegraf using tail config above
  2. Grok patterns fail to match, and Telegraf logs messages similar to the one above

Expected behavior

Log lines should be matched correctly. In the example above, I would expect the SUSPICIOUS_COMMANDS pattern to match the log line, but it does not.

Actual behavior

PowerShell logs are not matched. I tested the same patterns with https://grokdebugger.com/ and everything matches correctly there.

Additional info

No response

srebhan commented 4 months ago

@bloodmc the issue here is that your patterns do not create a field, because none of them assigns a field name to the match. I agree that the error message is misleading (I will put up a PR), but you can fix the issue by providing a field name in the patterns:

  grok_patterns = [
    "%{BASE64_ENCODED:value}",
    "%{SUSPICIOUS_COMMANDS:value}",
    "%{POWERSHELL_USAGE:value}",
    "%{NETWORK_ACTIVITY:value}",
    "%{ENCODING_METHODS:value}",
    "%{MALICIOUS_TOOLS:value}",
    "%{SUSPICIOUS_BEHAVIOR:value}"
  ]
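
To illustrate why a bare `%{NAME}` reference yields nothing to emit while `%{NAME:value}` does, here is a minimal Python sketch. It is not Telegraf's Go implementation; the `expand` and `parse` helpers and the shortened `SUSPICIOUS_COMMANDS` pattern are hypothetical stand-ins, assuming only that a grok reference becomes a named capture group when a field name is given.

```python
import re

# Hypothetical, shortened stand-in for one custom pattern from the config above.
CUSTOM_PATTERNS = {
    "SUSPICIOUS_COMMANDS": r"(?i:reflection|socket|download)",
}


def expand(grok_pattern: str) -> str:
    """Expand %{NAME} and %{NAME:field} references into a plain regex.

    Only references that carry a field name become named capture groups.
    """
    def repl(m: re.Match) -> str:
        name, field = m.group(1), m.group(2)
        body = CUSTOM_PATTERNS[name]
        if field:
            return f"(?P<{field}>{body})"
        return f"(?:{body})"

    return re.sub(r"%\{(\w+)(?::(\w+))?\}", repl, grok_pattern)


def parse(grok_pattern: str, line: str) -> dict:
    """Return the named captures for the first match, or {} if there are none."""
    m = re.search(expand(grok_pattern), line)
    return m.groupdict() if m else {}


line = "$asm = [System.Reflection.Assembly]::Load($data)"

# The regex matches in both cases, but only the second one produces a field.
print(parse("%{SUSPICIOUS_COMMANDS}", line))        # {}
print(parse("%{SUSPICIOUS_COMMANDS:value}", line))  # {'value': 'Reflection'}
```

In this sketch the unnamed pattern still matches the line, but the result carries no fields, which is consistent with the "no match found" message being misleading.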

bloodmc commented 4 months ago

Ah yes the message was definitely misleading. Your change worked. Thanks!

srebhan commented 4 months ago

@bloodmc what do you think about my message in PR #15318?

bloodmc commented 4 months ago

Wouldn't it make more sense to log a message on startup if field names are missing? It could also default to "value" as the field name if none is found?

srebhan commented 4 months ago

@bloodmc I don't think you can find out if a regex has a named group, can you? Furthermore, using "value" as default doesn't work if you do have multiple groups. I would rather not try to be clever here to avoid breaking people... :-)

bloodmc commented 4 months ago

Fair enough, log is good then.