magnusbaeck / logstash-filter-verifier

Apache License 2.0

Backslashes in input fields are duplicated before being sent to Logstash #118

Open mutt13y opened 3 years ago

mutt13y commented 3 years ago

Description

Backslashes in input fields are not handled properly. I believe LFV is adding an extra level of quoting, which causes every backslash to be doubled.

Motivation

Testing grok patterns for Windows paths on Logstash 7.12.

Exemplification

Using version 1.6.3

logstash config

input {
    beats {
        port => 5044
        host => "0.0.0.0"
    }
}
filter {
    if [log][file][path] {
        grok {
            pattern_definitions => {
                "FILE" => "[^/]*"
                "WINFILE" => "[^\\]*"
            }
            match => {
                "[log][file][path]" => [
                    "^%{GREEDYDATA:[log][file][dir]}/%{FILE:[log][file][name]}",
                    "^%{GREEDYDATA:[log][file][dir]}\\%{WINFILE:[log][file][name]}"
                ]
            }
            tag_on_failure => ["_path_parse_failure"]
        }
    }
}

I have this configuration deployed, and it correctly parses both Linux and Windows paths.

test case

---
fields:
  log:
    file:
      path: 'C:\logs\current\foo.log'
ignore:
  - "host"
  - "fields"
  - "@timestamp"
testcases:
  - input:
      - "2020-09-23 07:20:00.000000 | INFO | TEST"
    expected:
      - "log":
          "file":
            "path": 'C:\logs\current\foo.log'
            "dir": 'C:\logs\current'
            "file": 'foo.log'
        "message": "2020-09-23 07:20:00.000000 | INFO | TEST"
...

results

{
   "log": {
     "file": {
-      "dir": "C:\\logs\\current",
-      "file": "foo.log",
-      "path": "C:\\logs\\current\\foo.log"
+      "dir": "C:\\\\logs\\\\current\\",
+      "name": "foo.log",
+      "path": "C:\\\\logs\\\\current\\\\foo.log"
     }
   },

conclusion

As you can see, even log.file.path, which is wrapped in YAML single quotes both in fields and in expected, is somehow getting an extra \ added. (I accept that the displayed output also doubles each \ because of JSON escaping.)
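
To separate the display escaping from the real duplication, here is a minimal Python sketch. It assumes the diff is rendered as JSON (so each literal backslash is shown as two) and uses only the standard `json` module. The variable names are illustrative and not part of LFV.

```python
import json

# The path as written in the test case; YAML single quotes keep backslashes literal.
path = r"C:\logs\current\foo.log"

# JSON escapes each backslash, so a correct value is displayed with two per separator.
expected_display = json.dumps(path)
print(expected_display)  # "C:\\logs\\current\\foo.log"

# The "+" line of the diff shows four per separator; decoding it gives the real value.
actual_display = r'"C:\\\\logs\\\\current\\\\foo.log"'
actual = json.loads(actual_display)
print(actual)  # C:\\logs\\current\\foo.log

# The event really carries twice as many backslashes as the test case supplied.
print(path.count("\\"), actual.count("\\"))  # 3 6
```

So two backslashes per separator in the diff is just escaping, while four means the value itself was doubled.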

Look at log.file.dir: there is a \ at the end. Given the grok pattern, it is clear that the extra backslash is added on input, because the pattern consumes exactly one backslash as the separator between the directory and the file name.
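
The trailing backslash can be reproduced outside Logstash with a rough Python equivalent of the second grok pattern. This is a sketch only: it assumes GREEDYDATA behaves like `.*` and that WINFILE is the regex `[^\\]*` from the config above. `split_path` is a hypothetical helper, not an LFV or Logstash function.

```python
import re

# Rough equivalent of "^%{GREEDYDATA:dir}\\%{WINFILE:name}":
# a greedy ".*", one literal backslash, then a run of non-backslash characters.
WINPATH = re.compile(r"^(?P<dir>.*)\\(?P<name>[^\\]*)")

def split_path(path):
    """Hypothetical helper: return (dir, name) the way the grok pattern would."""
    m = WINPATH.match(path)
    return m.group("dir"), m.group("name")

intended = r"C:\logs\current\foo.log"        # what the test case supplies
doubled = intended.replace("\\", "\\\\")     # what Logstash appears to receive

print(split_path(intended))
print(split_path(doubled))
```

With the intended input, dir comes out as `C:\logs\current`. With the doubled input, the pattern consumes only the second backslash of the final pair, so dir keeps a trailing backslash, which matches the `"dir"` value in the results above.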

Running on Logstash 7.12, this grok pattern works with Filebeat on Windows as the source.