logstash-plugins / logstash-filter-kv

Apache License 2.0

New regex functionality problem #57

Closed Mojster closed 6 years ago

Mojster commented 6 years ago

As recommended by @colinsurprenant, I'm opening this as a new issue. The problem appears with the new regex functionality from #55.

I'm using LS 6.2.3. Here are my steps to reproduce.

My kv filter:

    kv {
        field_split_pattern => "\|"
        include_brackets => false
        value_split_pattern => "="
    }

Input:

1.2.2018 6:54:17    |cir=C3|date=01.02.2018 06:54:17|acronym=ACRONYM|user=user|type=11|rptPackageStatus=0|transactionHostDepartment=01|membIdentificNumb=0000000|patronId=111|inventoryNo=000019088|cobissId=0000000|note=f|patronCategory=002|lastVisitDate=30.01.2018|schoolType=0|schoolName=00000|schoolDept=4.c|libraryCode=00000|libraryDept=|firstsignUpDate=16.01.2015|patronOccupation=|readingRoom=|bibl001c=m|biblUDK675s=61|biblLanguage101a=slv|biblType001b=a|biblTargetAudienceCode100e=a|parentDepartment=Sth|holdStatus=c|materialType=01|loanDate=01.02.2018|returnDate=15.02.2018|visitValid=0|visitTypeValid=0|

Output to console:

{
              "parentDepartment" => "Sth",
                "patronCategory" => "002",
                    "schoolName" => "00000",
                           "cir" => "C3",
                   "libraryDept" => "|firstsignUpDate=16.01.2015",
              "biblLanguage101a" => "slv",
                          "date" => 2018-02-01T05:54:17.000Z,
                    "holdStatus" => "c",
                   "inventoryNo" => "000019088",
             "membIdentificNumb" => "0000000",
                          "beat" => {
        "hostname" => "C3RAZVOJ",
         "version" => "6.2.3",
            "name" => "C3RAZVOJ"
    },
                    "prospector" => {
        "type" => "log"
    },
              "rptPackageStatus" => 0,
                    "schoolDept" => "4.c",
                          "note" => "f",
                   "libraryCode" => "00000",
              "patronOccupation" => "|readingRoom=",
                      "bibl001c" => "m",
                        "offset" => 2441,
                   "biblUDK675s" => "61",
                    "visitValid" => "0",
                      "@version" => "1",
                  "materialType" => "01",
                          "tags" => [
        [0] "beats_input_codec_plain_applied"
    ],
                          "type" => 11,
                "visitTypeValid" => "0",
    "biblTargetAudienceCode100e" => "a",
                 "lastVisitDate" => 2018-01-29T23:00:00.000Z,
                    "returnDate" => 2018-02-14T23:00:00.000Z,
                       "country" => "si",
                          "host" => "C3RAZVOJ",
                      "loanDate" => 2018-01-31T23:00:00.000Z,
     "transactionHostDepartment" => "01",
                          "user" => "user",
                      "patronId" => "111",
                       "acronym" => "ACRONYM",
                       "message" => "1.2.2018 6:54:17\t|cir=C3|date=01.02.2018 06:54:17|acronym=ACRONYM|user=user|type=1
1|rptPackageStatus=0|transactionHostDepartment=01|membIdentificNumb=0000000|patronId=111|inventoryNo=000019088|cobissId=
0000000|note=f|patronCategory=002|lastVisitDate=30.01.2018|schoolType=0|schoolName=00000|schoolDept=4.c|libraryCode=0000
0|libraryDept=|firstsignUpDate=16.01.2015|patronOccupation=|readingRoom=|bibl001c=m|biblUDK675s=61|biblLanguage101a=slv|
biblType001b=a|biblTargetAudienceCode100e=a|parentDepartment=Sth|holdStatus=c|materialType=01|loanDate=01.02.2018|return
Date=15.02.2018|visitValid=0|visitTypeValid=0|",
                      "cobissId" => "0000000",
                    "@timestamp" => 2018-03-21T13:43:35.905Z,
                        "source" => "g:\\elasticStack\\data\\test.log",
                    "schoolType" => "0",
                  "biblType001b" => "a"
}

If I'm splitting by |, how can I get | in values? The split parts should then be split again into key-value pairs.

colinsurprenant commented 6 years ago

We can see that the values containing a | are the ones where no value is assigned to the field. For example, |patronOccupation=|readingRoom=| is parsed as "patronOccupation" => "|readingRoom=".
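A minimal Ruby sketch of how this kind of result can arise, assuming a scanner regex whose value part must match at least one character. Both patterns below are illustrative assumptions and are not taken from the plugin's source:

```ruby
# Illustrative only: these patterns are assumptions, not the kv plugin's real regex.
line = "a=1|b=|c=3"

# The value part (.+?) must match at least one character. When the value is
# empty, it swallows the following "|" and extends until the next "|" or
# the end of the line.
requires_value = /([^=|]+)=(.+?)(?=\||$)/

# The value part may be empty and can never contain the field delimiter.
allows_empty = /([^=|]+)=([^|]*)/

broken = line.scan(requires_value)
fixed  = line.scan(allows_empty)

p broken  # [["a", "1"], ["b", "|c=3"]]
p fixed   # [["a", "1"], ["b", ""], ["c", "3"]]
```

With the first pattern, the key that follows an empty value never appears as a field of its own, which matches the missing firstsignUpDate and readingRoom keys in the output above.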

Mojster commented 6 years ago

Another test with only field_split.

    kv {
#       field_split_pattern => "\|"
        field_split => "|"
        include_brackets => false
#       value_split_pattern => "="
    }

And the output is the same:

{
                       "message" => "1.2.2018 6:54:17\t|cir=C3|date=01.02.2018 06:54:17|acronym=ACRONYM|user=user|type=1
1|rptPackageStatus=0|transactionHostDepartment=01|membIdentificNumb=0000000|patronId=111|inventoryNo=000019088|cobissId=
0000000|note=f|patronCategory=002|lastVisitDate=30.01.2018|schoolType=0|schoolName=00000|schoolDept=4.c|libraryCode=0000
0|libraryDept=|firstsignUpDate=16.01.2015|patronOccupation=|readingRoom=|bibl001c=m|biblUDK675s=61|biblLanguage101a=slv|
biblType001b=a|biblTargetAudienceCode100e=a|parentDepartment=Sth|holdStatus=c|materialType=01|loanDate=01.02.2018|return
Date=15.02.2018|visitValid=0|visitTypeValid=0|",
                        "source" => "g:\\elasticStack\\data\\test.log",
                   "libraryDept" => "|firstsignUpDate=16.01.2015",
                  "biblType001b" => "a",
                    "@timestamp" => 2018-03-23T08:47:51.043Z,
    "biblTargetAudienceCode100e" => "a",
                          "beat" => {
            "name" => "C3RAZVOJ",
         "version" => "6.2.3",
        "hostname" => "C3RAZVOJ"
    },
             "membIdentificNumb" => "0000000",
                 "lastVisitDate" => 2018-01-29T23:00:00.000Z,
                  "materialType" => "01",
                    "visitValid" => "0",
                          "user" => "user",
                    "schoolType" => "0",
                          "host" => "C3RAZVOJ",
                    "returnDate" => 2018-02-14T23:00:00.000Z,
                      "loanDate" => 2018-01-31T23:00:00.000Z,
                    "schoolName" => "00000",
                      "bibl001c" => "m",
                   "inventoryNo" => "000019088",
     "transactionHostDepartment" => "01",
                    "prospector" => {
        "type" => "log"
    },
                       "country" => "si",
                    "holdStatus" => "c",
                          "tags" => [
        [0] "beats_input_codec_plain_applied"
    ],
                       "acronym" => "ACRONYM",
              "rptPackageStatus" => 0,
                      "patronId" => "111",
                        "offset" => 3653,
                          "date" => 2018-02-01T05:54:17.000Z,
                "visitTypeValid" => "0",
                      "@version" => "1",
              "parentDepartment" => "Sth",
              "patronOccupation" => "|readingRoom=",
                      "cobissId" => "0000000",
                   "libraryCode" => "00000",
                    "schoolDept" => "4.c",
                   "biblUDK675s" => "61",
              "biblLanguage101a" => "slv",
                          "type" => 11,
                           "cir" => "C3",
                "patronCategory" => "002",
                          "note" => "f"
}

I've written my own Ruby splitter, but I would prefer to use kv. I hope you'll be able to make my example work.
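For reference, a hand-written splitter of the kind mentioned here might look like the sketch below. It is a hypothetical helper (the name `split_kv` and its parameters are assumptions), not the reporter's actual code. It splits on the field delimiter first and then splits each part once on the value delimiter:

```ruby
# Hypothetical helper, not the reporter's actual splitter.
def split_kv(line, field_sep = "|", value_sep = "=")
  line.split(field_sep).each_with_object({}) do |part, pairs|
    # partition splits on the first value_sep only, so values may contain "=".
    key, sep, value = part.partition(value_sep)
    # Skip chunks with no separator (e.g. the leading timestamp) or no key.
    next if sep.empty? || key.strip.empty?
    pairs[key.strip] = value
  end
end

p split_kv("1.2.2018 6:54:17\t|cir=C3|libraryDept=|firstsignUpDate=16.01.2015|")
```

Because the field split happens before the value split, an empty value such as libraryDept= stays an empty string and cannot absorb the next field.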

colinsurprenant commented 6 years ago

I am able to reproduce this using:

echo "a=1|b=|c=3" | bin/logstash -e 'input{stdin{}} filter{kv{field_split => "|" value_split => "="}} output{stdout{codec => rubydebug}}'
...
{
       "message" => "a=1|b=|c=3",
             "a" => "1",
          "host" => "mbp15r",
    "@timestamp" => 2018-03-23T14:12:39.108Z,
             "b" => "|c=3",
      "@version" => "1"
}

The same happens when using kv{field_split_pattern => "\|" value_split_pattern => "="}.

colinsurprenant commented 6 years ago

PR #58 is an easy fix for this.

colinsurprenant commented 6 years ago

Thanks @Mojster for reporting this! It is now fixed and released in version 4.1.1.