logstash-plugins / logstash-filter-grok

Grok plugin to parse unstructured (log) data into something structured.
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
Apache License 2.0

grok should have a replace functionality like we have for mutate #103

Open saurabh8585 opened 7 years ago

saurabh8585 commented 7 years ago

Usecase

I have a few custom regex patterns that look for sensitive information in log messages, such as credit card numbers, social security numbers, etc.

I apply these patterns inside grok, matching each log message against the regexes I wrote in a file inside the patterns folder.

A log message that matches one of the patterns gets a custom field named "Infosec_Pattern" whose value identifies the matching pattern, e.g. "CCN" or "SSN".

Logstash version 2.3.1

Below is the sample filter config

filter
{
  grok {
    patterns_dir => ["/logstash/patterns"]
    match => { "message" => "%{CCN}" }
    add_field => { "Infosec_Pattern" => "CCN" }
  }
  grok {
    patterns_dir => ["/logstash/patterns"]
    match => { "message" => "%{SSN}" }
    add_field => { "Infosec_Pattern" => "SSN" }
  }
}

This works perfectly. Now what I want is:

Replace the matched string in the message with some value like "XXXXXXXX", since the matched string contains sensitive information.

In order to do this, I have to use mutate, where I find the pattern in the log message a second time and replace it with the desired value using gsub.

Below is the sample filter config (with mutate section)

filter
{
  ... 
  ... ## Some groks (See above filter config for example)
  ... 
  mutate {
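    # clear tags (e.g. the _grokparsefailure tag grok adds when a match fails)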
    remove_field => "tags"
    gsub => [
      "message","[0-9]{16}","XXXXXXXXXXXXX"    #### The regex pattern supposedly matches credit card no which has 16 digit
    ]
  }
}

Output after applying the above sample configs

The parsed log message without the mutate section looks like this:

{
            "message" => "Saurabh ccn is 5123456789012345",
           "@version" => "1",
         "@timestamp" => "2016-12-07T12:01:09.554Z",
               "host" => "d7231b98ec06",
    "Infosec_Pattern" => "CCN"
}

The parsed log message with the mutate section looks like this:

{
            "message" => "Saurabh ccn is XXXXXXXXXXXXX",
           "@version" => "1",
         "@timestamp" => "2016-12-07T11:57:50.075Z",
               "host" => "d7231b98ec06",
    "Infosec_Pattern" => "CCN"
}

As we can clearly see, the pattern has to be matched twice if I want to replace the matched string in the original message field.

I tried to use overwrite inside grok, but that does not help much since the sensitive data can be anywhere in the string. Also, overwrite would not let me replace the data with a desired value like "XXXX".
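
For reference, here is roughly what my overwrite attempt looked like (a sketch, assuming the same CCN pattern from my patterns file). grok can only overwrite the target field with the captured text itself, not with a mask value:

filter
{
  grok {
    patterns_dir => ["/logstash/patterns"]
    # Capturing into "message" and overwriting replaces the whole message
    # with just the matched card number, the opposite of masking it.
    match     => { "message" => "%{CCN:message}" }
    overwrite => ["message"]
  }
}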

Expectation

  1. Add a functionality in grok itself to replace matched string with some desired value. OR
  2. Add a functionality in mutate to include the custom regex pattern like we do in grok.

Option 1 seems to be the best fit for this.
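
To make Option 1 concrete, a hypothetical config could look like the sketch below (replace_match is invented syntax for illustration; no such option exists in grok today):

filter
{
  grok {
    patterns_dir => ["/logstash/patterns"]
    match     => { "message" => "%{CCN}" }
    add_field => { "Infosec_Pattern" => "CCN" }
    # hypothetical option, not real grok syntax: mask whatever %{CCN} matched
    replace_match => { "message" => "XXXXXXXXXXXXXXXX" }
  }
}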

jordansissel commented 7 years ago

Grok is primarily for parsing, not modifying data. The mutate filter (since it does text replacement already), or a new filter, feels like a better place to implement this proposal.
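
For illustration, a mutate-based version might look something like this sketch (patterns_dir in mutate and %{CCN} expansion in gsub are hypothetical; mutate's gsub only accepts plain regexes today):

filter
{
  mutate {
    # hypothetical: mutate does not load grok pattern files today
    patterns_dir => ["/logstash/patterns"]
    gsub => [ "message", "%{CCN}", "XXXXXXXXXXXXXXXX" ]
  }
}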

jordansissel commented 7 years ago

Otherwise, I am in favor of this feature.

saurabh8585 commented 7 years ago

Thanks @jordansissel for supporting this issue.

Since this issue interests you, I have one more point to make it more interesting.

Currently, we write one custom regex pattern per line, like below.

../my_pattern_directory/my_pattern_file

CCN_MASTER [1-2]{16}
CCN_VISA [2-3]{15}
CCN_AMEX [3-4]{14}
CCN_MAESTRO [4-5]{13}

In order to apply the above patterns to a log message, we need to write a filter like the one shown below.

filter
{
  grok {
    patterns_dir => ["/logstash/patterns"]
    match => { "message" => "%{CCN_MASTER}" }
    add_field => { "Infosec_Pattern_Found" => "CCN" }
  }
  grok {
    patterns_dir => ["/logstash/patterns"]
    match => { "message" => "%{CCN_VISA}" }
    add_field => { "Infosec_Pattern_Found" => "CCN" }
  }
}

As we can see, the number of grok blocks increases with the number of patterns. Also, the "Infosec_Pattern_Found" field gets added redundantly here.

Proposed solution

Instead of identifying custom patterns individually, we can group them like below.

../my_pattern_directory/my_pattern_file

CCN
{
  MASTER [1-2]{16}
  VISA [2-3]{15}
  AMEX [3-4]{14}
  MAESTRO [4-5]{13}
}

And the corresponding filter would look something like this.

filter
{
  grok {
    patterns_dir => ["/logstash/patterns"]
    match => { "message" => "%{CCN}" }
    add_field => { "Infosec_Pattern_Found" => "CCN" }
  }
}

OR

filter
{
  grok {
    patterns_dir => ["/logstash/patterns"]
    match => { "message" => "%{CCN.MASTER}" }
    add_field => { "Infosec_Pattern_Found" => "CCN" }
  }
}

This way, we will achieve a single grok block per pattern group and avoid adding the "Infosec_Pattern_Found" field redundantly.

Please do consider this point as well if it seems feasible. Let me know if we can track this altogether in a different ticket.

jordansissel commented 7 years ago

You can do this today:

CCN_MASTER [1-2]{16}
CCN_VISA [2-3]{15}
CCN_AMEX [3-4]{14}
CCN_MAESTRO [4-5]{13}

# Create a pattern called CCN that matches any of the above:
CCN %{CCN_MASTER}|%{CCN_VISA}|%{CCN_AMEX}|%{CCN_MAESTRO}
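
With that combined pattern, the two grok blocks from the earlier filter collapse into one (a sketch using the same patterns_dir):

filter
{
  grok {
    patterns_dir => ["/logstash/patterns"]
    # %{CCN} now matches any of the four card formats in a single pass
    match     => { "message" => "%{CCN}" }
    add_field => { "Infosec_Pattern_Found" => "CCN" }
  }
}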