magnusbaeck / logstash-filter-verifier

Apache License 2.0
195 stars 27 forks source link

metadata fields #5

Open youngmip opened 8 years ago

youngmip commented 8 years ago

Hi,

I have a quick question! Can I test metadata fields as well? In our use case metadata fields are used to determine an index name.

Many thanks! Youngmi.

magnusbaeck commented 8 years ago

I'm afraid this isn't possible at the moment since Logstash's json_lines codec can't be configured to include the metadata fields in the serialized JSON string. I've filed https://github.com/logstash-plugins/logstash-codec-json_lines/issues/23 to get this fixed and was nearly done with the pull request when I hit a bump in the road (https://github.com/elastic/logstash/issues/5130) that I need to sort out first.

breml commented 7 years ago

Today I asked my self how hard it would be to convert the rubydebug output back to valid json. On a first try, it looks like this would not be that hard. @magnusbaeck please have a look at https://github.com/breml/rubydebug One remaining problem would be to find the end of an event. With the json_lines codec this is very easy, as every event is exactly one line. This would not be the case for the rubydebug codec. On the other hand, due to the "nice" formatting, every event ends with a line only containing a }.

breml commented 7 years ago

@magnusbaeck What are your thoughts about this issue? Do you intend to rebase https://github.com/logstash-plugins/logstash-codec-json_lines/pull/24 or do you think we should implement something into LFV to process the rubydebug output?

magnusbaeck commented 7 years ago

@breml I worked on the rebase tonight but it fell through since the specs are currently broken on master (see https://travis-ci.org/logstash-plugins/logstash-codec-json_lines/builds/226973335). Once that has been addressed it shouldn't take long to get the PR into shape. I'd much prefer that route over writing (and maintaining) a rubydebug parser.

breml commented 5 years ago

@magnusbaeck Today I stumbled again over this issue. We store the decision, into which Elasticsearch index a particular event should be stored in a field in the @metadata array. Now in the test cases we are not able to test, if this decision is made correctly, because the @metadata is not available. Two years ago we hoped, that a change in the logstash-codec-json_lines would be merged. Until now, this is not the case. What do you think about this?

magnusbaeck commented 5 years ago

Argh, this old problem. I'd forgotten that the ball was in my court. Let me have the weekend to look into a rebase.

magnusbaeck commented 5 years ago

The PR has been updated but there are existing problems that prevents the Travis build from being green so I'm guessing nobody's going to be jumping onto the review. I've asked about the failure in the Slack channel.

breml commented 5 years ago

@magnusbaeck Ping me, if there is something I can help with.

breml commented 5 years ago

@magnusbaeck Today I tested a different approach to workaround this problem and the solution is surprisingly easy. Since February 2017 (https://github.com/logstash-plugins/logstash-filter-mutate/commit/a1137c9fc4ca41069235b4f6e2f752563c39bb01#diff-091009b2789afad0d76a6df256ba6ff3) the mutate filter provides support for copying fields. So today I tried the following filter in my Logstash config:

filter {
  mutate {
    copy => {
      "[@metadata]" => "[__metadata]"
    }
  }
}

This worked just fine, all the nested fields within [@metadata] are copied to [__metadata], which then is provided by the json_lines codec.

As a convenience function logstash-filter-verifier could provide a flag to add this filter at the end of the filter config under test, if someone likes to get the @metadata fields. With this the user does not need to modify his filter config prior to the testing with logstash-filter-verifier.

What do you think?

breml commented 5 years ago

We then could also remove the special field __lfv_testcase which gets added by logstash-filter-verifier (I think this only the case if used with sockets).

magnusbaeck commented 5 years ago

That's an excellent workaround. Even if the PR we've been waiting for gets merged it'll take a while before it ends up in the Logstash releases people use (unless they specifically upgrade that plugin) so this workaround is better from a compatibility point of view. When doing this we should rename __metadata back to @metadata before passing the map to the chosen diff tool.

breml commented 5 years ago

Glad you like the workaround. For now, I have extended the above mentioned snippet to also remove the __lfv prefixed data from __metadata:

filter {
  mutate {
    copy => {
      "[@metadata]" => "[__metadata]"
    }
  }
  ruby {
    code => 'metadata = event.get("__metadata")
             metadata.delete_if {|key, value| key.start_with?("__lfv") }
             event.set("__metadata", metadata)'
  }
}

As long as only __lfv_testdata is used, this field could also be removed with a simple mutate filter.