logstash-plugins / logstash-filter-translate

Translate filter for Logstash
Apache License 2.0

Enhance multi-field lookup enrichment #44

Open acchen97 opened 7 years ago

acchen97 commented 7 years ago

Translate supports JSON, CSV, and YAML file lookups. Each of these formats supports some type of multi-field lookup: in JSON and YAML the lookup value can be hierarchical, and in CSV a lookup on a key could reference multiple values in the row.

Currently, these lookups are possible, but they result in a complex object in the "destination" or user-defined field. We should allow these multi-field lookups to add new top-level fields directly, to enrich the event.

jordansissel commented 7 years ago

As a workaround, or perhaps solution, you can achieve, today, what you describe by using multiple translate filters.
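
A minimal sketch of that workaround, assuming events carry src_ip and dest_ip fields and that both lookups share one dictionary file. The path, field names, and fallback value are placeholders, not taken from this thread:

```
filter {
  # First lookup: translate the source address.
  translate {
    field           => "src_ip"
    destination     => "src_name"
    dictionary_path => "/path/to/hosts.yml"   # placeholder path
    fallback        => "unknown"
  }
  # Second lookup: same dictionary file, different event field.
  translate {
    field           => "dest_ip"
    destination     => "dest_name"
    dictionary_path => "/path/to/hosts.yml"   # placeholder path
    fallback        => "unknown"
  }
}
```

Each translate block adds one top-level field, so the event never carries a nested lookup object.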

shreyasrk commented 7 years ago

I agree. Basically, what I need is this:

  {
    translate {
      dictionary_path => '/some/field/path/to/lookup/as/reference (JSON|YAML|CSV)'
      fields => ['event_field_1', 'event_field_2']
      destination => ['new_event_field_1_replaced', 'new_event_field_2_replaced']
    }
  }

The objective is to use the same reference file to replace multiple fields with values.

If I use multiple translate filters it will re-load the same file multiple times(?) Kindly confirm.

Chandanvatsa commented 6 years ago

+1

jordansissel commented 6 years ago

> If I use multiple translate filters it will re-load the same file multiple times(?) Kindly confirm.

Yes. What is your concern with this?

coregear commented 6 years ago

+1 I need this.

alesnav commented 6 years ago

This would be a nice new feature for data enrichment!

For example, when enriching username data from a CSV/JSON file, you would be able to add full name, department, office, etc. at the same time with just one call to the translate filter.

guyboertje commented 6 years ago

It seems like the requested feature links multiple source fields to destination fields. It would be tricky to validate a one-to-one mapping of field array elements to destination array elements. We could consider a new setting, mapping (a hash).

  translate {
    mapping => {
      "[f1]" => "[d1]"
      "[f2]" => "[d2]"
    }
    ...
  }

This, however, would mean that the dictionary holds keys and values from multiple domains. I would argue that separate translate filters per domain is a cleaner approach.

On the other hand I can see scenarios where an event has several field values in the same domain, e.g. src_ip/dest_ip or from_id/to_id.

guyboertje commented 6 years ago

As regards the original proposal of having multi-valued translations added to the root of an event, the problem lies with the fallback setting. It is a string.

The question is how to accommodate a multi-field lookup value with a string fallback. If there is no match and the fallback string is substituted, the field holds a string where matched events hold a structured value, and there will be an ES mapping conflict.

My advice would be to use a CSV dictionary followed by a Dissect filter. The lookup value and the fallback value should have the same structure, so that one can apply the Dissect filter regardless of match or no match.
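
A rough sketch of that pattern, assuming a two-column CSV dictionary whose value column packs several fields with a semicolon delimiter. The path, the field names, and the delimiter are all hypothetical choices for illustration:

```
filter {
  # Dictionary row, for example:
  #   jdoe,Jane Doe;Engineering;Berlin
  translate {
    field           => "username"
    destination     => "user_info"
    dictionary_path => "/path/to/users.csv"        # placeholder path
    # The fallback has the same three-part shape as a real lookup value.
    fallback        => "unknown;unknown;unknown"
  }
  # Split the packed value into top-level fields, match or no match.
  dissect {
    mapping => {
      "user_info" => "%{full_name};%{department};%{office}"
    }
  }
}
```

Because the fallback string dissects into the same three fields, every event ends up with the same field types.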

guyboertje commented 6 years ago

I have created PR #67, which adds support for iterate_on, a new setting that handles fields with an array of values (strings).

With this one can achieve multiple field translations. First build a field with an array value, say ips, by using

  add_field => {
    "[ips][0]" => "%{src_ip}"
    "[ips][1]" => "%{dest_ip}"
  }

then iterate_on ips, and you will have a translated array. Then use add_field again:

  add_field => {
    "[src_name]" => "%{[translated][0]}"
    "[dest_name]" => "%{[translated][1]}"
  }
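
The middle step is described only in words. A sketch of what it might look like, assuming iterate_on is set to the same array field that is being translated and that the result is written to a field named translated. The dictionary path and fallback value are placeholders, and the final behaviour depends on how the PR is merged:

```
filter {
  translate {
    iterate_on      => "ips"          # array field built by the first add_field
    field           => "ips"
    destination     => "translated"   # expected to hold one entry per element of ips
    dictionary_path => "/path/to/hosts.yml"   # placeholder path
    fallback        => "unknown"
  }
}
```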