Open acchen97 opened 7 years ago
As a workaround, or perhaps solution, you can achieve, today, what you describe by using multiple translate filters.
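As a sketch, that multi-filter workaround could look like this (the dictionary file and field names below are hypothetical):

```
filter {
  translate {
    dictionary_path => "/etc/logstash/users.yml"   # hypothetical lookup file
    field           => "src_user"
    destination     => "src_user_name"
  }
  translate {
    dictionary_path => "/etc/logstash/users.yml"   # same file, loaded again
    field           => "dest_user"
    destination     => "dest_user_name"
  }
}
```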
On Fri, Mar 17, 2017 at 9:42 PM Alvin Chen notifications@github.com wrote:
Translate supports JSON, CSV, and YAML file lookups. Each of these formats supports some type of multi-field lookup; for JSON and YAML it's hierarchical, and in CSV, a lookup on a key could reference multiple values in the row.
Currently, these lookups are possible, but will result in a complex object in the "destination" or self-defined field. We should allow these multi-field lookups to simply add new top-level fields for enriching the event.
I agree. Basically, what I need is this -
```
translate {
  dictionary_path => '/some/field/path/to/lookup/as/reference (JSON|YAML|CSV)'
  fields          => ['event_field_1', 'event_field_2']
  destination     => ['new_event_field_1_replaced', 'new_event_field_2_replaced']
}
```
The objective is to use the same reference file to replace multiple fields with values.
If I use multiple translate filters, will it re-load the same file multiple times? Kindly confirm.
+1
> If I use multiple translate filters, will it re-load the same file multiple times? Kindly confirm.
Yes. What is your concern with this?
+1 I need this.
This would be a nice new feature for data enrichment!
For example, for username data enrichment using a CSV/JSON file, you would be able to add the full name, department, office, etc. at the same time, with a single call to the translate filter.
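In the meantime, one can approximate this today: a JSON/YAML lookup places the whole record into the destination field as a nested object, and `mutate` can then promote the sub-fields to the top level (the file and field names here are hypothetical):

```
filter {
  translate {
    dictionary_path => "/etc/logstash/users.json"  # hypothetical: maps username => { "full_name": ..., "department": ... }
    field           => "username"
    destination     => "user_info"
  }
  mutate {
    # promote the nested lookup results to top-level fields
    rename => {
      "[user_info][full_name]"  => "full_name"
      "[user_info][department]" => "department"
    }
  }
}
```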
It seems like the requested feature links multiple `source` fields to `destination` fields. It would be tricky to validate a 1-to-1 mapping of `field` array elements to `destination` array elements. We could consider a new setting, `mapping` (a hash).
```
translate {
  mapping => {
    "[f1]" => "[d1]"
    "[f2]" => "[d2]"
  }
  ...
}
```
This, however, would mean that the dictionary holds keys and values from multiple domains. I would argue that separate translate filters per domain is a cleaner approach.
On the other hand, I can see scenarios where an event has several field values in the same domain, e.g. src_ip/dest_ip or from_id/to_id.
As regards the original proposal of having multi-valued translations added to the root of an event, the problem lies with the `fallback` setting: it is a string. The question is how to accommodate a multi-field lookup value with a string `fallback`.
Should a no-match `fallback` substitution occur, the destination would hold a string for some events and an object for others, resulting in an ES mapping conflict.
My advice would be to use a CSV dictionary followed by a Dissect filter. If the lookup value and the fallback value have the same structure, one can apply the Dissect filter regardless of match or no match.
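A sketch of that approach, assuming a CSV dictionary whose values are comma-separated pairs (all file and field names below are illustrative):

```
filter {
  translate {
    dictionary_path => "/etc/logstash/users.csv"   # hypothetical: username => "Full Name,Department"
    field           => "username"
    destination     => "user_raw"
    fallback        => "unknown,unknown"           # same shape as a real lookup value
  }
  dissect {
    # works identically for matched values and the fallback
    mapping => { "user_raw" => "%{full_name},%{department}" }
  }
}
```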
I have created PR #67 that adds support for `iterate_on`, a new setting that handles fields holding an array of values (strings). With this one can achieve multiple field translations. First build a field with array values, say `ips`, by using

```
add_field => { "[ips][0]" => "%{src_ip}" "[ips][1]" => "%{dest_ip}" }
```

then `iterate_on => "ips"`, and you will have a translated array. Then `add_field` again:

```
add_field => { "[src_name]" => "%{[translated][0]}" "[dest_name]" => "%{[translated][1]}" }
```
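Put end to end, and assuming `iterate_on` from PR #67 behaves as described, the whole sequence might look like this (the dictionary file and field names are illustrative):

```
filter {
  mutate {
    # gather the values to translate into one array field
    add_field => { "[ips][0]" => "%{src_ip}" "[ips][1]" => "%{dest_ip}" }
  }
  translate {
    dictionary_path => "/etc/logstash/ip-names.csv"  # hypothetical ip => name dictionary
    field           => "ips"
    iterate_on      => "ips"
    destination     => "translated"
  }
  mutate {
    # fan the translated array back out into named fields
    add_field => { "src_name" => "%{[translated][0]}" "dest_name" => "%{[translated][1]}" }
  }
}
```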