Use integration metadata to interact with Elasticsearch.
What does this PR do?
Changes the creation of actions that are passed down to Elasticsearch so that they also use the metadata fields set by an integration.
The fields involved are `id` (document_id), `index`, and `pipeline`; their values are taken verbatim, without placeholder resolution.
The `index`, `document_id`, and `pipeline` configured in the plugin settings take precedence over the integration ones, because they manifest an explicit choice made by the user.
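For illustration, a minimal output sketch of this precedence rule (values are placeholders; `index`, `document_id`, and `pipeline` are the plugin's existing settings):

```
output {
  elasticsearch {
    cloud_id => "<cloud id>"
    api_key  => "<api key>"

    # If any of these are set explicitly, they win over the values the
    # integration stored in the event metadata:
    # index       => "my-index"
    # document_id => "%{[some][field]}"
    # pipeline    => "my-pipeline"

    # With them left unset, the integration-provided id, index, and
    # pipeline are used verbatim (no placeholder resolution).
  }
}
```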
Why is it important/What is the impact to the user?
This PR fixes an interoperability issue with Agent integrations, where some metadata set by an integration has to be passed down to Elasticsearch.
Checklist
[x] My code follows the style guidelines of this project
[x] I have commented my code, particularly in hard-to-understand areas
[ ] I have made corresponding changes to the documentation
[ ] I have made corresponding changes to the default configuration files (and/or docker env variables)
[x] I have added tests that prove my fix is effective or that my feature works
Author's Checklist
[x] used the plugin with an integration that configures the document id, and verified that in case of the same id no new documents are indexed.
How to test this PR locally
- create a new deployment in Elastic Cloud; it's the easiest way to get Elasticsearch, Kibana, and Fleet. Take note of the credentials when creating the deployment, because they must later be used in the configuration of logstash-output-elasticsearch.
- install an Elastic Agent and enroll it in Fleet. It's only used to create a policy that installs an integration (m365_defender), so that all the necessary pipelines are installed in Elasticsearch.
- in Logstash, configure a pipeline like the following (credentials redacted; the `input` section is a sketch assuming a `file` input that reads the one-line sample event created in the next step):
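```
input {
  # assumed: a file input reading /tmp/defender_singleline.json once;
  # the sincedb path matches the file removed in a later step
  file {
    path => "/tmp/defender_singleline.json"
    sincedb_path => "/tmp/defender_sincedb"
  }
}
filter {
  elastic_integration {
    cloud_id => ""
    cloud_auth => "elastic:"
    geoip_database_directory => "//vendor/bundle/jruby/3.1.0/gems/logstash-filter-geoip-7.2.13-java/vendor/GeoLite2-City.mmdb"
  }
}
output {
  stdout { codec => rubydebug { metadata => true } }
  elasticsearch {
    cloud_id => ""
    api_key => ""
    data_stream => true
    ssl => true
  }
}
```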
- now use a sample data event (like [this](https://docs.elastic.co/integrations/m365_defender#incident)) and create a one-line json file (named `/tmp/defender_singleline.json`). To squash all lines into one, use:
```sh
cat <file_in>.json | awk '{for(i=1;i<=NF;i++) printf "%s",$i}' > <file_out>.json
```
- install this plugin, setting the path to this branch in the `Gemfile`, as sketched below.
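A hypothetical `Gemfile` entry (the gem name assumes the plugin under test is logstash-output-elasticsearch, as referenced above; the local path is illustrative):

```ruby
# point the Logstash Gemfile at a local checkout of this branch
gem "logstash-output-elasticsearch", :path => "/path/to/logstash-output-elasticsearch"
```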
- run Logstash with:
```sh
bin/logstash -f "/path_to/pipeline.conf"
```
- stop Logstash and remove the sincedb file with `rm /tmp/defender_sincedb`
- start Logstash again with the same command line.
- verify, in a data stream index related to Defender (something like `.ds-logs-m365_defender.incident-ep-`), that only one document is present; one way to check is sketched below.
This means that despite the 2 distinct runs, the Defender integration, which generates a unique id from the Incident fields, was correctly executed and used.
The counter-proof can be obtained by executing the same flow above with the shipped ES output plugin and verifying that the document ends up duplicated, because the unique document_id generated by the integration is not used.
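One way to run the check (a sketch using the standard `_count` API; the endpoint and credentials are placeholders, and the data stream pattern is inferred from the backing index name above):

```sh
# expect {"count":1,...} even after the two Logstash runs
curl -s -u "elastic:<password>" "https://<es_endpoint>/logs-m365_defender.incident-*/_count"
```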
Release notes
Use integration metadata to interact with Elasticsearch.
Related issues
Use cases
Screenshots
Logs