logstash-plugins / logstash-output-elasticsearch

https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html
Apache License 2.0

Is it possible to support indexing of dynamic variables using rollover? #858

Closed zcola closed 2 years ago

zcola commented 5 years ago

output {
  elasticsearch {
    hosts => [ 'ccc.om:9200' ]
    ilm_enabled => true
    ilm_rollover_alias => "cbg%{product}_%{log}_loghub"
  }
}

def setup_ilm
  return unless ilm_enabled?
  if default_index?(@index) || !default_rollover_alias?(@ilm_rollover_alias)
    logger.warn("Overwriting supplied index #{@index} with rollover alias #{@ilm_rollover_alias}") unless default_index?(@index)
    @index = @ilm_rollover_alias
    maybe_create_rollover_alias
    maybe_create_ilm_policy
  end
end

Currently, the plugin decides whether it needs to create the rollover alias only once, when it starts.

@robbavey

zcola commented 5 years ago

If the ES bulk API can determine that the target index is not a rollover index and return an error to Logstash, should Logstash then be able to create it automatically?
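To make the idea of on-demand creation concrete, here is a minimal Ruby sketch, not the plugin's actual code. `resolve_placeholders` is a hypothetical stand-in for Logstash's per-event substitution that handles only `%{field}` and `%{[nested][field]}`, and `LazyAliasBootstrapper` is an invented helper that runs a creation callback the first time each resolved alias name is seen. The sample events and field values are made up for illustration.

```ruby
require 'set'

# Hypothetical stand-in for Logstash's per-event substitution. It supports only
# %{field} and %{[nested][field]}; unresolved references are left untouched.
def resolve_placeholders(template, event)
  template.gsub(/%\{[^}]+\}/) do |match|
    ref = match[2..-2] # strip the leading "%{" and the trailing "}"
    keys = ref.start_with?('[') ? ref.scan(/\[([^\]]+)\]/).flatten : [ref]
    value = keys.reduce(event) { |node, key| node.is_a?(Hash) ? node[key] : nil }
    value.nil? ? match : value.to_s
  end
end

# Hypothetical helper: remembers which aliases were already bootstrapped and
# calls the supplied block once for every alias name it has not seen before.
class LazyAliasBootstrapper
  def initialize(alias_template, &create_alias)
    @alias_template = alias_template
    @create_alias = create_alias
    @seen = Set.new
  end

  def alias_for(event)
    name = resolve_placeholders(@alias_template, event)
    @create_alias.call(name) if @seen.add?(name) # add? returns nil when already present
    name
  end
end

created = []
bootstrapper = LazyAliasBootstrapper.new('cbg%{product}_%{log}_loghub') { |name| created << name }

events = [
  { 'product' => 'shop', 'log' => 'access' },
  { 'product' => 'shop', 'log' => 'access' },
  { 'product' => 'shop', 'log' => 'error' }
]
targets = events.map { |event| bootstrapper.alias_for(event) }

puts targets.inspect
puts created.inspect
```

The point of the sketch is only that the alias name cannot be known at startup: it exists once an event has been seen, so any creation step would have to run lazily, per distinct resolved name.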

ppf2 commented 5 years ago

This does not appear to work today.

Without ILM, Logstash variable substitution works as expected:

index => "mytest-%{[@metadata][index]}-%{+YYYY.MM.dd}" 

But the equivalent for ILM does not work. Both ilm_rollover_alias and ilm_pattern settings throw errors when attempting to use Logstash substitution syntax:

ilm_rollover_alias => "logstash-%{[@metadata][index]}"
ilm_pattern => "{now/d}-000001"

[2019-05-30T13:36:58,042][ERROR][logstash.outputs.elasticsearch] Failed to install template. {:message=>"Malformed escape pair at index 18: /_template/logstash-%{[@metadata][index]}", :class=>"Java::JavaNet::URISyntaxException", 

ilm_rollover_alias => "logstash"
ilm_pattern => "%{[@metadata][index]}-{now/d}-000001"

LogStash::Outputs::ElasticSearch::HttpClient::Pool::BadResponseCodeError: Got response code '403' contacting Elasticsearch at URL 'https://node1:9200/%253C%2525%257B%255B%2540metadata%255D%255Bindex%255D%257D-%257Bnow%252Fd%257D-000001%253E'
                    perform_request at /Users/ppf2/Elastic/ElasticStack_6_0/6.7.0/logstash-6.7.0/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-9.4.0-java/lib/logstash/outputs/elasticsearch/http_client/manticore_adapter.rb:80
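
For anyone trying to read these two errors, a small Ruby sketch may help. It decodes the path segment from the 403 URL twice, which shows that the `%{...}` placeholder was sent to Elasticsearch literally rather than being substituted. It also checks that Ruby's own URI parser rejects the template path from the first error; that is only an analogy, since the logged exception comes from Java's `java.net.URI`, not from Ruby.

```ruby
require 'uri'

# Path segment copied from the 403 error above.
logged_segment = '%253C%2525%257B%255B%2540metadata%255D%255Bindex%255D%257D-%257Bnow%252Fd%257D-000001%253E'

# The segment is percent-encoded twice, so decode it twice.
once  = URI.decode_www_form_component(logged_segment)
twice = URI.decode_www_form_component(once)

puts once
puts twice # => <%{[@metadata][index]}-{now/d}-000001>

# Analogy for the first error: "%{" is not a valid percent escape, so a URI
# parser refuses the unsubstituted template path.
template_path = '/_template/logstash-%{[@metadata][index]}'
parse_error =
  begin
    URI.parse(template_path)
    nil
  rescue URI::InvalidURIError => e
    e
  end

puts parse_error.class
```

In other words, both settings are used verbatim when the plugin starts, before any event (and therefore any `[@metadata][index]` value) exists.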
ppf2 commented 5 years ago

Until this is addressed, it will be helpful to set expectations in the documentation in these sections:

Battleroid commented 5 years ago

This is unfortunate; ~it's possible to use the substitutions on ilm_policy~ (I confused this with something else), but nothing else. This makes setting up multiple RO aliases a bit difficult.

Battleroid commented 5 years ago

I managed to cobble something terrible together that appears to work (see here). It's not great, but it is functional. I still have some things to tinker with but at least this works for now.

Sample run/config as an example here: https://gist.github.com/Battleroid/beec2b88c9b59fd3665defa08162c4fb

Battleroid commented 5 years ago

I've made some additional changes to this to allow it to do the creation of the rollover index with the specified pattern, but without setting the settings, etc. When I tried this initially without the modifications the cluster came to a complete stop after a few hours, filled with ilm-set-step-info tasks.

Letting logstash just create the aliases seems to be a good compromise. We are able to have something external do the process of rollover. We don't have to worry about creating the index patterns or about ILM rollover related tasks clogging things up.

jeffrysleddens commented 5 years ago

We also would love to use ILM within logstash but were hit by this issue. We would like to use dynamic naming in the ilm_rollover_alias setting like we used to in the index setting.

A quick search shows that quite a few people run into this limitation with ILM in the elasticsearch output:
https://discuss.elastic.co/t/es-output-using-ilm-settings-malformed-escape-pair-at-index/178117
https://discuss.elastic.co/t/use-of-metadata-in-ilm-configuration/182918

AndrewMcQuerry commented 5 years ago

Same here. We would love to leverage the ilm_* settings within logstash to help manage this but were stopped dead-in-the-water due to this limitation.

jeffrey-e commented 5 years ago

Same issue here, +1 for adding this feature

astraios commented 5 years ago

+1 here, being able to dynamically set ilm_rollover_alias/ilm_policy using the same syntax as the index field would be huge.

spencergilbert commented 5 years ago

I don't see how ILM is even usable without a dynamic setting like index, unless you want to hand-curate all of your indices or work outside of the cluster to provide the functionality.

msvechla commented 5 years ago

I've opened an issue about this scenario when ILM initially was introduced. Having the ability to use variables inside these settings is essential for us to make dynamic rollover patterns work.

This feature would be a huge benefit for our large elasticsearch infrastructure.

EDIT: Here is the request from 2018 about this feature: https://github.com/logstash-plugins/logstash-output-elasticsearch/pull/805#issuecomment-438385822

nicolas-123 commented 5 years ago

+1, also looking to set ilm_rollover_alias dynamically

jianchen2580 commented 5 years ago

+1

mrs83 commented 5 years ago

+1, I am also looking for this

vogon1 commented 5 years ago

+1, same here.

cedzz commented 5 years ago

+1, same here.

asega commented 5 years ago

+1

reighnman commented 5 years ago

Trying to use CPM compounds the issue: CPM currently doesn't support wildcards in xpack.management.pipeline.id, so we need to use pipelines with aggregated inputs, but we cannot dynamically assign ILM policies without being able to use substitutions like we can with index names.

Once CPM can support wildcards for pipeline.id's then we could have a pipeline per output with a static ilm name.

msvechla commented 5 years ago

I wrote a Kubernetes controller that works around these current limitations and automates the rollover pattern. This will help if you want to forward Kubernetes Pod logs to Elasticsearch. If you are interested you can take a look here: https://gitlab.com/msvechla/es-rollover-controller

iwasnobody commented 5 years ago

+1

rpasche commented 5 years ago

+1

HitkoDev commented 5 years ago

Any news on this?

phobosale commented 5 years ago

+1

paulojmdias commented 5 years ago

+1

admlko commented 4 years ago

+1

epol commented 4 years ago

+1

jsvd commented 4 years ago

Hi folks, to those coming here to vote on the issue, please use the +1 reaction on the initial comment, as the "+1" comments only add noise to the issue.

The expectations when using ILM are the same as when using indices, where, if the destination doesn't exist, Elasticsearch automatically creates an index for you that respects the index template if the name matches the pattern.

For write aliases this isn't possible in Elasticsearch, as write alias creation is an explicit action. Some discussion has been happening in Elasticsearch to improve this user experience; please subscribe to the following related issues:

tomrade commented 4 years ago

I use Beats and Logstash. I imported the Beats index template and ILM policy; however, with ILM off (in Logstash) there isn't an easy way to create the index dynamically with an alias. I added the alias to my index mapping, but that broke ILM because multiple indexes could then have that alias. From this thread, my understanding is that with Logstash, I (or Logstash) need to create the write index first at some point (pipeline startup?).
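
For reference, the explicit "create the write index first" step discussed here can be sketched in Ruby as follows. The helper only builds the request (it does not send anything), and `bootstrap_request` and the alias name `logstash-nginx` are hypothetical. The request shape follows the documented Elasticsearch pattern of creating a date-math index name with an alias marked `is_write_index`.

```ruby
require 'json'
require 'uri'

# Hypothetical helper: builds (but does not send) the request that creates the
# first rollover index and marks the alias as its write alias.
def bootstrap_request(rollover_alias, pattern = '{now/d}-000001')
  index_name = "<#{rollover_alias}-#{pattern}>" # date-math index name
  {
    method: 'PUT',
    # Date-math names must be percent-encoded when used in the request path.
    path: "/#{URI.encode_www_form_component(index_name)}",
    body: JSON.generate('aliases' => { rollover_alias => { 'is_write_index' => true } })
  }
end

request = bootstrap_request('logstash-nginx')
puts "#{request[:method]} #{request[:path]}"
puts request[:body]
```

Because exactly one index may be the write index for an alias, this has to happen once per alias, which is why adding the alias to a template that applies to every new index does not work.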

mccarthyp-snet commented 4 years ago

+1

nHurD commented 4 years ago

+1

jsvd commented 4 years ago

Quick update: a meta issue has been created in elasticsearch to track the work of building the concept of alias templates, which facilitates the support of dynamic parameters in this plugin's ILM setup. For those interested in this feature you can track the progress here: https://github.com/elastic/elasticsearch/issues/51995

drenze commented 4 years ago

+1

maggieghamry commented 4 years ago

+1

tarunpasrija commented 4 years ago

+1 I really hate having manual interactions when setting up indexes. We need only one place, Logstash, to push those template changes so that it's more manageable. Example config:

input {
  kafka {
    bootstrap_servers => "{{bootstrap_servers}}"
    topics => ["mytopic1", "mytopic2"]
    auto_offset_reset => "earliest"
    client_id => "application-metrics-{{ansible_hostname}}"
    consumer_threads => 2
    group_id => "application-metrics-{{envvar}}"
  }
}

output {
  elasticsearch {
    hosts => {{es_master_nodes}}
    user => {{logstash_writer_user}}
    password => {{logstash_writer_password}}
    ilm_rollover_alias => "application-metrics-%{topic}-{{envvar}}"
    template => "/etc/logstash/index/application-metrics.json"
    template_name => "application-metrics-{{envvar}}"
    ilm_pattern => "000001"
    ilm_policy => "default"
    manage_template => true
  }
}

As in the example above, I need to insert the topic name into ilm_rollover_alias so that I can have a single configuration for multiple Kafka topics instead of creating a new pipeline for each Kafka topic.

dunkelbunt1 commented 4 years ago

+1

Zoom2016 commented 4 years ago

+1 This feature will be very helpful to me

NanayaLL commented 4 years ago

+1

jugggao commented 4 years ago

+1 This feature will be very helpful to me ☹

iainmarshall commented 4 years ago

+1 This feature is something I am dying for please.

flaper87 commented 4 years ago

Another thumbs up over here: :+1:

obogobo commented 4 years ago

yes +1, we currently use a shell script to bulk create a ton of templates as a workaround lol

Jayw77 commented 4 years ago

+1, finally got all my indexes generating via labels per app how I wanted it to only realise I can't use the dynamically generated name/index with ILM :(

ppf2 commented 4 years ago

With version 7.9's new data streams implementation, we should be able to leverage this new feature to achieve dynamic variable substitution for index names with ILM+rollover.

I have submitted a doc issue with draft for proper documentation, pending review. This serves as the stop gap recipe for implementing data streams with Logstash until a new Elasticsearch Data Stream output plugin is available in the future.

msvechla commented 4 years ago

> With version 7.9's new data streams implementation, we should be able to leverage this new feature to achieve dynamic variable substitution for index names with ILM+rollover. I have submitted a doc issue with draft for proper documentation, pending review.

Unfortunately nothing has changed here. This was possible before, by setting up the rollover pattern via index template, ILM policy, rollover index and write alias manually and finally pointing logstash at the write alias with variable substitution.

In your example everything is still set up manually. I thought this issue is about allowing variable substitution when dynamically creating the required artifacts for rollover / ILM (e.g. the ILM rollover alias, see https://github.com/logstash-plugins/logstash-output-elasticsearch/issues/858#issue-429121752)?

Data streams definitely make the bootstrapping of rollover and ILM a lot easier; however, I think the original issue is still not solved, right?

tomrade commented 4 years ago

My thoughts were the user wants to use %{VAR} in the ILM alias (which wouldn't work), which they can do now with data streams, as a data stream can be created dynamically (via the index name provided in the index request and a matching mapping template).

https://www.elastic.co/guide/en/elasticsearch/reference/7.x/set-up-a-data-stream.html#index-documents-to-create-a-data-stream

msvechla commented 4 years ago

> My thoughts were the user wants to use %{VAR} in the ILM alias (which wouldn't work), which they can do now with data streams, as a data stream can be created dynamically (via the index name provided in the index request and a matching mapping template).
>
> https://www.elastic.co/guide/en/elasticsearch/reference/7.x/set-up-a-data-stream.html#index-documents-to-create-a-data-stream

Awesome, thanks for clarifying, I totally missed this functionality.

Indeed, we can then set up an index template including a data stream and ILM once, with a more generic index pattern such as my-streams-*, and finally instruct Logstash to write to my-streams-%{VAR}, which will create a new data stream for every unique value of %{VAR}.
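
A rough Ruby sketch of that one-time setup, for anyone following along: it builds the body of a composable index template that enables data streams and attaches an ILM policy, then checks which resolved names would match the pattern. The template name `my-streams`, the policy name `my-policy`, and both helper functions are assumptions for illustration, and the body reflects my understanding of the 7.9+ `_index_template` API rather than a verified recipe.

```ruby
require 'json'

# Hypothetical helper: body for PUT /_index_template/<name>. The empty
# "data_stream" object is what marks matching names as data streams.
def data_stream_template(pattern, policy)
  {
    'index_patterns' => [pattern],
    'data_stream' => {},
    'template' => { 'settings' => { 'index.lifecycle.name' => policy } }
  }
end

# Naive glob check (supports only "*") to show which resolved names the
# template would apply to; Elasticsearch does the real matching.
def matches_pattern?(pattern, name)
  regex = Regexp.new('\A' + Regexp.escape(pattern).gsub('\*', '.*') + '\z')
  regex.match?(name)
end

template = data_stream_template('my-streams-*', 'my-policy')
puts 'PUT /_index_template/my-streams'
puts JSON.pretty_generate(template)

%w[my-streams-nginx my-streams-kafka other-nginx].each do |name|
  puts "#{name}: #{matches_pattern?('my-streams-*', name)}"
end
```

As far as I know, data streams only accept documents written with the create op type and require an @timestamp field, so the Logstash output would also need action => "create".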

That's a big improvement, thanks for the hint!

pujithkurunji commented 4 years ago

+1 here, being able to dynamically set ilm_rollover_alias/ilm_policy using the same syntax as the index field would be huge.

Any updates on this?

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "%{[@metadata][log_index]}"
    ilm_rollover_alias => "%{[@metadata][log_index]}"
    ilm_pattern => "000001"
    ilm_policy => "custom_policy"
  }
}

This is what I'm expecting. I can use a dynamic index, by getting log_index from Filebeat. I want to use the same in ilm_rollover_alias.

aseppala commented 4 years ago

+1

konstantin-921 commented 4 years ago

+1