Open jszwedko opened 3 years ago
I've found that the ILM configuration in logstash is pretty bad, IMHO - It might be better to focus on data streams support?
@spencergilbert we plan to do both, but we can focus on datastreams first. Do you have thoughts on what an improved version of index lifecycle management might look like? Or do you think it's not even worth it given datastreams? We've had some users ask for ILM support.
I think going all in on datastreams is probably better. By my understanding, it manages a lot of the ILM work the client would otherwise need to implement.
Logstash ILM was painful when I used it because you can't/couldn't supply the alias as a template based on log fields.
There is a "problem" regarding this: the index name is a template, so without a given event we cannot say what the indices will be. This implies that we'll have to upsert the ILM policy and template definitions when we receive an event. In that case, each time we receive events, we'll have to make 3 calls: 1 to create the ILM policy, 1 to create the template, and 1 to push the metrics. This would most probably kill the performance of the sink. We could think of having a cache at the Vector level, but that could become tricky when you're running several instances of Vector in parallel, and it would increase memory usage. Now, if we take a look at another sink, Clickhouse, it needs a migration to work. Maybe Elasticsearch would need a migration to work as well.
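To make the cache idea concrete, here is a minimal sketch in Python (not Vector's actual code, which is Rust). Everything in it is hypothetical: `ProvisioningCache`, `render_index`, and the two `ensure_*` callbacks stand in for whatever would really issue the ILM policy and template upserts. It only shows that the two setup calls can be paid once per distinct rendered index name rather than once per event, at the cost of one set entry per name per Vector instance.

```python
from typing import Callable, Dict, Set


def render_index(template: str, event: Dict[str, str]) -> str:
    """Render an index name template such as "logs-{app}" from event fields."""
    return template.format(**event)


class ProvisioningCache:
    """Remembers which index names already had their policy and template upserted.

    The callbacks are hypothetical placeholders for the real HTTP calls.
    The cache is local to one process, so several instances running in
    parallel would each repeat the upserts once.
    """

    def __init__(
        self,
        ensure_policy: Callable[[str], None],
        ensure_template: Callable[[str], None],
    ) -> None:
        self._ensure_policy = ensure_policy
        self._ensure_template = ensure_template
        self._seen: Set[str] = set()

    def prepare(self, index_name: str) -> bool:
        """Run the setup calls for an unseen index name; return True if they ran."""
        if index_name in self._seen:
            return False
        self._ensure_policy(index_name)
        self._ensure_template(index_name)
        self._seen.add(index_name)
        return True


if __name__ == "__main__":
    calls = []
    cache = ProvisioningCache(
        ensure_policy=lambda name: calls.append(("policy", name)),
        ensure_template=lambda name: calls.append(("template", name)),
    )
    events = [{"app": "web"}, {"app": "web"}, {"app": "db"}]
    for event in events:
        cache.prepare(render_index("logs-{app}", event))
    print(calls)
```

The upserts would need to be idempotent for this to be safe across parallel instances, since each instance starts with an empty cache.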
After some discussion we've decided to punt on this for now given we have added datastreams support which seems to be the ordained path for getting observability data into Elasticsearch and handles index lifecycle management for you. We'll leave this open to collect additional use-cases for ILM though.
Hi, hopefully this isn't too OT - after reading the above, looks like we need to get into data streams. Can somebody explain how data streams "handle index lifecycle management for you"? According to the docs it's still necessary to set up ILM. What have I missed?
I think particularly data streams avoid the hassle described here: https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started-index-lifecycle-management.html#manage-time-series-data-without-data-streams
@jszwedko @spencergilbert - is support for this feature being worked on?
Not currently. We'd be happy to review a proposal for it though! Many users seem to have moved onto data streams for telemetry data.
@spencergilbert @jszwedko how can one decide on which data_stream name, index template, or ILM policy vector is going to use?
From my understanding, I need to create the index template and ILM policy beforehand. But I am not sure how to set up Vector in a way that will work with my custom ILM policy and index template.
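A rough sketch of how the pieces appear to fit together, assuming Vector's data stream mode builds the stream name as `<type>-<dataset>-<namespace>` (with `logs`, `generic`, and `default` as the defaults) and that Elasticsearch picks the index template whose `index_patterns` match that name. The helper names and the policy name `my-logs-policy` are made up for illustration; the template body follows the shape accepted by Elasticsearch's `_index_template` API, with the ILM policy attached through the `index.lifecycle.name` setting.

```python
import json
from typing import Any, Dict


def data_stream_name(
    type_: str = "logs", dataset: str = "generic", namespace: str = "default"
) -> str:
    """Compose the data stream name (assumed naming scheme: type-dataset-namespace)."""
    return f"{type_}-{dataset}-{namespace}"


def index_template_body(
    pattern: str, ilm_policy: str, priority: int = 500
) -> Dict[str, Any]:
    """Build a request body for PUT _index_template/<name> on a data stream."""
    return {
        "index_patterns": [pattern],
        "data_stream": {},  # marks matching names as data streams
        "priority": priority,  # higher priority wins over overlapping templates
        "template": {
            "settings": {"index.lifecycle.name": ilm_policy},
        },
    }


if __name__ == "__main__":
    # Hypothetical names chosen only for this example.
    name = data_stream_name(dataset="nginx", namespace="prod")
    body = index_template_body("logs-nginx-*", "my-logs-policy")
    print(name)
    print(json.dumps(body, indent=2))
```

Under those assumptions, you would create the policy and template in Elasticsearch first, then set the sink's data stream type, dataset, and namespace so the resulting name falls under the template's pattern.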
Support the same options that logstash does for managing index lifecycles: https://github.com/logstash-plugins/logstash-output-elasticsearch/blob/master/docs/index.asciidoc#index-lifecycle-management