vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Add Elasticsearch Bulk API, HTTP Server Source #4600

Open rwaweber opened 3 years ago

rwaweber commented 3 years ago

Current Vector Version

Latest I suppose?

Use-cases

Being able to receive events from clients that already know how to write to Elasticsearch's Bulk API, such as Elastic Beats, Logstash, or other tools and libraries.

An example use case would be to run a Vector ES Bulk API server to receive events from a Winlogbeat client using its Elasticsearch output [2], as a potential workaround for [1]. (Admittedly, a Windows event log source for Vector would likely be a better solution in the long run, though I'd wager it is a lot more difficult to implement.)

This would be extremely helpful for us. We are doing something similar today, but with a handful of Logstash instances behind a load balancer, each with a corresponding Beats input [3]. This doesn't perform particularly well because the custom Beats protocol runs over a stateful TCP connection and so doesn't load-balance well [4].

In theory, with an ES Bulk API source, we'd be able to place a Vector instance behind a load balancer, point those same Beats clients at the load balancer, and get better performance over HTTP (the assumption being that HTTP is much easier to load-balance than a custom TCP protocol).

I see this source looking very similar to the HEC source, but aimed at Elasticsearch clients rather than Splunk clients.
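To make the request a bit more concrete, here is a rough sketch of what such a source would have to do with each request body. The Bulk API takes newline-delimited JSON in which an action line (`index`, `create`, `update`, or `delete`) is followed by a source document line, except for `delete`, which has none. The function names `parse_bulk_body` and `bulk_response` and the event shape are made up for illustration, the response is heavily simplified, and none of this reflects how Vector would actually implement it.

```python
import json

# Bulk API actions that are followed by a source document line.
ACTIONS_WITH_SOURCE = {"index", "create", "update"}


def parse_bulk_body(body: str) -> list:
    """Turn an NDJSON bulk request body into a list of event dicts (hypothetical shape)."""
    lines = [line for line in body.split("\n") if line.strip()]
    events = []
    i = 0
    while i < len(lines):
        action_line = json.loads(lines[i])
        if not isinstance(action_line, dict) or len(action_line) != 1:
            raise ValueError(f"malformed action line: {lines[i]!r}")
        (action, meta), = action_line.items()
        i += 1
        if action == "delete":
            # A delete carries no document; a log-receiving source can ignore it.
            continue
        if action not in ACTIONS_WITH_SOURCE:
            raise ValueError(f"unknown bulk action: {action!r}")
        if i >= len(lines):
            raise ValueError("action line without a source document")
        document = json.loads(lines[i])
        i += 1
        events.append({
            "action": action,
            "index": (meta or {}).get("_index"),
            "document": document,
        })
    return events


def bulk_response(events: list) -> dict:
    """Build a simplified success response with one item per accepted event."""
    return {
        "took": 0,
        "errors": False,
        "items": [{event["action"]: {"status": 201}} for event in events],
    }
```

A real source would also need to handle whatever other requests a given client makes before or alongside bulk writes, which I have not looked into in detail.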

Attempted Solutions

Unfortunately I don't really have any attempted solutions at this time.

Proposal

I think I covered this in my use case, though I'm happy to revise/edit this to make it more readable!

References

[1] https://github.com/timberio/vector/issues/2719
[2] https://www.elastic.co/guide/en/beats/winlogbeat/current/elasticsearch-output.html
[3] https://www.elastic.co/guide/en/logstash/current/plugins-inputs-beats.html
[4] https://github.com/elastic/beats/issues/7824#issuecomment-409553329

nicolaipre commented 1 year ago

Hi.

What is the current status of this? I looked through the other referenced issues but there were no comments on them from the devs.

spencergilbert commented 1 year ago

Hi @nicolaipre,

Adding this source hasn't been prioritized at this time, but we're happy to help guide a community contribution for it!

gregoryjcoates commented 11 months ago

I found this during work today and wanted to add my two cents.

First, this issue and #10170 are completely different requests. This one is about acting as an input for Beats, in the same way that you can send Beats or Elastic Agent output to Logstash. #10170 is about an input from an Elasticsearch cluster, i.e. a database input.

As for a database input, it simply needs to wrap the search and PIT APIs and pass in a string/JSON containing the search parameters. That would allow for full search, with paginated results that can be sorted, time-limited, etc.
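Roughly what I have in mind, as a sketch only: open a point in time (PIT), then page through results with `search_after`, feeding the `sort` values of the last hit into the next request. The `search` callable below is a stand-in for whatever HTTP client would POST the body to `_search`; it, the function names, and the `@timestamp` sort field are assumptions for illustration.

```python
def build_search_body(pit_id, query, size=1000, search_after=None, keep_alive="1m"):
    """Build a _search request body that pages through a PIT with search_after."""
    body = {
        "size": size,
        "query": query,
        "pit": {"id": pit_id, "keep_alive": keep_alive},
        # Assumes documents carry an @timestamp field; _shard_doc breaks ties.
        "sort": [{"@timestamp": "asc"}, {"_shard_doc": "asc"}],
    }
    if search_after is not None:
        body["search_after"] = search_after
    return body


def paginate(search, pit_id, query, size=1000):
    """Yield every hit, one page at a time, until a page comes back empty.

    `search` is a hypothetical callable that sends the body to the cluster
    and returns the parsed JSON response.
    """
    search_after = None
    while True:
        response = search(build_search_body(pit_id, query, size, search_after))
        hits = response["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # The sort values of the last hit mark where the next page starts.
        search_after = hits[-1]["sort"]
        # The response may carry an updated PIT id; use it for the next request.
        pit_id = response.get("pit_id", pit_id)
```

The user-supplied search variables would go in as `query`, so time limits are just a range clause in that query.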

I would love to help or push for a proper Elasticsearch input and am going to look into it, but I have zero Rust experience, so it would take me a bit. The lack of an Elasticsearch input is currently the only thing keeping me from pushing to have Vector fully replace our Logstash usage. In the meantime I am working to adopt Vector for use cases such as Kafka to Elastic, as the memory usage reduction I saw in testing is too good to pass up.

Elastic Admin for Swift Transportation/Knight-Swift

jszwedko commented 11 months ago

Thanks for the thoughts @gregoryjcoates !

You are right, I misunderstood https://github.com/vectordotdev/vector/issues/10170 as being the same as this issue. That one is about extracting data from Elasticsearch, whereas this issue is about having Vector expose an Elasticsearch-compatible API so that users could more easily, as in your example, point Elasticsearch-compatible tools like Beats at Vector. I'll re-open #10170. I think we can continue the discussion of that source over there.